Update common.py
common.py CHANGED
@@ -163,9 +163,21 @@ We’d love to hear your feedback! For general feature requests or to submit / s
 \nPlease file any issues on our [Github](https://github.com/atla-ai/judge-arena)."""
 
 
+# Default values for compatible mode
+DEFAULT_EVAL_CRITERIA = """Evaluate the helpfulness of the chatbot response given the user's instructions. Focus on relevance, accuracy, and completeness while being objective. Do not consider response length in your evaluation."""
+
+DEFAULT_SCORE_1 = "The response is unhelpful, providing irrelevant or incorrect content that does not address the request."
+
+DEFAULT_SCORE_2 = "The response is partially helpful, missing key elements or including minor inaccuracies, and lacks depth in addressing the request."
+
+DEFAULT_SCORE_3 = "The response is adequately helpful, correctly addressing the main request with relevant information and some depth."
+
+DEFAULT_SCORE_4 = "The response is very helpful, addressing the request thoroughly with accurate and detailed content, but may lack a minor aspect of helpfulness."
+
+DEFAULT_SCORE_5 = "The response is exceptionally helpful, providing precise, comprehensive content that fully resolves the request with insight and clarity."
 
 #**What are the Evaluator Prompt Templates based on?**
 
 #As a quick start, we've set up templates that cover the most popular evaluation metrics out there on LLM evaluation / monitoring tools, often known as 'base metrics'. The data samples used in these were randomly picked from popular datasets from academia - [ARC](https://huggingface.co/datasets/allenai/ai2_arc), [Preference Collection](https://huggingface.co/datasets/prometheus-eval/Preference-Collection), [RewardBench](https://huggingface.co/datasets/allenai/reward-bench), [RAGTruth](https://arxiv.org/abs/2401.00396).
 
-#These templates are designed as a starting point to showcase how to interact with the Judge Arena, especially for those less familiar with using LLM judges.
+#These templates are designed as a starting point to showcase how to interact with the Judge Arena, especially for those less familiar with using LLM judges.
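For context, the constants added in this commit define a default 5-point helpfulness rubric for "compatible mode". Below is a minimal sketch of one way such defaults could be assembled into a single judge prompt. The DEFAULT_* values are copied from the diff above; the helper build_compatible_prompt, the prompt layout, and the sample inputs are hypothetical illustrations and are not part of common.py.

```python
# Illustrative sketch only: compose the default criteria and score descriptions
# from the diff above into a single 1-5 rubric prompt for an LLM judge.
# The constants are copied from the commit; build_compatible_prompt and the
# prompt layout are assumptions, not code from common.py.

DEFAULT_EVAL_CRITERIA = """Evaluate the helpfulness of the chatbot response given the user's instructions. Focus on relevance, accuracy, and completeness while being objective. Do not consider response length in your evaluation."""

DEFAULT_SCORE_1 = "The response is unhelpful, providing irrelevant or incorrect content that does not address the request."
DEFAULT_SCORE_2 = "The response is partially helpful, missing key elements or including minor inaccuracies, and lacks depth in addressing the request."
DEFAULT_SCORE_3 = "The response is adequately helpful, correctly addressing the main request with relevant information and some depth."
DEFAULT_SCORE_4 = "The response is very helpful, addressing the request thoroughly with accurate and detailed content, but may lack a minor aspect of helpfulness."
DEFAULT_SCORE_5 = "The response is exceptionally helpful, providing precise, comprehensive content that fully resolves the request with insight and clarity."


def build_compatible_prompt(user_input: str, response: str) -> str:
    """Assemble a judge prompt from the default criteria and the 1-5 rubric."""
    rubric = "\n".join(
        f"Score {i}: {desc}"
        for i, desc in enumerate(
            [DEFAULT_SCORE_1, DEFAULT_SCORE_2, DEFAULT_SCORE_3,
             DEFAULT_SCORE_4, DEFAULT_SCORE_5],
            start=1,
        )
    )
    return (
        f"{DEFAULT_EVAL_CRITERIA}\n\n"
        f"Scoring rubric:\n{rubric}\n\n"
        f"User input:\n{user_input}\n\n"
        f"Chatbot response:\n{response}\n\n"
        "Return a score from 1 to 5 and a brief justification."
    )


if __name__ == "__main__":
    # Hypothetical usage with made-up inputs.
    print(build_compatible_prompt("What is 2 + 2?", "2 + 2 equals 4."))
```

Keeping the evaluation criteria and the five score descriptions as separate constants makes it straightforward to swap in a custom rubric while reusing the same prompt scaffold.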