kaikaidai committed on
Commit e098d1e · verified · 1 Parent(s): 8bba8de

Update common.py

Files changed (1)
  1. common.py +13 -1
common.py CHANGED
@@ -163,9 +163,21 @@ We’d love to hear your feedback! For general feature requests or to submit / s
 \nPlease file any issues on our [Github](https://github.com/atla-ai/judge-arena)."""
 
 
+# Default values for compatible mode
+DEFAULT_EVAL_CRITERIA = """Evaluate the helpfulness of the chatbot response given the user's instructions. Focus on relevance, accuracy, and completeness while being objective. Do not consider response length in your evaluation."""
+
+DEFAULT_SCORE_1 = "The response is unhelpful, providing irrelevant or incorrect content that does not address the request."
+
+DEFAULT_SCORE_2 = "The response is partially helpful, missing key elements or including minor inaccuracies, and lacks depth in addressing the request."
+
+DEFAULT_SCORE_3 = "The response is adequately helpful, correctly addressing the main request with relevant information and some depth."
+
+DEFAULT_SCORE_4 = "The response is very helpful, addressing the request thoroughly with accurate and detailed content, but may lack a minor aspect of helpfulness."
+
+DEFAULT_SCORE_5 = "The response is exceptionally helpful, providing precise, comprehensive content that fully resolves the request with insight and clarity."
 
 #**What are the Evaluator Prompt Templates based on?**
 
 #As a quick start, we've set up templates that cover the most popular evaluation metrics out there on LLM evaluation / monitoring tools, often known as 'base metrics'. The data samples used in these were randomly picked from popular datasets from academia - [ARC](https://huggingface.co/datasets/allenai/ai2_arc), [Preference Collection](https://huggingface.co/datasets/prometheus-eval/Preference-Collection), [RewardBench](https://huggingface.co/datasets/allenai/reward-bench), [RAGTruth](https://arxiv.org/abs/2401.00396).
 
-#These templates are designed as a starting point to showcase how to interact with the Judge Arena, especially for those less familiar with using LLM judges.
+#These templates are designed as a starting point to showcase how to interact with the Judge Arena, especially for those less familiar with using LLM judges.
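For context, here is a minimal sketch of how these new defaults could be assembled into a 1-5 rubric prompt for an LLM judge when no custom criteria are supplied. The `build_default_judge_prompt` helper and the surrounding prompt wording are illustrative assumptions, not part of this commit; the commit itself only adds the constants.

```python
# Illustrative sketch only: assumes the new module-level defaults in common.py
# are used as the fallback rubric for "compatible mode".

DEFAULT_EVAL_CRITERIA = """Evaluate the helpfulness of the chatbot response given the user's instructions. Focus on relevance, accuracy, and completeness while being objective. Do not consider response length in your evaluation."""
DEFAULT_SCORE_1 = "The response is unhelpful, providing irrelevant or incorrect content that does not address the request."
DEFAULT_SCORE_2 = "The response is partially helpful, missing key elements or including minor inaccuracies, and lacks depth in addressing the request."
DEFAULT_SCORE_3 = "The response is adequately helpful, correctly addressing the main request with relevant information and some depth."
DEFAULT_SCORE_4 = "The response is very helpful, addressing the request thoroughly with accurate and detailed content, but may lack a minor aspect of helpfulness."
DEFAULT_SCORE_5 = "The response is exceptionally helpful, providing precise, comprehensive content that fully resolves the request with insight and clarity."


def build_default_judge_prompt(user_input: str, response: str) -> str:
    """Hypothetical helper: fill a 1-5 rubric prompt with the default criteria."""
    # Number each score description to form the rubric body.
    rubric = "\n".join(
        f"Score {i}: {desc}"
        for i, desc in enumerate(
            [DEFAULT_SCORE_1, DEFAULT_SCORE_2, DEFAULT_SCORE_3,
             DEFAULT_SCORE_4, DEFAULT_SCORE_5],
            start=1,
        )
    )
    return (
        f"{DEFAULT_EVAL_CRITERIA}\n\n"
        f"Scoring rubric:\n{rubric}\n\n"
        f"User input:\n{user_input}\n\n"
        f"Chatbot response:\n{response}\n\n"
        "Return a single integer score from 1 to 5."
    )


if __name__ == "__main__":
    print(build_default_judge_prompt("What is the capital of France?", "Paris."))
```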