zli12321 committed
Commit 74f2202 · verified · 1 Parent(s): 9f9885a

Update README.md

Files changed (1)
README.md +16 -7
README.md CHANGED
@@ -14,10 +14,10 @@ pipeline_tag: text-classification
  [![PyPI version qa-metrics](https://img.shields.io/pypi/v/qa-metrics.svg)](https://pypi.org/project/qa-metrics/)
  [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Ke23KIeHFdPWad0BModmcWKZ6jSbF5nI?usp=sharing)

- > Check out the main [Repo](https://github.com/zli12321/qa_metrics)
-
  > A fast and lightweight Python package for evaluating question-answering models and prompting of black-box and open-source large language models.

+ > `pip install qa-metrics` is all you need!
+
  ## 🎉 Latest Updates

  - **Version 0.2.19 Released!**
@@ -30,6 +30,13 @@ pipeline_tag: text-classification

  ## 🚀 Quick Start

+ ## Table of Contents
+ * 1. [Normalized Exact Match](#em)
+ * 2. [Token F1 Score](#f1)
+ * 3. [PEDANTS](#pedants)
+ * 4. [Finetuned Neural Matching](#neural)
+ * 5. [Prompting LLM](#llm)
+
  ### Prerequisites
  - Python >= 3.6
  - openai >= 1.0
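The Quick Start hunk above lists the prerequisites, and the first hunk adds a one-line install hint. As a quick smoke test after installing, a minimal sketch (the `qa_metrics.em` import path is an assumption based on the package name; only the `pip install qa-metrics` line appears in this diff):

```python
# Install first, per the new README line:
#   pip install qa-metrics
# Minimal smoke test -- the import path below is an assumption, not shown in this diff.
from qa_metrics.em import em_match

# Expected to print a boolean exact-match verdict after normalization.
print(em_match(["Paris"], "paris"))
```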
@@ -51,9 +58,11 @@ Our package offers six QA evaluation methods with varying strengths:
  | [Open Source LLM Evaluation](https://huggingface.co/zli12321/prometheus2-2B) | All QA types | Free | High |
  | Black-box LLM Evaluation | All QA types | Paid | Highest |

+
+
  ## 📖 Documentation

- ### 1. Normalized Exact Match
+ ### 1. <a name='em'></a>Normalized Exact Match

  #### Method: `em_match`
  **Parameters**
@@ -71,7 +80,7 @@ candidate_answer = "The movie \"The Princess and the Frog\" is loosely based off
  match_result = em_match(reference_answer, candidate_answer)
  ```

- ### 2. F1 Score
+ ### 2. <a name='f1'></a>F1 Score

  #### Method: `f1_score_with_precision_recall`
  **Parameters**
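The context lines above show the `em_match` call without its setup. A self-contained sketch of the surrounding code (the `qa_metrics.em` import path and the example strings are assumptions; only the call itself appears in this diff):

```python
# Sketch only: import path and example data are assumed, not taken from this diff.
from qa_metrics.em import em_match

# One or more acceptable gold answers, and a model answer to check.
reference_answer = ["The Frog Prince", "The Princess and the Frog"]
candidate_answer = "The Princess and the Frog"

# Normalized exact match: returns a boolean verdict after light normalization.
match_result = em_match(reference_answer, candidate_answer)
print(match_result)
```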
@@ -97,7 +106,7 @@ f1_stats = f1_score_with_precision_recall(reference_answer[0], candidate_answer)
  match_result = f1_match(reference_answer, candidate_answer, threshold=0.5)
  ```

- ### 3. PEDANTS
+ ### 3. <a name='pedants'></a>PEDANTS

  #### Method: `get_score`
  **Parameters**
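Likewise, the two F1 calls shown above can be sketched end to end; the `qa_metrics.f1` import path is an assumption:

```python
# Sketch only: the qa_metrics.f1 import path is an assumption.
from qa_metrics.f1 import f1_match, f1_score_with_precision_recall

reference_answer = ["The Frog Prince", "The Princess and the Frog"]
candidate_answer = "The Princess and the Frog"

# Token-level precision, recall, and F1 against a single reference string.
f1_stats = f1_score_with_precision_recall(reference_answer[0], candidate_answer)

# Boolean verdict: best F1 across references compared to the threshold.
match_result = f1_match(reference_answer, candidate_answer, threshold=0.5)
print(f1_stats, match_result)
```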
@@ -160,7 +169,7 @@ scores = pedant.get_scores(reference_answer, candidate_answer, question)
  match_result = pedant.evaluate(reference_answer, candidate_answer, question)
  ```

- ### 4. Transformer Neural Evaluation
+ ### 4. <a name='neural'></a>Transformer Neural Evaluation

  #### Method: `get_score`
  **Parameters**
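The `pedant.get_scores` and `pedant.evaluate` calls above are made on an instantiated matcher; a sketch of the full flow (the `PEDANT` class name, the `qa_metrics.pedant` import path, and the example data are assumptions, since the diff shows only the two method calls):

```python
# Sketch only: class name, import path, and example data are assumptions.
from qa_metrics.pedant import PEDANT

question = "Which movie is loosely based on the Brothers Grimm's Frog Prince?"
reference_answer = ["The Frog Prince", "The Princess and the Frog"]
candidate_answer = "The Princess and the Frog"

pedant = PEDANT()

# Match scores for the candidate against each reference, conditioned on the question.
scores = pedant.get_scores(reference_answer, candidate_answer, question)

# Boolean correctness judgment.
match_result = pedant.evaluate(reference_answer, candidate_answer, question)
print(scores, match_result)
```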
@@ -206,7 +215,7 @@ tm = TransformerMatcher("zli12321/answer_equivalence_tiny_bert")
  match_result = tm.transformer_match(reference_answer, candidate_answer, question)
  ```

- ### 5. LLM Integration
+ ### 5. <a name='llm'></a>LLM Integration

  #### Method: `prompt_gpt`
  **Parameters**
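Finally, the context lines above show `TransformerMatcher` being loaded from the `zli12321/answer_equivalence_tiny_bert` checkpoint; a sketch of the surrounding code (the `qa_metrics.transformerMatcher` import path and the example data are assumptions; the checkpoint name and the `transformer_match` call are taken from the diff):

```python
# Sketch only: the import path and example data are assumptions; the checkpoint
# name and the transformer_match call come from the diff's context lines.
from qa_metrics.transformerMatcher import TransformerMatcher

question = "Which movie is loosely based on the Brothers Grimm's Frog Prince?"
reference_answer = ["The Frog Prince", "The Princess and the Frog"]
candidate_answer = "The Princess and the Frog"

# Load the finetuned answer-equivalence model from the Hugging Face Hub.
tm = TransformerMatcher("zli12321/answer_equivalence_tiny_bert")

# Neural judgment of whether the candidate matches any reference for this question.
match_result = tm.transformer_match(reference_answer, candidate_answer, question)
print(match_result)
```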
 