alexmarques committed on
Commit 8f89d5f · verified · 1 Parent(s): a5278f0

Update README.md

Files changed (1): README.md (+23 -21)
README.md CHANGED
@@ -1,6 +1,13 @@
  ---
  language:
  - en
+ - de
+ - fr
+ - it
+ - pt
+ - hi
+ - es
+ - th
  pipeline_tag: text-generation
  license: llama3.1
  ---
@@ -14,15 +21,15 @@ license: llama3.1
  - **Model Optimizations:**
    - **Activation quantization:** INT8
    - **Weight quantization:** INT8
- - **Intended Use Cases:** Intended for commercial and research use in English. Similarly to [Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct), this model is intended for assistant-like chat.
- - **Out-of-scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English.
+ - **Intended Use Cases:** Intended for commercial and research use in multiple languages. Similarly to [Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct), this model is intended for assistant-like chat.
+ - **Out-of-scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws).
  - **Release Date:** 7/11/2024
  - **Version:** 1.0
- - **License(s):** [Llama3](https://llama.meta.com/llama3/license/)
+ - **License(s):** [Llama3.1]
  - **Model Developers:** Neural Magic

  Quantized version of [Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct).
- It achieves an average score of 69.27 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark (version 1), whereas the unquantized model achieves 69.33.
+ It achieves scores within 1.3% of those of the unquantized model on MMLU, ARC-Challenge, GSM-8K, Hellaswag, Winogrande, and TruthfulQA.

  ### Model Optimizations

@@ -120,14 +127,9 @@ model.save_pretrained("Meta-Llama-3.1-8B-Instruct-quantized.w8a8")

  ## Evaluation

- The model was evaluated on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) leaderboard tasks (version 1) with the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/383bbd54bc621086e05aa1b030d8d4d5635b25e6) (commit 383bbd54bc621086e05aa1b030d8d4d5635b25e6) and the [vLLM](https://docs.vllm.ai/en/stable/) engine, using the following command:
- ```
- lm_eval \
-   --model vllm \
-   --model_args pretrained="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a8",dtype=auto,gpu_memory_utilization=0.4,add_bos_token=True,max_model_len=4096,tensor_parallel_size=1 \
-   --tasks openllm \
-   --batch_size auto
- ```
+ The model was evaluated on MMLU, ARC-Challenge, GSM-8K, Hellaswag, Winogrande, and TruthfulQA.
+ Evaluation was conducted using the Neural Magic fork of [lm-evaluation-harness](https://github.com/neuralmagic/lm-evaluation-harness/tree/llama_3.1_instruct) (branch llama_3.1_instruct) and the [vLLM](https://docs.vllm.ai/en/stable/) engine.
+ This version of lm-evaluation-harness includes versions of ARC-Challenge and GSM-8K that match the prompting style of [Meta-Llama-3.1-8B-Instruct-evals](https://huggingface.co/datasets/meta-llama/Meta-Llama-3.1-8B-Instruct-evals).

  ### Accuracy

@@ -156,21 +158,21 @@ lm_eval \
  <tr>
   <td>ARC Challenge (25-shot)
   </td>
-  <td>62.63
+  <td>83.19
   </td>
-  <td>62.20
+  <td>82.08
   </td>
-  <td>99.5%
+  <td>98.7%
   </td>
  </tr>
  <tr>
   <td>GSM-8K (5-shot, strict-match)
   </td>
-  <td>75.66
+  <td>82.79
   </td>
-  <td>76.57
+  <td>81.96
   </td>
-  <td>101.2%
+  <td>99.0%
   </td>
  </tr>
  <tr>
@@ -206,11 +208,11 @@ lm_eval \
  <tr>
   <td><strong>Average</strong>
   </td>
-  <td><strong>69.33</strong>
+  <td><strong>74.31</strong>
   </td>
-  <td><strong>69.27</strong>
+  <td><strong>73.79</strong>
   </td>
-  <td><strong>99.9%</strong>
+  <td><strong>99.3%</strong>
   </td>
  </tr>
  </table>
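The revised Evaluation text in this commit describes the fork-based setup but no longer carries an explicit reproduction command. A minimal sketch of an equivalent invocation is below: the `--model_args` are reused verbatim from the command this commit removes, while the install line and the task list are assumptions (the fork's exact task names for the Meta-style ARC-Challenge and GSM-8K variants are not stated in this commit):

```
# Assumption: install the Neural Magic fork referenced above (branch llama_3.1_instruct).
pip install git+https://github.com/neuralmagic/lm-evaluation-harness@llama_3.1_instruct

# --model_args reused verbatim from the command removed in this commit;
# the task list is a hypothetical stand-in, not the fork's confirmed task names.
lm_eval \
  --model vllm \
  --model_args pretrained="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a8",dtype=auto,gpu_memory_utilization=0.4,add_bos_token=True,max_model_len=4096,tensor_parallel_size=1 \
  --tasks mmlu,hellaswag,winogrande,truthfulqa \
  --batch_size auto
```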
 
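The Recovery column in the updated table reads as the ratio of the quantized score to the unquantized score; a quick check against two of the new rows (the ratio definition is an inference from the values, not stated in the commit):

```
# Recovery ~ quantized / unquantized, checked against two rows of the new table.
python3 -c 'print(f"{82.08/83.19:.1%}")'  # 98.7%, matching the ARC Challenge row
python3 -c 'print(f"{73.79/74.31:.1%}")'  # 99.3%, matching the Average row
```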