Lin-K76 committed on
Commit ce3efea (verified) · 1 parent: 8132006

Update README.md

Files changed (1):
1. README.md +19 -19

README.md CHANGED
@@ -33,7 +33,7 @@ base_model: meta-llama/Meta-Llama-3.1-405B-Instruct
  - **Model Developers:** Neural Magic

  Quantized version of [Meta-Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct) with the updated 8 kv-heads.
- It achieves an average score of 86.60 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark (version 1), whereas the unquantized model achieves 86.79.

  ### Model Optimizations

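The README states that this quantization is based on the checkpoint with 8 KV heads. Under grouped-query attention, several query heads share one key/value head, so the KV-head count directly sets how much key/value data is cached per token. The sketch below is illustrative only: it assumes the Llama 3.1 405B shape published by Meta (126 layers, 128 query heads, head dimension 128) and a 16-bit cache, none of which comes from this README, and the helper names are hypothetical.

```python
# KV-cache size per token under grouped-query attention (GQA).
# Assumed Llama 3.1 405B shape (from Meta's published architecture,
# not from this README): 126 layers, 128 query heads, head dim 128.
NUM_LAYERS = 126
NUM_QUERY_HEADS = 128
HEAD_DIM = 128


def queries_per_kv_head(num_kv_heads: int) -> int:
    """Number of query heads that share each KV head."""
    if NUM_QUERY_HEADS % num_kv_heads:
        raise ValueError("query heads must split evenly into KV groups")
    return NUM_QUERY_HEADS // num_kv_heads


def kv_bytes_per_token(num_kv_heads: int, bytes_per_elem: int = 2) -> int:
    """Bytes of keys and values cached per token across all layers.

    bytes_per_elem=2 assumes a 16-bit cache.
    """
    # Factor 2: one key vector and one value vector per KV head per layer.
    return 2 * NUM_LAYERS * num_kv_heads * HEAD_DIM * bytes_per_elem


if __name__ == "__main__":
    # Compare 8 KV heads with full multi-head attention (one KV head per query head).
    for kv_heads in (8, NUM_QUERY_HEADS):
        print(
            f"{kv_heads} KV heads: {queries_per_kv_head(kv_heads)} query heads "
            f"per KV head, {kv_bytes_per_token(kv_heads):,} bytes per token"
        )
    # 8 KV heads -> 516,096 bytes per token
```

With 8 KV heads, each KV head serves 16 query heads, and the cache is 16 times smaller than it would be with one KV head per query head.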
@@ -118,11 +118,11 @@ model_stub = "meta-llama/Meta-Llama-3.1-405B-Instruct"
  model_name = model_stub.split("/")[-1]

  device_map = calculate_offload_device_map(
- model_stub, reserve_for_hessians=False, num_gpus=8, torch_dtype=torch.float16
  )

  model = SparseAutoModelForCausalLM.from_pretrained(
- model_stub, torch_dtype=torch.float16, device_map=device_map
  )
  tokenizer = AutoTokenizer.from_pretrained(model_stub)

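In both versions of this snippet, `calculate_offload_device_map` is asked for a device map across `num_gpus=8`, and its name indicates that weights which do not fit on the GPUs are offloaded. The rough arithmetic below sketches why offloading comes into play for a 405B model. It assumes a nominal 405e9 parameters, 2 bytes per parameter for 16-bit weights, 1 GB = 1e9 bytes, and a hypothetical 80 GB per GPU; it ignores activations, KV cache and Hessian buffers, and it does not reproduce what `calculate_offload_device_map` actually computes.

```python
# Rough weight-memory estimate for a 405B-parameter model in 16-bit precision.
# Assumptions (not taken from the README): a nominal 405e9 parameters,
# 1 GB = 1e9 bytes, and a hypothetical 80 GB of memory per GPU.
# Activations, KV cache and Hessian buffers are ignored.
BYTES_PER_PARAM = {"float16": 2, "bfloat16": 2}


def weight_gb(num_params: float, dtype: str) -> float:
    """Total size of the weights in GB for the given storage dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def per_gpu_gb(num_params: float, dtype: str, num_gpus: int) -> float:
    """Weight size per GPU if the weights were split evenly across GPUs."""
    return weight_gb(num_params, dtype) / num_gpus


if __name__ == "__main__":
    params = 405e9
    num_gpus = 8
    gpu_capacity_gb = 80.0  # hypothetical per-GPU memory

    total = weight_gb(params, "bfloat16")
    share = per_gpu_gb(params, "bfloat16", num_gpus)
    print(f"total: {total:.0f} GB, per GPU: {share:.2f} GB")
    # Prints: total: 810 GB, per GPU: 101.25 GB
    print("fits without offloading:", share <= gpu_capacity_gb)
    # Prints: fits without offloading: False
```

Under these assumptions each GPU would need about 101 GB for the weights alone; the real split depends on the actual GPU memory and on what the helper reserves.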
@@ -193,9 +193,9 @@ This version of the lm-evaluation-harness includes versions of ARC-Challenge, GS
  </td>
  <td>87.41
  </td>
- <td>87.05
  </td>
- <td>99.59%
  </td>
  </tr>
  <tr>
@@ -203,9 +203,9 @@ This version of the lm-evaluation-harness includes versions of ARC-Challenge, GS
  </td>
  <td>88.11
  </td>
- <td>87.87
  </td>
- <td>99.73%
  </td>
  </tr>
  <tr>
@@ -213,9 +213,9 @@ This version of the lm-evaluation-harness includes versions of ARC-Challenge, GS
  </td>
  <td>94.97
  </td>
- <td>94.97
  </td>
- <td>100.0%
  </td>
  </tr>
  <tr>
@@ -223,9 +223,9 @@ This version of the lm-evaluation-harness includes versions of ARC-Challenge, GS
  </td>
  <td>95.98
  </td>
- <td>95.83
  </td>
- <td>99.84%
  </td>
  </tr>
  <tr>
@@ -233,9 +233,9 @@ This version of the lm-evaluation-harness includes versions of ARC-Challenge, GS
  </td>
  <td>88.54
  </td>
- <td>88.11
  </td>
- <td>99.51%
  </td>
  </tr>
  <tr>
@@ -243,9 +243,9 @@ This version of the lm-evaluation-harness includes versions of ARC-Challenge, GS
  </td>
  <td>87.21
  </td>
- <td>87.77
  </td>
- <td>100.6%
  </td>
  </tr>
  <tr>
@@ -253,9 +253,9 @@ This version of the lm-evaluation-harness includes versions of ARC-Challenge, GS
  </td>
  <td>65.31
  </td>
- <td>64.58
  </td>
- <td>98.88%
  </td>
  </tr>
  <tr>
@@ -263,9 +263,9 @@ This version of the lm-evaluation-harness includes versions of ARC-Challenge, GS
  </td>
  <td><strong>86.79</strong>
  </td>
- <td><strong>86.60</strong>
  </td>
- <td><strong>99.74%</strong>
  </td>
  </tr>
  </table>
 
  - **Model Developers:** Neural Magic

  Quantized version of [Meta-Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct) with the updated 8 kv-heads.
+ It achieves an average score of 86.78 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark (version 1), whereas the unquantized model achieves 86.79.

  ### Model Optimizations

 
  model_name = model_stub.split("/")[-1]

  device_map = calculate_offload_device_map(
+ model_stub, reserve_for_hessians=False, num_gpus=8, torch_dtype="auto"
  )

  model = SparseAutoModelForCausalLM.from_pretrained(
+ model_stub, torch_dtype="auto", device_map=device_map
  )
  tokenizer = AutoTokenizer.from_pretrained(model_stub)

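The updated snippet replaces `torch_dtype=torch.float16` with `torch_dtype="auto"`. The commit does not state its motivation, but as background: Hugging Face Transformers documents that `from_pretrained(..., torch_dtype="auto")` loads weights in the dtype recorded in the checkpoint's `config.json` (its `torch_dtype` field), falling back to the dtype of the first floating-point weight, and the upstream Llama 3.1 configs record `bfloat16`. Assuming the llm-compressor helpers used here interpret the argument the same way, the model is no longer forced to float16. The stdlib-only sketch below shows a hypothetical `resolve_dtype` helper that mimics the config lookup (it is not the Transformers implementation), and why the cast can matter: float16 tops out at 65504, while bfloat16 keeps float32's 8-bit exponent.

```python
import struct


def resolve_dtype(requested: str, config: dict) -> str:
    """Hypothetical helper mimicking what "auto" means for from_pretrained.

    "auto" defers to the dtype stored in the checkpoint config; any other
    value is an explicit override. (Transformers additionally falls back to
    the first floating-point weight's dtype; that step is omitted here.)
    """
    if requested == "auto":
        return config["torch_dtype"]
    return requested


def fits_in_float16(x: float) -> bool:
    """True if x packs into IEEE 754 half precision without overflowing."""
    try:
        struct.pack("<e", x)
    except OverflowError:
        return False
    return True


def to_bfloat16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even).

    bfloat16 is the upper 16 bits of a float32, so the float32 bit pattern
    is rounded to its top 16 bits. Assumes x is finite and in float32 range.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bias = 0x7FFF + ((bits >> 16) & 1)  # round half to even on the dropped bits
    bits = ((bits + bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


if __name__ == "__main__":
    llama_config = {"torch_dtype": "bfloat16"}  # illustrative config excerpt
    print(resolve_dtype("auto", llama_config))     # bfloat16
    print(resolve_dtype("float16", llama_config))  # float16

    # A value above the float16 range stays finite in bfloat16 (coarsely rounded).
    print(fits_in_float16(65504.0))  # True (largest finite float16)
    print(fits_in_float16(70000.0))  # False
    print(to_bfloat16(70000.0))      # 70144.0
```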
 
  </td>
  <td>87.41
  </td>
+ <td>87.41
  </td>
+ <td>100.0%
  </td>
  </tr>
  <tr>
 
  </td>
  <td>88.11
  </td>
+ <td>88.02
  </td>
+ <td>99.90%
  </td>
  </tr>
  <tr>
 
  </td>
  <td>94.97
  </td>
+ <td>94.88
  </td>
+ <td>99.91%
  </td>
  </tr>
  <tr>
 
  </td>
  <td>95.98
  </td>
+ <td>96.29
  </td>
+ <td>100.3%
  </td>
  </tr>
  <tr>
 
  </td>
  <td>88.54
  </td>
+ <td>88.54
  </td>
+ <td>100.0%
  </td>
  </tr>
  <tr>
 
  </td>
  <td>87.21
  </td>
+ <td>86.98
  </td>
+ <td>99.74%
  </td>
  </tr>
  <tr>
 
  </td>
  <td>65.31
  </td>
+ <td>65.33
  </td>
+ <td>100.0%
  </td>
  </tr>
  <tr>
 
  </td>
  <td><strong>86.79</strong>
  </td>
+ <td><strong>86.78</strong>
  </td>
+ <td><strong>99.99%</strong>
  </td>
  </tr>
  </table>
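The updated rows are consistent with two simple formulas: the average row is the unweighted mean of the seven per-task rows above it, and the recovery column is the quantized score divided by the unquantized score, shown to four significant digits. The sketch below recomputes both from the updated values in this commit. The task names are not visible in the diff, so the scores are kept in table order, and the helper names `average` and `recovery` are hypothetical.

```python
from statistics import fmean
from typing import Sequence

# Scores in table order, copied from the updated rows of this commit.
# Task names are not visible in the diff, so only the order is kept.
UNQUANTIZED = [87.41, 88.11, 94.97, 95.98, 88.54, 87.21, 65.31]
QUANTIZED = [87.41, 88.02, 94.88, 96.29, 88.54, 86.98, 65.33]


def average(scores: Sequence[float]) -> str:
    """Unweighted mean, formatted to two decimals like the table."""
    return f"{fmean(scores):.2f}"


def recovery(baseline: float, quantized: float) -> str:
    """Quantized score as a percentage of the baseline, 4 significant digits.

    The '#' flag keeps trailing zeros, so 100 is rendered as '100.0'.
    """
    return format(100 * quantized / baseline, "#.4g")


if __name__ == "__main__":
    for base, quant in zip(UNQUANTIZED, QUANTIZED):
        print(f"{base:6.2f} {quant:6.2f} {recovery(base, quant)}%")
    print(
        "average:", average(UNQUANTIZED), average(QUANTIZED),
        recovery(fmean(UNQUANTIZED), fmean(QUANTIZED)) + "%",
    )
```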