tags:
- sparse
---

# SparseLlama-3-8B-pruned_50.2of4

This repo contains model files for a 2:4 (N:M) sparse [Meta-Llama-3-8B](meta-llama/Meta-Llama-3-8B) model, pruned in one shot with [SparseGPT](https://arxiv.org/abs/2301.00774) and then retrained with [SquareHead](https://arxiv.org/abs/2310.06927) knowledge distillation while maintaining the 2:4 sparsity mask.

This is still a work in progress and subject to change. We expect to release new weights with even better accuracy soon.
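
Concretely, the 2:4 pattern constrains every contiguous group of four weights to hold at most two non-zero values, which is the layout that sparse-capable hardware (e.g. NVIDIA Ampere and newer) can accelerate. The snippet below is a minimal illustrative sketch, not part of the official model card; the helper `is_2of4_sparse` is a hypothetical name introduced here for illustration:

```python
# Minimal sketch (illustrative, not from the model card): check that a
# weight matrix follows the 2:4 pattern, i.e. every contiguous group of
# four values in a row contains at most two non-zeros.
import torch

def is_2of4_sparse(weight: torch.Tensor) -> bool:
    # Assumes the number of elements is a multiple of four.
    groups = weight.reshape(-1, 4)        # split values into groups of 4
    nonzeros = (groups != 0).sum(dim=1)   # non-zero count per group
    return bool((nonzeros <= 2).all())

# Toy example: each group of four holds exactly two non-zero values.
w = torch.tensor([[0.5, 0.0, -0.3, 0.0,
                   0.0, 1.2, 0.0, 0.7]])
print(is_2of4_sparse(w))  # True
```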

## Running the model

```python
# pip install transformers accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("nm-testing/SparseLlama-3-8B-pruned_50.2of4")
model = AutoModelForCausalLM.from_pretrained("nm-testing/SparseLlama-3-8B-pruned_50.2of4", device_map="auto")

input_text = "A poem about Machine Learning goes as follows:"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
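
As a sanity check that the downloaded weights really are about 50% sparse, you can count zero entries in the Linear layers directly. This is an illustrative sketch rather than part of the official usage instructions; it loads the full model, so it needs roughly the same memory as the example above:

```python
# Illustrative sketch (not from the model card): measure the fraction of
# zero-valued weights across the model's Linear layers.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "nm-testing/SparseLlama-3-8B-pruned_50.2of4", torch_dtype=torch.bfloat16
)

zeros, total = 0, 0
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        zeros += (module.weight == 0).sum().item()
        total += module.weight.numel()

# Expect a value close to 50%; some layers (e.g. the output head) may be dense.
print(f"Linear-layer sparsity: {zeros / total:.1%}")
```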

Model evaluation results were obtained via [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) following the configuration of the [Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard); an illustrative reproduction sketch follows the table.

| Benchmark | Meta-Llama-3-8B | SparseLlama-3-8B-pruned_50.2of4<br>(this model) |
|:----------------------------------------------:|:---------------:|:-----------------------------------------------:|
| [ARC-c](https://arxiv.org/abs/1911.01547)<br> 25-shot | 59.47% | 57.76% |
| [MMLU](https://arxiv.org/abs/2009.03300)<br> 5-shot | 65.29% | 60.44% |
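
Numbers like those above can be reproduced with the harness's Python API. This is a minimal sketch assuming a recent (v0.4+) lm-evaluation-harness release; task names, few-shot counts, and harness version all affect the exact scores:

```python
# Minimal sketch (assumes lm-eval v0.4+): score ARC-c with the
# 25-shot setup used by the Open LLM Leaderboard.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=nm-testing/SparseLlama-3-8B-pruned_50.2of4,dtype=bfloat16",
    tasks=["arc_challenge"],
    num_fewshot=25,
)
print(results["results"]["arc_challenge"])
```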

Model evaluation results were also obtained via the [Mosaic Eval Gauntlet](https://github.com/mosaicml/llm-foundry/blob/main/scripts/eval/local_data/EVAL_GAUNTLET.md) following the configuration of [Eval Gauntlet v0.3](https://github.com/mosaicml/llm-foundry/blob/main/scripts/eval/yamls/eval_gauntlet_v0.3.yaml).

| Benchmark | Meta-Llama-3-8B | SparseLlama-3-8B-pruned_50.2of4<br>(this model) |
|:------------------------:|:----------------:|:----------------------------------------------:|
| World Knowledge | 58.08% | 54.61% |
| Commonsense Reasoning | 47.66% | 47.62% |

## Help

For further support and discussions on these models and AI in general, join [Neural Magic's Slack Community](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ).

## Acknowledgment

This model is built with Meta Llama 3. For more details on its license, please check the model card of [Meta-Llama-3-8B](meta-llama/Meta-Llama-3-8B).