ekurtic committed
Commit 50a00c8 · verified · 1 Parent(s): 463e12e

Update README.md

Files changed (1):
  1. README.md +12 -7

README.md CHANGED

tags:
  - sparse
---

# SparseLlama-3-8B-pruned_50.2of4

This repo contains model files for a 2:4 (N:M) sparse [Meta-Llama-3-8B](meta-llama/Meta-Llama-3-8B) model, pruned in one shot with [SparseGPT](https://arxiv.org/abs/2301.00774) and then retrained with [SquareHead](https://arxiv.org/abs/2310.06927) knowledge distillation while maintaining the 2:4 sparsity mask.
This is still a work in progress and subject to change. We expect to release new weights with even better accuracy soon.
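
Concretely, 2:4 sparsity means that in every contiguous group of four weights, at most two are non-zero. The toy sketch below is not code from this repo: it keeps the two largest-magnitude weights per group, whereas SparseGPT selects the surviving pair using second-order reconstruction information, and the SquareHead retraining then recovers accuracy under the frozen mask.

```python
import torch

def apply_2of4_mask(weight: torch.Tensor) -> torch.Tensor:
    """Zero the 2 smallest-magnitude weights in every group of 4 (toy criterion)."""
    rows, cols = weight.shape
    assert cols % 4 == 0, "row length must be divisible by the group size 4"
    groups = weight.reshape(rows, cols // 4, 4)
    keep = groups.abs().topk(k=2, dim=-1).indices  # 2 largest-magnitude per group
    mask = torch.zeros_like(groups, dtype=torch.bool)
    mask.scatter_(-1, keep, True)
    return (groups * mask).reshape(rows, cols)

w = torch.randn(4, 8)
w_sparse = apply_2of4_mask(w)
# every group of 4 now has at most 2 non-zero entries
assert ((w_sparse.reshape(4, -1, 4) != 0).sum(dim=-1) <= 2).all()
```

On supported GPUs (Ampere and newer), this pattern can be executed on sparse tensor cores, for example via PyTorch's prototype `to_sparse_semi_structured` utility.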

## Running the model

```python
# pip install transformers accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("nm-testing/SparseLlama-3-8B-pruned_50.2of4")
model = AutoModelForCausalLM.from_pretrained("nm-testing/SparseLlama-3-8B-pruned_50.2of4", device_map="auto")

input_text = "A poem about Machine Learning goes as follows:"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=100)  # generation length is illustrative
print(tokenizer.decode(outputs[0]))
```
 
Model evaluation results obtained via [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) following the configuration of [Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard).

| Benchmark | Meta-Llama-3-8B | SparseLlama-3-8B-pruned_50.2of4<br>(this model) |
|:----------------------------------------------:|:-----------:|:-----------------------------:|
| [ARC-c](https://arxiv.org/abs/1911.01547)<br>25-shot | 59.47% | 57.76% |
| [MMLU](https://arxiv.org/abs/2009.03300)<br>5-shot | 65.29% | 60.44% |
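
For reference, results in this style can be collected with the harness's Python API. The snippet below is a sketch, not the exact command behind the table: the task name, few-shot count, and batch size are assumptions mirroring the leaderboard settings shown above.

```python
# Sketch: scoring the ARC-c row with lm-evaluation-harness (v0.4+ API).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=nm-testing/SparseLlama-3-8B-pruned_50.2of4,dtype=auto",
    tasks=["arc_challenge"],  # ARC-c
    num_fewshot=25,           # matches the 25-shot setting above
    batch_size=8,
)
print(results["results"]["arc_challenge"])
```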
 
Model evaluation results obtained via [Mosaic Eval Gauntlet](https://github.com/mosaicml/llm-foundry/blob/main/scripts/eval/local_data/EVAL_GAUNTLET.md) following the configuration of [Eval Gauntlet v0.3](https://github.com/mosaicml/llm-foundry/blob/main/scripts/eval/yamls/eval_gauntlet_v0.3.yaml).

| Benchmark | Meta-Llama-3-8B | SparseLlama-3-8B-pruned_50.2of4<br>(this model) |
|:------------------------:|:----------------:|:----------------------------------------------:|
| World Knowledge | 58.08% | 54.61% |
| Commonsense Reasoning | 47.66% | 47.62% |
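
To read these category numbers: the Eval Gauntlet averages many tasks per category after rescaling each task's accuracy by its random-guessing baseline (see the EVAL_GAUNTLET.md description linked above). A minimal sketch with placeholder numbers, not results for this model:

```python
# Sketch of Eval Gauntlet-style aggregation: rescale each task's accuracy
# against its random baseline, then average the rescaled scores per category.
def rescale(accuracy: float, random_baseline: float) -> float:
    return (accuracy - random_baseline) / (1.0 - random_baseline)

category_tasks = [
    (0.60, 0.25),  # (accuracy, baseline) for a 4-way multiple-choice task
    (0.55, 0.50),  # a binary task
]
category_score = sum(rescale(a, b) for a, b in category_tasks) / len(category_tasks)
print(f"category score: {category_score:.2%}")
```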
 
## Help

For further support, and for discussions on these models and AI in general, join [Neural Magic's Slack Community](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ).

## Acknowledgment

This model is built with Meta Llama 3. For more details on its license, please check the model card of [Meta-Llama-3-8B](meta-llama/Meta-Llama-3-8B).