tags:
- sparse
---

# SparseLlama-3-8B-pruned_50.2of4

This repo contains model files for a 2:4 (N:M) sparse [Meta-Llama-3-8B](meta-llama/Meta-Llama-3-8B) model, pruned in one shot with [SparseGPT](https://arxiv.org/abs/2301.00774) and then retrained with [SquareHead](https://arxiv.org/abs/2310.06927) knowledge distillation while maintaining the 2:4 sparsity mask.

This is still a work in progress and subject to change. We expect to release new weights with even better accuracy soon.
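
Concretely, the 2:4 pattern constrains every contiguous group of four weights to hold at most two non-zero values, which is the layout that sparse-capable hardware (e.g. NVIDIA Ampere and newer) can accelerate. The snippet below is a minimal illustrative sketch, not part of the official model card; the helper `is_2of4_sparse` is a hypothetical name introduced here for illustration:

```python
# Minimal sketch (illustrative, not from the model card): check that a
# weight matrix follows the 2:4 pattern, i.e. every contiguous group of
# four values in a row contains at most two non-zeros.
import torch

def is_2of4_sparse(weight: torch.Tensor) -> bool:
    # Assumes the number of elements is a multiple of four.
    groups = weight.reshape(-1, 4)        # split values into groups of 4
    nonzeros = (groups != 0).sum(dim=1)   # non-zero count per group
    return bool((nonzeros <= 2).all())

# Toy example: each group of four holds exactly two non-zero values.
w = torch.tensor([[0.5, 0.0, -0.3, 0.0,
                   0.0, 1.2, 0.0, 0.7]])
print(is_2of4_sparse(w))  # True
```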

## Running the model

```python
# pip install transformers accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("nm-testing/SparseLlama-3-8B-pruned_50.2of4")
model = AutoModelForCausalLM.from_pretrained("nm-testing/SparseLlama-3-8B-pruned_50.2of4", device_map="auto")

input_text = "A poem about Machine Learning goes as follows:"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
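
As a sanity check that the downloaded weights really are about 50% sparse, you can count zero entries in the Linear layers directly. This is an illustrative sketch rather than part of the official usage instructions; it loads the full model, so it needs roughly the same memory as the example above:

```python
# Illustrative sketch (not from the model card): measure the fraction of
# zero-valued weights across the model's Linear layers.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "nm-testing/SparseLlama-3-8B-pruned_50.2of4", torch_dtype=torch.bfloat16
)

zeros, total = 0, 0
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        zeros += (module.weight == 0).sum().item()
        total += module.weight.numel()

# Expect a value close to 50%; some layers (e.g. the output head) may be dense.
print(f"Linear-layer sparsity: {zeros / total:.1%}")
```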

Model evaluation results were obtained via [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) following the configuration of the [Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard); an illustrative reproduction sketch follows the table.

| Benchmark | Meta-Llama-3-8B | SparseLlama-3-8B-pruned_50.2of4<br>(this model) |
|:----------------------------------------------:|:---------------:|:-----------------------------------------------:|
| [ARC-c](https://arxiv.org/abs/1911.01547)<br> 25-shot | 59.47% | 57.76% |
| [MMLU](https://arxiv.org/abs/2009.03300)<br> 5-shot | 65.29% | 60.44% |
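
Numbers like those above can be reproduced with the harness's Python API. This is a minimal sketch assuming a recent (v0.4+) lm-evaluation-harness release; task names, few-shot counts, and harness version all affect the exact scores:

```python
# Minimal sketch (assumes lm-eval v0.4+): score ARC-c with the
# 25-shot setup used by the Open LLM Leaderboard.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=nm-testing/SparseLlama-3-8B-pruned_50.2of4,dtype=bfloat16",
    tasks=["arc_challenge"],
    num_fewshot=25,
)
print(results["results"]["arc_challenge"])
```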

Model evaluation results were also obtained via the [Mosaic Eval Gauntlet](https://github.com/mosaicml/llm-foundry/blob/main/scripts/eval/local_data/EVAL_GAUNTLET.md) following the configuration of [Eval Gauntlet v0.3](https://github.com/mosaicml/llm-foundry/blob/main/scripts/eval/yamls/eval_gauntlet_v0.3.yaml).

| Benchmark | Meta-Llama-3-8B | SparseLlama-3-8B-pruned_50.2of4<br>(this model) |
|:------------------------:|:----------------:|:----------------------------------------------:|
| World Knowledge | 58.08% | 54.61% |
| Commonsense Reasoning | 47.66% | 47.62% |

## Help

For further support and discussions on these models and AI in general, join [Neural Magic's Slack Community](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ).

## Acknowledgment

This model is built with Meta Llama 3. For more details on its license, please check the model card of [Meta-Llama-3-8B](meta-llama/Meta-Llama-3-8B).