neuralmagic/SparseLlama-3-8B-pruned_50.2of4

mgoin committed on Jun 25, 2024

Commit 4b35527 (verified) · 1 parent: 0229442

Update README.md

Files changed (1)

README.md +18 -0

README.md CHANGED

@@ -15,6 +15,7 @@ This repo contains model files for a 2:4 (N:M) sparse [Meta-Llama-3-8B](meta-lla
 ## Running the model
+It can be run naively in transformers for testing purposes:
 ```python
 # pip install transformers accelerate
 from transformers import AutoTokenizer, AutoModelForCausalLM
@@ -29,6 +30,23 @@ outputs = model.generate(**input_ids)
 print(tokenizer.decode(outputs[0]))
 ```
+To take advantage of the 2:4 sparsity present, install [nm-vllm](https://github.com/neuralmagic/nm-vllm) for fast inference and low memory-usage:
+```bash
+pip install nm-vllm[sparse] --extra-index-url https://pypi.neuralmagic.com/simple
+```
+```python
+from vllm import LLM, SamplingParams
+model = LLM("nm-testing/SparseLlama-3-8B-pruned_50.2of4", sparsity="semi_structured_sparse_w16a16")
+prompt = "A poem about Machine Learning goes as follows:"
+sampling_params = SamplingParams(max_tokens=100, temperature=0)
+outputs = model.generate(prompt, sampling_params=sampling_params)
+print(outputs[0].outputs[0].text)
+```
 ## Evaluation Benchmark Results
 Model evaluation results obtained via [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) following the configuration of [Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard).
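The model is described as 2:4 (N:M) sparse: in every contiguous group of four weights, at most two are nonzero, which is the pattern the `semi_structured_sparse_w16a16` option in the nm-vllm example is meant to exploit. The plain-Python sketch below illustrates that pattern under a few stated assumptions: groups run along each row (the input dimension of a linear layer's weight), the helper names are hypothetical, pruning is done by simple magnitude (not necessarily how this checkpoint was produced), and `compressed_fraction` assumes a compressed layout that stores the two kept values of each group plus a 2-bit position index for each of them.

```python
from typing import List

Matrix = List[List[float]]


def is_2_of_4_sparse(weights: Matrix) -> bool:
    """Return True if every contiguous group of 4 values in each row
    has at most 2 nonzeros (the 2:4 semi-structured pattern)."""
    for row in weights:
        if len(row) % 4 != 0:
            return False  # rows must split evenly into groups of 4
        for start in range(0, len(row), 4):
            nonzeros = sum(1 for w in row[start:start + 4] if w != 0)
            if nonzeros > 2:
                return False
    return True


def prune_2_of_4_by_magnitude(weights: Matrix) -> Matrix:
    """Hypothetical helper: keep the 2 largest-magnitude values in each
    group of 4 and zero the other 2. Returns a new matrix; the input is
    left unchanged."""
    pruned: Matrix = []
    for row in weights:
        if len(row) % 4 != 0:
            raise ValueError("row length must be a multiple of 4")
        new_row = list(row)
        for start in range(0, len(row), 4):
            group = range(start, start + 4)
            # Indices of the two largest |w| in this group survive.
            keep = sorted(group, key=lambda i: abs(row[i]), reverse=True)[:2]
            for i in group:
                if i not in keep:
                    new_row[i] = 0.0
        pruned.append(new_row)
    return pruned


def compressed_fraction(value_bits: int) -> float:
    """Fraction of dense storage used per group of 4, assuming 2 kept
    values plus a 2-bit position index for each kept value."""
    dense_bits = 4 * value_bits
    sparse_bits = 2 * value_bits + 2 * 2
    return sparse_bits / dense_bits


if __name__ == "__main__":
    w = [[0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.1]]
    p = prune_2_of_4_by_magnitude(w)
    print(p)                        # [[0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]]
    print(is_2_of_4_sparse(p))      # True
    print(compressed_fraction(16))  # 0.5625 for 16-bit weights
```

Under that layout assumption, 16-bit weights would occupy about 56% of their dense size, which is one way to read the card's "low memory-usage" claim; the speed benefit additionally depends on kernels that skip the zeroed positions.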
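The evaluation line states only that the Open LLM Leaderboard configuration was followed. As a sketch, assuming the six-task setup the leaderboard used at the time (ARC-Challenge 25-shot, HellaSwag 10-shot, MMLU 5-shot, TruthfulQA 0-shot, Winogrande 5-shot, GSM8K 5-shot) and the `lm_eval` command-line interface of recent lm-evaluation-harness releases, the per-task invocations could be generated as below. The task identifiers, the batch size, and the helper `build_lm_eval_commands` are assumptions for illustration; the leaderboard pinned an older harness version with different task names, so reproduced scores may not match exactly.

```python
import shlex
from typing import Dict, List

# Assumed few-shot settings for the six-task Open LLM Leaderboard setup.
LEADERBOARD_FEWSHOT: Dict[str, int] = {
    "arc_challenge": 25,
    "hellaswag": 10,
    "mmlu": 5,
    "truthfulqa_mc2": 0,
    "winogrande": 5,
    "gsm8k": 5,
}


def build_lm_eval_commands(model_id: str, batch_size: int = 8) -> List[str]:
    """Hypothetical helper: one lm_eval invocation per task, since each
    task needs its own --num_fewshot value."""
    commands = []
    for task, shots in LEADERBOARD_FEWSHOT.items():
        argv = [
            "lm_eval",
            "--model", "hf",
            "--model_args", f"pretrained={model_id}",
            "--tasks", task,
            "--num_fewshot", str(shots),
            "--batch_size", str(batch_size),
        ]
        # shlex.join quotes any argument that would need it in a POSIX shell.
        commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    for cmd in build_lm_eval_commands("neuralmagic/SparseLlama-3-8B-pruned_50.2of4"):
        print(cmd)
```

Separate commands are used because `--num_fewshot` applies to every task listed in a single invocation, and the leaderboard setup uses a different shot count per task.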