Update README.md
README.md
````diff
@@ -15,6 +15,7 @@ This repo contains model files for a 2:4 (N:M) sparse [Meta-Llama-3-8B](meta-lla
 
 ## Running the model
 
+It can be run naively in transformers for testing purposes:
 ```python
 # pip install transformers accelerate
 from transformers import AutoTokenizer, AutoModelForCausalLM
@@ -29,6 +30,23 @@ outputs = model.generate(**input_ids)
 print(tokenizer.decode(outputs[0]))
 ```
 
+To take advantage of the 2:4 sparsity present, install [nm-vllm](https://github.com/neuralmagic/nm-vllm) for fast inference and low memory-usage:
+```bash
+pip install nm-vllm[sparse] --extra-index-url https://pypi.neuralmagic.com/simple
+```
+
+```python
+from vllm import LLM, SamplingParams
+
+model = LLM("nm-testing/SparseLlama-3-8B-pruned_50.2of4", sparsity="semi_structured_sparse_w16a16")
+
+prompt = "A poem about Machine Learning goes as follows:"
+sampling_params = SamplingParams(max_tokens=100, temperature=0)
+
+outputs = model.generate(prompt, sampling_params=sampling_params)
+print(outputs[0].outputs[0].text)
+```
+
 ## Evaluation Benchmark Results
 
 Model evaluation results obtained via [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) following the configuration of [Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard).
````
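In a 2:4 (N:M) sparsity pattern, every contiguous group of 4 weights along a row holds at most 2 non-zero values. That fixed structure is what sparse kernels can exploit, while running the model naively in transformers treats the same weights as an ordinary dense tensor. The sketch below checks the pattern and produces it with simple per-group magnitude pruning. The function names are hypothetical, and magnitude pruning is used only as an illustration; the README does not say how this checkpoint was pruned.

```python
from typing import List


def is_2_of_4_sparse(weights: List[float]) -> bool:
    """Return True if every group of 4 consecutive values has at most 2 non-zeros."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    return all(
        sum(1 for w in weights[i:i + 4] if w != 0.0) <= 2
        for i in range(0, len(weights), 4)
    )


def prune_2_of_4(weights: List[float]) -> List[float]:
    """Illustrative magnitude pruning: keep the 2 largest-|w| values per group of 4."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned: List[float] = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        pruned.extend(group[j] if j in keep else 0.0 for j in range(4))
    return pruned


row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
pruned = prune_2_of_4(row)
print(pruned)  # [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
print(is_2_of_4_sparse(row), is_2_of_4_sparse(pruned))  # False True
```

Exactly half of each group survives, which is where the "50" in a 50% 2:4 pruned checkpoint comes from.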
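The evaluation section says results follow the Open LLM Leaderboard configuration. That (v1) leaderboard used fixed few-shot counts per task: ARC-Challenge 25, HellaSwag 10, MMLU 5, TruthfulQA 0, Winogrande 5, GSM8K 5. The sketch below turns those settings into `lm_eval` command lines. It assumes lm-evaluation-harness 0.4.x task names and CLI flags, and `leaderboard_commands` is a hypothetical helper; the harness version behind the published numbers may differ.

```python
import shlex
from typing import Dict, List

# Few-shot counts of the v1 Open LLM Leaderboard. Task names are an assumption:
# they match lm-evaluation-harness 0.4.x, not necessarily the version used here.
LEADERBOARD_FEWSHOT: Dict[str, int] = {
    "arc_challenge": 25,
    "hellaswag": 10,
    "mmlu": 5,
    "truthfulqa_mc2": 0,
    "winogrande": 5,
    "gsm8k": 5,
}


def leaderboard_commands(model_id: str, batch_size: int = 8) -> List[List[str]]:
    """Build one `lm_eval` CLI invocation per task (hypothetical helper)."""
    commands = []
    for task, shots in LEADERBOARD_FEWSHOT.items():
        commands.append([
            "lm_eval",
            "--model", "hf",
            "--model_args", f"pretrained={model_id}",
            "--tasks", task,
            "--num_fewshot", str(shots),
            "--batch_size", str(batch_size),
        ])
    return commands


if __name__ == "__main__":
    for cmd in leaderboard_commands("nm-testing/SparseLlama-3-8B-pruned_50.2of4"):
        print(shlex.join(cmd))
```

Running one task per command keeps each task's few-shot count explicit, since `--num_fewshot` applies to every task passed in a single invocation.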