---
base_model:
- meta-llama/Meta-Llama-3-8B
library_name: transformers
---

# MaskLLM: Learnable Semi-structured Sparsity for Large Language Models

<div align="center">
<figure>
 <img src="https://github.com/NVlabs/MaskLLM/blob/main/assets/teaser.png?raw=true" style="width:70%; display:block; margin-left:auto; margin-right:auto;">
</figure>
</div>

This work introduces [MaskLLM](https://github.com/NVlabs/MaskLLM), a **learnable** pruning method that establishes **semi-structured (or "N:M") sparsity** in LLMs, aimed at reducing computational overhead during inference. The proposed method is scalable and stands to benefit from larger training datasets.

## Requirements
We provide pre-computed masks for Hugging Face models such as LLaMA-2 7B and LLaMA-3 8B with minimal requirements: no Docker, Megatron, or data preprocessing is needed.
```bash
pip install transformers accelerate datasets SentencePiece 
```

## Pre-computed Masks

The following masks were trained and provided by [@VainF](https://github.com/VainF). We use ``huggingface_hub`` to automatically download these masks and apply them to the official LLMs for evaluation; a minimal download-and-inspect sketch follows the table below. The mask files were compressed using [numpy.savez_compressed](tool_compress_mask.py). More results for baselines (SparseGPT, Wanda) can be found in the appendix.
| Model | Pattern | Training Data | Training/Eval SeqLen | PPL (Dense) | PPL (SparseGPT) | **PPL (MaskLLM)** | Link |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| LLaMA-2 7B | 2:4 | C4 (2B Tokens)| 4096 | 5.12 | 10.42 | **6.78** | [HuggingFace](https://huggingface.co/Vinnnf/LLaMA-2-7B-MaskLLM-C4) |
| LLaMA-3 8B | 2:4 | C4 (2B Tokens) | 4096 | 5.75 | 17.64 | **8.49** | [HuggingFace](https://huggingface.co/Vinnnf/LLaMA-3-8B-MaskLLM-C4) |
| LLaMA-3.1 8B | 2:4 | C4 (2B Tokens) | 4096 | 5.89 | 18.65 | **8.58** | [HuggingFace](https://huggingface.co/Vinnnf/LLaMA-3.1-8B-MaskLLM-C4) |
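
As a quick illustration of the download path mentioned above, the sketch below fetches a mask archive from the Hub with `huggingface_hub` and inspects it with `numpy`. The archive filename `mask_compressed.npz` and the layout of its contents are assumptions for illustration; list the repository files first to see what is actually provided.

```python
# Minimal sketch: download a pre-computed MaskLLM mask archive and inspect it.
# Assumption: the archive is named "mask_compressed.npz"; check the repo file
# listing if unsure which file holds the masks.
import numpy as np
from huggingface_hub import hf_hub_download, list_repo_files

repo_id = "Vinnnf/LLaMA-3-8B-MaskLLM-C4"
print(list_repo_files(repo_id))  # see which files the repo actually contains

mask_path = hf_hub_download(repo_id=repo_id, filename="mask_compressed.npz")  # assumed name
masks = np.load(mask_path)

# Each entry is expected to be a binary (0/1) mask with the same shape as the
# corresponding weight matrix, enforcing 2:4 sparsity along groups of 4 weights.
for name in masks.files[:5]:
    m = masks[name]
    print(name, m.shape, f"sparsity={1.0 - float(m.mean()):.2f}")
```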

## How to use it

Please see [NVlabs/MaskLLM](https://github.com/NVlabs/MaskLLM?tab=readme-ov-file#1-pre-trained-masks-for-hugging-face-models-).
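
For orientation only, here is a rough sketch of how such masks could be applied to a dense Hugging Face checkpoint: load the model with `transformers`, then zero out the weights selected by each mask. The mapping from mask entry names to parameter names is an assumption here; the repository linked above documents the exact procedure.

```python
# Rough sketch (not the official procedure): apply pre-computed 2:4 masks to a
# dense checkpoint by zeroing out the pruned weights in place.
import numpy as np
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", torch_dtype=torch.bfloat16
)
masks = np.load("mask_compressed.npz")  # assumed archive name, see the sketch above

with torch.no_grad():
    for name, param in model.named_parameters():
        if name in masks.files:  # assumes mask keys match parameter names
            m = torch.from_numpy(np.asarray(masks[name], dtype=np.uint8))
            param.mul_(m.to(device=param.device, dtype=param.dtype))

model.save_pretrained("llama-3-8b-maskllm-2of4")  # masked (2:4 sparse) checkpoint
```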