readme: add information about tokenizer

Low bit quantizations of Meta's Llama 3.1 405B Instruct model.
Quantized with llama.cpp [b3449](https://github.com/ggerganov/llama.cpp/releases/tag/b3449)
| Quant   | Notes                                       |
|---------|---------------------------------------------|
| Q2_K    | Suitable for general inference tasks        |
| IQ2_XXS | Best for ultra-low memory footprint         |
| IQ2_S   | Optimized for small VRAM environments       |
| Q3_K_M  | Good balance between speed and accuracy     |
| Q3_K_S  | Faster inference with minor quality loss    |
| Q3_K_L  | Higher quality with a larger VRAM requirement |
| Q4_K_M  | Superior balance, suitable for production (though dequantized from Q4_0, so don't expect higher quality) |
| Q4_0    | Basic quantization, good for experimentation |
| Q4_K_S  | Fast inference, efficient for scaling       |
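To gauge which of these quants fits your hardware, file size scales roughly with bits per weight. A minimal sketch of that arithmetic, where the bits-per-weight values are rough assumptions for illustration, not figures taken from this repo:

```python
# Approximate bits-per-weight for a few quant types (assumed values,
# illustrative only -- check llama.cpp's own tables for exact figures).
APPROX_BPW = {
    "Q2_K": 2.625,
    "Q3_K_M": 3.91,
    "Q4_0": 4.5,
    "Q4_K_M": 4.85,
}

def quant_size_gb(n_params_billion, bpw):
    """Rough on-disk/VRAM size in GB: billions of params * bits / 8 bits-per-byte."""
    return n_params_billion * bpw / 8

for name, bpw in APPROX_BPW.items():
    print(f"{name}: ~{quant_size_gb(405, bpw):.0f} GB")
```

Even at ~2.6 bits per weight, a 405B-parameter model needs on the order of 130+ GB, which is why only the low-bit quants are practical on most machines.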
For higher quality quantizations (q4+), please refer to [nisten/meta-405b-instruct-cpu-optimized-gguf](https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf).
Regarding the `smaug-bpe` tokenizer: it makes no difference here, as the `smaug-bpe` and `llama-bpe` pre-tokenizers are identical. However, if you have concerns, you can use the following command to set the `llama-bpe` tokenizer:
```
./gguf-py/scripts/gguf_new_metadata.py --pre-tokenizer "llama-bpe" Llama-3.1-405B-Instruct-old.gguf LLama-3.1-405B-Instruct-fixed.gguf
```

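If you want to sanity-check a GGUF file before or after editing its metadata, the container starts with a fixed little-endian header (magic `GGUF`, uint32 version, uint64 tensor count, uint64 metadata KV count). A minimal sketch that parses it, with the helper name and the synthetic demo file being my own additions rather than anything from this repo:

```python
import os
import struct
import tempfile

def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF file."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError("not a GGUF file")
        # Header after the magic: uint32 version, uint64 tensor count,
        # uint64 metadata key/value count, all little-endian.
        return struct.unpack("<IQQ", f.read(20))

# Demo against a synthetic header; a real file would be one of the
# quantized GGUFs from this repo.
with tempfile.NamedTemporaryFile(delete=False, suffix=".gguf") as tmp:
    tmp.write(b"GGUF" + struct.pack("<IQQ", 3, 0, 0))
    demo_path = tmp.name
print(read_gguf_header(demo_path))  # -> (3, 0, 0)
os.unlink(demo_path)
```

For full metadata access (including `tokenizer.ggml.pre`), the `gguf` Python package bundled with llama.cpp is the proper tool; this sketch only checks the outer container.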
## imatrix
Generated from Q2_K quant.