leafspark committed
Commit b520c32 · verified · 1 Parent(s): b1f49d8

readme: add information about tokenizer

Files changed (1): README.md (+16 -9)
README.md CHANGED
@@ -19,18 +19,25 @@ Low bit quantizations of Meta's Llama 3.1 405B Instruct model. Quantized from ol
 
 Quantized with llama.cpp [b3449](https://github.com/ggerganov/llama.cpp/releases/tag/b3449)
 
-**Quants:**
-- Q2_K
-- (imatrix)
-- Q3_K_M
-- Q3_K_S
-- Q3_K_L
-- Q4_K_M
-- Q4_0
-- Q4_K_S
+| Quant   | Notes                                        |
+|---------|----------------------------------------------|
+| Q2_K    | Suitable for general inference tasks         |
+| IQ2_XXS | Best for ultra-low memory footprint          |
+| IQ2_S   | Optimized for small VRAM environments        |
+| Q3_K_M  | Good balance between speed and accuracy      |
+| Q3_K_S  | Faster inference with minor quality loss     |
+| Q4_K_M  | Superior balance, suitable for production (although this is dequanted from Q4_0, so don't expect higher quality) |
+| Q3_K_L  | High quality, with higher VRAM requirements  |
+| Q4_0    | Basic quantization, good for experimentation |
+| Q4_K_S  | Fast inference, efficient for scaling        |
 
 For higher quality quantizations (q4+), please refer to [nisten/meta-405b-instruct-cpu-optimized-gguf](https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf).
 
+Regarding the `smaug-bpe` tokenizer: it is identical to `llama-bpe`, so it makes no difference in practice. However, if you have concerns, you can use the following command to set the `llama-bpe` pre-tokenizer:
+```
+./gguf-py/scripts/gguf_new_metadata.py --pre-tokenizer "llama-bpe" Llama-3.1-405B-Instruct-old.gguf Llama-3.1-405B-Instruct-fixed.gguf
+```
+
 ## imatrix
 
 Generated from Q2_K quant.
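
The `--pre-tokenizer` flag in the command above rewrites a single GGUF metadata key, `tokenizer.ggml.pre`, while copying everything else unchanged. As a rough illustration only (a plain Python dict stands in for the real GGUF binary format, and `override_metadata` is a hypothetical helper, not part of gguf-py):

```python
def override_metadata(metadata: dict, overrides: dict) -> dict:
    """Return a copy of the metadata with the given keys replaced.

    Illustration only: GGUF metadata is really a sequence of typed
    key-value pairs in a binary file; gguf_new_metadata.py copies the
    file and overrides selected keys in the same spirit.
    """
    updated = dict(metadata)
    updated.update(overrides)
    return updated

# Minimal stand-in for the metadata of the original quant
old_meta = {
    "general.architecture": "llama",
    "tokenizer.ggml.pre": "smaug-bpe",  # pre-tokenizer key as stored in GGUF
}

# Equivalent of: gguf_new_metadata.py --pre-tokenizer "llama-bpe" old.gguf fixed.gguf
fixed_meta = override_metadata(old_meta, {"tokenizer.ggml.pre": "llama-bpe"})
```

Since the two pre-tokenizers behave identically here, this change only affects the stored metadata string, not tokenization results.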