leafspark committed
Commit b520c32 · verified · 1 Parent(s): b1f49d8

readme: add information about tokenizer

Files changed (1): README.md (+16 -9)
README.md CHANGED
@@ -19,18 +19,25 @@ Low bit quantizations of Meta's Llama 3.1 405B Instruct model. Quantized from ol
 
 Quantized with llama.cpp [b3449](https://github.com/ggerganov/llama.cpp/releases/tag/b3449)
 
-**Quants:**
-- Q2_K
-- (imatrix)
-- Q3_K_M
-- Q3_K_S
-- Q3_K_L
-- Q4_K_M
-- Q4_0
-- Q4_K_S
+| Quant   | Notes                                        |
+|---------|----------------------------------------------|
+| Q2_K    | Suitable for general inference tasks         |
+| IQ2_XXS | Best for ultra-low memory footprint          |
+| IQ2_S   | Optimized for small VRAM environments        |
+| Q3_K_M  | Good balance between speed and accuracy      |
+| Q3_K_S  | Faster inference with minor quality loss     |
+| Q4_K_M  | Superior balance, suitable for production (although this is dequanted from Q4_0, so don't expect higher quality) |
+| Q3_K_L  | High quality, with higher VRAM requirements  |
+| Q4_0    | Basic quantization, good for experimentation |
+| Q4_K_S  | Fast inference, efficient for scaling        |
 
 For higher quality quantizations (q4+), please refer to [nisten/meta-405b-instruct-cpu-optimized-gguf](https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf).
 
+Regarding the `smaug-bpe` tokenizer: it is identical to `llama-bpe`, so it makes no difference in practice. However, if you have concerns, you can use the following command to set the `llama-bpe` pre-tokenizer:
+```
+./gguf-py/scripts/gguf_new_metadata.py --pre-tokenizer "llama-bpe" Llama-3.1-405B-Instruct-old.gguf Llama-3.1-405B-Instruct-fixed.gguf
+```
+
 ## imatrix
 
 Generated from Q2_K quant.
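
The `--pre-tokenizer` flag in the command above rewrites a single GGUF metadata key, `tokenizer.ggml.pre`, while copying everything else unchanged. As a rough illustration only (a plain Python dict stands in for the real GGUF binary format, and `override_metadata` is a hypothetical helper, not part of gguf-py):

```python
def override_metadata(metadata: dict, overrides: dict) -> dict:
    """Return a copy of the metadata with the given keys replaced.

    Illustration only: GGUF metadata is really a sequence of typed
    key-value pairs in a binary file; gguf_new_metadata.py copies the
    file and overrides selected keys in the same spirit.
    """
    updated = dict(metadata)
    updated.update(overrides)
    return updated

# Minimal stand-in for the metadata of the original quant
old_meta = {
    "general.architecture": "llama",
    "tokenizer.ggml.pre": "smaug-bpe",  # pre-tokenizer key as stored in GGUF
}

# Equivalent of: gguf_new_metadata.py --pre-tokenizer "llama-bpe" old.gguf fixed.gguf
fixed_meta = override_metadata(old_meta, {"tokenizer.ggml.pre": "llama-bpe"})
```

Since the two pre-tokenizers behave identically here, this change only affects the stored metadata string, not tokenization results.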