leafspark committed · verified
Commit b30d5e1 · 1 Parent(s): eb6e73d

readme: update quant info and banner

Files changed (1): README.md (+17 -10)
README.md CHANGED
@@ -13,7 +13,7 @@ library_name: ggml
 
 # Meta-Llama-3.1-405B-Instruct-GGUF
 
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/6604e5b21eb292d6df393365/o7DiWuILyzaPLh4Ne1JKr.png)
+![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/6604e5b21eb292d6df393365/C0YBxvhqz3cqMdgfOUlUL.jpeg)
 
 Low bit quantizations of Meta's Llama 3.1 405B Instruct model. Quantized from the ollama q4_0 GGUF.
 
@@ -21,15 +21,22 @@ Quantized with llama.cpp [b3449](https://github.com/ggerganov/llama.cpp/releases/tag/b3449)
 
 | Quant   | Notes |
 |---------|-------|
-| Q2_K    | Suitable for general inference tasks |
-| IQ2_XXS | Best for ultra-low memory footprint |
-| IQ2_S   | Optimized for small VRAM environments |
-| Q3_K_M  | Good balance between speed and accuracy |
-| Q3_K_S  | Faster inference with minor quality loss |
-| Q3_K_L  | High quality, with higher VRAM requirements |
-| Q4_K_M  | Superior balance, suitable for production (although this is dequanted from q4_0, don't expect higher quality) |
-| Q4_0    | Basic quantization, good for experimentation |
-| Q4_K_S  | Fast inference, efficient for scaling |
+| BF16    | Brain floating point, very high quality, smaller than F16 |
+| Q8_0    | 8-bit quantization, high quality, larger size |
+| Q6_K    | 6-bit quantization, very good quality-to-size ratio |
+| Q5_K    | 5-bit quantization, good balance of quality and size |
+| Q5_0    | Alternative 5-bit quantization with a slightly different balance |
+| Q4_K_M  | 4-bit quantization, good for production use |
+| Q4_K_S  | 4-bit quantization, faster inference, efficient for scaling |
+| Q4_0    | Basic 4-bit quantization, good for experimentation |
+| Q3_K_L  | 3-bit quantization, high quality with higher VRAM requirements |
+| Q3_K_M  | 3-bit quantization, good balance between speed and accuracy |
+| Q3_K_S  | 3-bit quantization, faster inference with minor quality loss |
+| Q2_K    | 2-bit quantization, suitable for general inference tasks |
+| IQ2_S   | 2-bit i-quant, optimized for small VRAM environments |
+| IQ2_XXS | 2-bit i-quant, best for ultra-low memory footprint |
+| IQ1_M   | 1-bit i-quant, usable |
+| IQ1_S   | 1-bit i-quant, not recommended |
 
 For higher quality quantizations (q4+), please refer to [nisten/meta-405b-instruct-cpu-optimized-gguf](https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf).
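
Since these files were requantized from ollama's q4_0 GGUF rather than from the original BF16 weights, reproducing one of the quants looks roughly like the sketch below. This is a minimal illustration, not the author's exact procedure: the input and output file names are hypothetical, and it assumes a llama.cpp build around b3449, where the quantize binary is named `llama-quantize` and accepts `--allow-requantize`.

```python
import subprocess

# Hypothetical file names: the source is the q4_0 GGUF exported from ollama,
# and Q2_K is one of the target types listed in the table above.
SRC = "llama-3.1-405b-instruct-q4_0.gguf"   # assumption: name of the ollama q4_0 dump
DST = "llama-3.1-405b-instruct-Q2_K.gguf"   # assumption: output name

# llama.cpp's quantize tool refuses to re-quantize an already-quantized
# model unless --allow-requantize is passed; that is exactly the situation
# here, since the starting point is q4_0 rather than BF16.
subprocess.run(
    ["./llama-quantize", "--allow-requantize", SRC, DST, "Q2_K"],
    check=True,
)
```

This is also why the table warns against expecting extra quality from Q4_K_M and above: requantizing from q4_0 cannot recover precision that the q4_0 step already discarded.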
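To actually run one of the low-bit files, a sketch along these lines should work. The repo id and the `*Q2_K*` shard pattern are assumptions (check the repository's file list); it uses `huggingface_hub` to fetch the shards and `llama-cpp-python` for inference. Note that even at 2-bit, a 405B model needs on the order of 100+ GB of combined RAM/VRAM.

```python
import glob

from huggingface_hub import snapshot_download
from llama_cpp import Llama

# Assumption: repo id and file pattern; verify against the repo's file list.
local_dir = snapshot_download(
    repo_id="leafspark/Meta-Llama-3.1-405B-Instruct-GGUF",
    allow_patterns=["*Q2_K*"],  # fetch only the Q2_K shards
)

# Point llama.cpp at the first shard; builds with split-GGUF support
# load the remaining shards automatically.
shards = sorted(glob.glob(f"{local_dir}/*Q2_K*.gguf"))
llm = Llama(
    model_path=shards[0],
    n_ctx=4096,
    n_gpu_layers=0,  # CPU-only; raise this if you have enough VRAM
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize GGUF quantization in one sentence."}]
)
print(out["choices"][0]["message"]["content"])
```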