readme: update quant info and banner
README.md
# Meta-Llama-3.1-405B-Instruct-GGUF

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/6604e5b21eb292d6df393365/C0YBxvhqz3cqMdgfOUlUL.jpeg)

Low-bit quantizations of Meta's Llama 3.1 405B Instruct model, requantized from the ollama q4_0 GGUF.

Quantized with llama.cpp [b3449](https://github.com/ggerganov/llama.cpp/releases/tag/b3449).

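As a hedged sketch of how such files are produced (not the exact commands used for this repo): requantizing an existing q4_0 GGUF down to one of the lower-bit types in the table below can be done with llama.cpp's `llama-quantize` tool. All file names here are placeholders, and `--allow-requantize` is needed because the source GGUF is already quantized.

```python
# Hedged sketch: producing a lower-bit quant from an already-quantized GGUF
# with llama.cpp's llama-quantize binary (built from a release such as b3449).
# File names are placeholders, not this repo's actual shard names.
import subprocess

subprocess.run(
    [
        "./llama-quantize",
        "--allow-requantize",                      # source GGUF is already q4_0
        "Meta-Llama-3.1-405B-Instruct.Q4_0.gguf",  # placeholder input file
        "Meta-Llama-3.1-405B-Instruct.Q2_K.gguf",  # placeholder output file
        "Q2_K",                                    # target type from the table below
    ],
    check=True,
)
```
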
| Quant   | Notes                                                              |
|---------|--------------------------------------------------------------------|
| BF16    | Brain floating point, very high quality, smaller than F16          |
| Q8_0    | 8-bit quantization, high quality, larger size                      |
| Q6_K    | 6-bit quantization, very good quality-to-size ratio                |
| Q5_K    | 5-bit quantization, good balance of quality and size               |
| Q5_0    | Alternative 5-bit quantization, slightly different balance         |
| Q4_K_M  | 4-bit quantization, good for production use                        |
| Q4_K_S  | 4-bit quantization, faster inference, efficient for scaling        |
| Q4_0    | Basic 4-bit quantization, good for experimentation                 |
| Q3_K_L  | 3-bit quantization, higher quality with higher VRAM requirements   |
| Q3_K_M  | 3-bit quantization, good balance between speed and accuracy        |
| Q3_K_S  | 3-bit quantization, faster inference with minor quality loss       |
| Q2_K    | 2-bit quantization, suitable for general inference tasks           |
| IQ2_S   | Integer 2-bit quantization, optimized for small VRAM environments  |
| IQ2_XXS | Integer 2-bit quantization, best for ultra-low memory footprint    |
| IQ1_M   | Integer 1-bit quantization, usable                                 |
| IQ1_S   | Integer 1-bit quantization, not recommended                        |
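
To pull a single quant from the Hub programmatically, a minimal `huggingface_hub` sketch follows. The `repo_id` and `filename` are assumptions: substitute the actual repository id and the exact `.gguf` filename from the file listing (a 405B quant may be split into several parts).

```python
# Minimal download sketch using huggingface_hub (pip install huggingface_hub).
# repo_id and filename are assumptions -- check the repo's file listing for
# the real names; a 405B quant may be split into multiple .gguf parts.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="leafspark/Meta-Llama-3.1-405B-Instruct-GGUF",  # assumed repo id
    filename="Meta-Llama-3.1-405B-Instruct.Q2_K.gguf",      # assumed filename
)
print(path)  # local cache path of the downloaded file
```
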
For higher quality quantizations (q4+), please refer to [nisten/meta-405b-instruct-cpu-optimized-gguf](https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf).
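
And a minimal sketch of running a downloaded quant from Python with `llama-cpp-python`, one common way to load GGUF files. The model path is a placeholder, and `n_ctx`/`n_gpu_layers` are assumptions to tune for your hardware.

```python
# Minimal inference sketch with llama-cpp-python (pip install llama-cpp-python).
# The model path is a placeholder; n_ctx and n_gpu_layers are assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="Meta-Llama-3.1-405B-Instruct.Q2_K.gguf",  # placeholder path
    n_ctx=4096,       # context window
    n_gpu_layers=-1,  # offload all layers to GPU; use 0 for CPU-only
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```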