maddes8cht
committed on
Commit
·
6765ca4
1
Parent(s):
759f6fd
"Update README.md"
README.md
CHANGED
@@ -31,21 +31,7 @@ I'm constantly enhancing these model descriptions to provide you with the most r
|
|
31 |
- Model creator: [mosaicml](https://huggingface.co/mosaicml)
|
32 |
- Original model: [mpt-7b-8k-chat](https://huggingface.co/mosaicml/mpt-7b-8k-chat)
|
33 |
|
34 |
-
|
35 |
-
|
36 |
-
As noted on the [Llama.cpp GitHub repository](https://github.com/ggerganov/llama.cpp#hot-topics), all new Llama.cpp releases after October 18, 2023, will require a re-quantization due to the new BPE tokenizer.
|
37 |
-
|
38 |
-
**Good news!** I am glad that my re-quantization process for Falcon Models is nearly complete. Download the latest quantized models to ensure compatibility with recent llama.cpp software.
|
39 |
-
|
40 |
-
**Key Points:**
|
41 |
-
|
42 |
-
- **Stay Informed:** Keep an eye on software application release schedules using llama.cpp libraries.
|
43 |
-
- **Monitor Upload Times:** Re-quantization is *almost* done. Watch for updates on my Hugging Face Model pages.
|
44 |
-
|
45 |
-
**Important Compatibility Note:** Old software will work with old Falcon models, but expect updated software to exclusively support the new models.
|
46 |
-
|
47 |
-
This change primarily affects **Falcon** and **Starcoder** models, with other models remaining unaffected.
|
48 |
-
|
49 |
|
50 |
|
51 |
|
@@ -57,19 +43,21 @@ The core project making use of the ggml library is the [llama.cpp](https://githu
|
|
57 |
|
58 |
# Quantization variants
|
59 |
|
60 |
-
There is a bunch of quantized files available.
|
61 |
|
62 |
# Legacy quants
|
63 |
|
64 |
Q4_0, Q4_1, Q5_0, Q5_1 and Q8_0 are `legacy` quantization types.
|
65 |
Nevertheless, they are fully supported, as there are several circumstances that cause certain models not to be compatible with the modern K-quants.
|
66 |
-
|
67 |
|
68 |
# K-quants
|
69 |
|
70 |
-
K-quants are
|
71 |
So, if possible, use K-quants.
|
72 |
-
With a Q6_K you
|
73 |
|
74 |
|
75 |
|
|
|
31 |
- Model creator: [mosaicml](https://huggingface.co/mosaicml)
|
32 |
- Original model: [mpt-7b-8k-chat](https://huggingface.co/mosaicml/mpt-7b-8k-chat)
|
33 |
|
34 |
+
MPT-7B and MPT-30B are part of the family of Mosaic Pretrained Transformer (MPT) models, which use a modified transformer architecture optimized for efficient training and inference.
|
35 |
|
36 |
|
37 |
|
|
|
43 |
|
44 |
# Quantization variants
|
45 |
|
46 |
+
A range of quantized files is available to suit different needs. Here's how to choose the best option for you:
|
47 |
|
48 |
# Legacy quants
|
49 |
|
50 |
Q4_0, Q4_1, Q5_0, Q5_1 and Q8_0 are `legacy` quantization types.
|
51 |
Nevertheless, they are fully supported, as there are several circumstances that cause certain models not to be compatible with the modern K-quants.
|
52 |
+
## Note:
|
53 |
+
There is now an option to use K-quants even for previously 'incompatible' models, although it relies on a fallback that means they are not *real* K-quants. More details can be found in the affected model descriptions.
|
54 |
+
(This mainly refers to Falcon 7B and Starcoder models.)
|
55 |
|
56 |
# K-quants
|
57 |
|
58 |
+
K-quants are designed with the idea that different levels of quantization in specific parts of the model can optimize performance, file size, and memory load.
|
59 |
So, if possible, use K-quants.
|
60 |
+
With Q6_K, you'll likely find it hard to tell the output quality apart from the original model: ask your model the same question twice and you may see bigger quality differences.
|
61 |
|
62 |
|
63 |
|
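To make the file-size trade-off between the quantization types above more concrete, here is a rough sketch in Python. It assumes the usual ggml block layouts (32 weights per block for the legacy types, 256 weights per super-block for Q6_K); the helper names `bits_per_weight` and `estimate_size_gib` and the round 7-billion parameter count are hypothetical and only for illustration. Real files mix tensor types and carry metadata, so treat the result as a ballpark figure, not an exact file size.

```python
# (weights per block, bytes per block) for some ggml quantization formats.
# Assumption: standard ggml block layouts; check the ggml sources to confirm.
BLOCKS = {
    "Q4_0": (32, 18),    # 16 bytes of 4-bit values + 2-byte scale
    "Q4_1": (32, 20),    # as Q4_0, plus a 2-byte minimum
    "Q5_0": (32, 22),    # 16 bytes low bits + 4 bytes high bits + 2-byte scale
    "Q5_1": (32, 24),    # as Q5_0, plus a 2-byte minimum
    "Q8_0": (32, 34),    # 32 bytes of 8-bit values + 2-byte scale
    "Q6_K": (256, 210),  # 128 + 64 bytes of 6-bit values, 16 scales, 2-byte scale
}


def bits_per_weight(quant: str) -> float:
    """Average number of bits stored per weight for a quantization type."""
    weights, nbytes = BLOCKS[quant]
    return nbytes * 8 / weights


def estimate_size_gib(n_params: int, quant: str) -> float:
    """Rough size in GiB if every weight used the given quantization type."""
    return n_params * bits_per_weight(quant) / 8 / 2**30


if __name__ == "__main__":
    n_params = 7_000_000_000  # hypothetical round figure for a 7B model
    for quant in BLOCKS:
        bpw = bits_per_weight(quant)
        size = estimate_size_gib(n_params, quant)
        print(f"{quant}: {bpw:.4f} bits/weight, about {size:.2f} GiB")
```

The per-block overhead (scales and minimums) is why a "4-bit" type such as Q4_0 ends up at 4.5 bits per weight rather than exactly 4.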