Commit by aifeifei798: Upload README.md

File changed: README.md
@@ -14,6 +14,14 @@ tags:
 - Lewdiculous's superb GGUF version. Thank you for your conscientious and responsible dedication.
 - https://huggingface.co/LWDCLS/llama3-8B-DarkIdol-2.2-Uncensored-1048K-GGUF-IQ-Imatrix-Request

+# These are my own quantizations (updated almost daily).
+The difference from normal quantizations is that I quantize the output and embed tensors to f16
+and the other tensors to q5_k, q6_k, or q8_0.
+This creates models that show little or no degradation and have a smaller size.
+They run at about 3-6 tokens/sec on CPU only using llama.cpp,
+and obviously faster on computers with powerful GPUs.
+
+- the fast cat at [ZeroWw/llama3-8B-DarkIdol-2.2-Uncensored-1048K-GGUF](https://huggingface.co/ZeroWw/llama3-8B-DarkIdol-2.2-Uncensored-1048K-GGUF)

 ## Why 1048K?
 Thanks to the optimization of the preferred model, it performs very well across context lengths from 2000 to 1048K. For my personal usage scenarios, context sizes such as 8192 or 32K are insufficient. My primary role involves managing virtual idol Twitter accounts, assisting with singing, and so on. A good conversation can be very long, and sometimes even 32K is not enough. Imagine having a heated chat with your virtual girlfriend, only for it to cut off abruptly. That feeling is too painful.
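The quantization note in the added lines (output and embedding tensors kept at f16, the remaining tensors at a lower-bit type) can be made concrete with a rough size estimate. The sketch below is not the author's script. The vocabulary size, hidden size, and total parameter count are assumptions taken from the public Llama 3 8B configuration, and the bits-per-weight figures are the nominal block sizes of the GGML types. It ignores the small norm tensors and any per-tensor type mixing that llama.cpp applies, so treat the results as ballpark figures only. The helper names `estimate_gib` and `quantize_command` are hypothetical, and the file names passed to them are placeholders.

```python
# Rough size estimate for a GGUF file whose output and token-embedding
# tensors stay at f16 while all other tensors use a lower-bit type.

# Nominal bits per weight for each GGML type (block bytes * 8 / block size).
BITS_PER_WEIGHT = {"f16": 16.0, "q5_k": 5.5, "q6_k": 6.5625, "q8_0": 8.5}

# Assumed Llama 3 8B shapes (not stated in the README).
VOCAB_SIZE = 128_256
HIDDEN_SIZE = 4_096
TOTAL_PARAMS = 8_030_261_248


def estimate_gib(body_type: str, edge_type: str = "f16") -> float:
    """Approximate file size in GiB for the given tensor types."""
    # Token embedding and output projection are each vocab x hidden.
    edge_params = 2 * VOCAB_SIZE * HIDDEN_SIZE
    body_params = TOTAL_PARAMS - edge_params
    total_bits = (edge_params * BITS_PER_WEIGHT[edge_type]
                  + body_params * BITS_PER_WEIGHT[body_type])
    return total_bits / 8 / 2**30


def quantize_command(src: str, dst: str, body_type: str) -> list[str]:
    """Build a llama.cpp llama-quantize invocation (paths are placeholders)."""
    return [
        "llama-quantize",
        "--output-tensor-type", "f16",
        "--token-embedding-type", "f16",
        src, dst, body_type,
    ]


if __name__ == "__main__":
    for body in ("q5_k", "q6_k", "q8_0", "f16"):
        print(f"{body:>5}: ~{estimate_gib(body):.2f} GiB")
    print(" ".join(quantize_command("model.f16.gguf", "model.q6_k.gguf", "q6_k")))
```

Keeping the two vocabulary-sized tensors at f16 costs roughly 2 GiB on its own under these assumptions, which is why the mixed files come out larger than a uniform quantization of the same type but still well below the full f16 model.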