Requantization of a Q5_K_M quant of a trending 70b model, with no better quant or fp16 available, done through a Q8_0 intermediary step.
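
For reference, the two-step process can be sketched as below, assuming a local build of llama.cpp's `quantize` tool and hypothetical file names; going through Q8_0 first limits the compounding loss of quantizing a quant.

```python
# Minimal sketch of the two-step requantization, assuming a local build of
# llama.cpp's `quantize` tool; file names are hypothetical.
import subprocess

SRC = "miqu-1-70b.q5_K_M.gguf"   # best quant available (no fp16 upstream)
MID = "miqu-1-70b.Q8_0.gguf"     # near-lossless intermediary
OUT = "miqu-1-70b.IQ3_XXS.gguf"  # final target quant

# --allow-requantize lets `quantize` start from an already-quantized model.
subprocess.run(["./quantize", "--allow-requantize", SRC, MID, "Q8_0"], check=True)
subprocess.run(["./quantize", "--allow-requantize", MID, OUT, "IQ3_XXS"], check=True)
```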

Miqu 70b has a rope theta of 1,000,000, like CodeLlama, and not 10,000, as Llama 2 models usually have.
To my knowledge, that feature distinguishes it from ALL Llama 2 models, besides the CodeLlamas, which also have a theta of 1,000,000.
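
You can check the theta baked into a GGUF file's metadata directly; a quick sketch assuming the `gguf` Python package that ships with llama.cpp (path hypothetical):

```python
# Read the rope theta (base frequency) stored in a GGUF file's metadata,
# using the `gguf` Python package from llama.cpp (pip install gguf).
# The metadata key follows the GGUF convention `{arch}.rope.freq_base`.
from gguf import GGUFReader

reader = GGUFReader("miqu-1-70b.q5_K_M.gguf")  # hypothetical local path
field = reader.fields["llama.rope.freq_base"]
# Scalar fields store their value in parts[data[0]] as a 1-element array.
print(float(field.parts[field.data[0]][0]))  # expected here: 1000000.0
```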

-> So, no Alpha or Rope Base Frequency change is needed up to its native 32k context, if it works as intended.
And if it does, no linear/YaRN rope scaling is necessary either to reach that 32k context.
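
To see why the large theta covers 32k without any override, here is a rough sketch of the standard RoPE wavelength math (head_dim 128, as in Llama 2 70b): with theta 1,000,000, even the slowest rotary component has a wavelength far beyond 32k tokens, so a plain `-c 32768` launch with no `--rope-freq-base` change should suffice.

```python
# Rough RoPE intuition: the i-th rotary pair rotates with frequency
# theta**(-2i/d), so its wavelength in tokens is 2*pi*theta**(2i/d).
# A larger theta stretches the slowest component past the target context.
import math

def slowest_wavelength(theta: float, head_dim: int = 128) -> float:
    i = head_dim // 2 - 1  # last (slowest-rotating) rotary pair
    return 2 * math.pi * theta ** (2 * i / head_dim)

for theta in (10_000.0, 1_000_000.0):
    print(f"theta={theta:>9.0f} -> slowest wavelength ~ {slowest_wavelength(theta):,.0f} tokens")
# theta=10,000 -> ~54k tokens; theta=1,000,000 -> ~5.1M tokens (rough figures)
```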

BUT Miqu is NOT a CodeLlama 70b (which was released only a few days after Miqu 70b), because:

- While the theta of CodeLlama 70b is claimed to be 1,000,000, its actual rope base seems to be 10,000 (see the benches and the sweep sketch below).
- Which means that CodeLlama 70b might be context-limited like Llama 2 is, instead of having a 100,000 ctx baseline.
- Meanwhile, Miqu's max context is 32k, not 4k like CodeLlama 70b, nor 100,000 like the other CodeLlamas.
- Also, Miqu's perplexity is close to Llama 2 70b's (below 4 at 512 ctx), while CodeLlama 70b's is at least around 5.5.
- Beyond perplexity, the benches less sensitive to quantization (Hellaswag and Winogrande, among others) confirm this.

So, CodeLlama 70b is nerfed on general benchmarks like the other CodeLlamas, while Miqu matches the expectations of a FINETUNED Llama 2.

And now the IQ3_XXS, the new SOTA 3-bit quant from LlamaCPP, which I made in the same way:

- miqu-1-70b.IQ3_XXS.gguf,-,wikitext,4.0309,512,512,2024-01-29 01:40:00,RBF1000000,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,655
- miqu-1-70b.IQ3_XXS.gguf,-,wikitext,3.5141,4096,4096,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,

---

Meanwhile, CodeLlama 70b Q2_K benches as such, to compare with Miqu 70b Q2_K, originally quantized from FP16 by Miqudev:

- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,Hellaswag,76.5,,400,2024-01-30 01:40:00,,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,wikitext,8.5822,512,512,2024-01-30 01:40:00,RBF500000,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,81
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,wikitext,7.1098,512,512,2024-01-30 01:40:00,RBF100000,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,81
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,wikitext,6.8224,512,512,2024-01-30 01:40:00,RBF50000,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,81
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,wikitext,6.5705,512,512,2024-01-30 01:40:00,RBF10000,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,81
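
The RBF rows above come from sweeping the rope base frequency at inference time; a hypothetical reproduction with llama.cpp's `perplexity` tool (local paths assumed). Perplexity bottoming out at RBF 10,000 is what points to the model's real base.

```python
# Hypothetical rope-base sweep with llama.cpp's `perplexity` tool: the RBF
# value giving the lowest wikitext perplexity hints at the model's real theta.
import subprocess

MODEL = "CodeLlama-70b-Instruct-hf-Q2_K.gguf"  # hypothetical local path
WIKI = "wikitext-2-raw/wiki.test.raw"          # hypothetical local path

for rbf in (500_000, 100_000, 50_000, 10_000):
    subprocess.run([
        "./perplexity", "-m", MODEL, "-f", WIKI,
        "-c", "512",  # 512-token chunks, as in the rows above
        "--rope-freq-base", str(rbf),
    ], check=True)
```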

---

And, for information, a comparable base Llama 2 70b finetuned by NousResearch for 32k context (YaRN):

- Yarn-Llama-2-70b-32k-Q3_K_S.gguf,-,Hellaswag,87,,400,2024-01-23 01:40:00,PEC8,70b,Llama_2,4096,,,GGUF,Meta,Artefact2,
- Yarn-Llama-2-70b-32k-Q3_K_S.gguf,-,Hellaswag_Bin,81.25,,400,2024-01-23 01:40:00,PEC8,70b,Llama_2,4096,,,GGUF,Meta,Artefact2,
- Yarn-Llama-2-70b-32k-Q3_K_S.gguf,-,Arc-Challenge,43.81270903,,299,2024-01-23 05:40:00,PEC8,70b,Llama_2,4096,,,GGUF,Meta,Artefact2,
- Yarn-Llama-2-70b-32k-Q3_K_S.gguf,-,Arc-Easy,65.6140,24.9890,570,2024-01-23 05:40:00,PEC8,70b,Llama_2,4096,,,GGUF,Meta,Artefact2,
- Yarn-Llama-2-70b-32k-Q3_K_S.gguf,-,MMLU,,,1548,2024-01-23 05:40:00,PEC8,70b,Llama_2,4096,,,GGUF,Meta,Artefact2,
- Yarn-Llama-2-70b-32k-Q3_K_S.gguf,-,Truthful-QA,30.72215422,19.8590,817,2024-01-23 05:40:00,PEC8,70b,Llama_2,4096,,,GGUF,Meta,Artefact2,
- Yarn-Llama-2-70b-32k-Q3_K_S.gguf,-,Winogrande,78.1373,,1267,2024-01-23 05:40:00,PEC8,70b,Llama_2,4096,,,GGUF,Meta,Artefact2,
- Yarn-Llama-2-70b-32k-Q3_K_S.gguf,-,wikitext,3.6948,512,512,2024-01-23 01:40:00,PEC8,70b,Llama_2,4096,,,GGUF,Meta,Artefact2,

This YaRN version performs close to Llama 2 70b (but with a 32k max context), and much more poorly than Miqu 70b.
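
For contrast with Miqu's no-override behavior, a YaRN finetune like this one needs explicit rope-scaling flags to reach 32k. A hypothetical llama.cpp invocation (scale 1/8, since 4096 × 8 = 32768):

```python
# Hypothetical launch of a YaRN 32k finetune of a 4k base model with llama.cpp.
import subprocess

subprocess.run([
    "./main", "-m", "Yarn-Llama-2-70b-32k-Q3_K_S.gguf",
    "-c", "32768",                 # target context
    "--rope-scaling", "yarn",      # YaRN instead of plain linear scaling
    "--rope-freq-scale", "0.125",  # 4096 / 32768 = 1/8
    "--yarn-orig-ctx", "4096",     # the base model's original training context
    "-p", "Hello",
], check=True)
```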

---

Also, for information, another requant from a Q4_K_S orphan of a 32k finetune of Sao10K's WinterGoddess 70b:

- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1952-iMat-c32_ch2500-Q3_K_XS.gguf,-,Hellaswag,89.25,,400,2024-01-23 01:40:00,PEC2.5,70b,Llama_2,4096,,,GGUF,Mishima,Nexesenex,
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1952-iMat-c32_ch2500-Q3_K_XS.gguf,-,Hellaswag_Bin,84,,400,2024-01-23 01:40:00,PEC2.5,70b,Llama_2,4096,,,GGUF,Mishima,Nexesenex,
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1952-iMat-c32_ch2500-Q3_K_XS.gguf,-,Arc-Challenge,54.84949833,,299,2024-01-23 05:40:00,PEC2.5,70b,Llama_2,4096,,,GGUF,Mishima,Nexesenex,
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1952-iMat-c32_ch2500-Q3_K_XS.gguf,-,Arc-Easy,74.03508772,,570,2024-01-23 05:40:00,PEC2.5,70b,Llama_2,4096,,,GGUF,Mishima,Nexesenex,
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1952-iMat-c32_ch2500-Q3_K_XS.gguf,-,MMLU,,,1548,2024-01-23 05:40:00,PEC2.5,70b,Llama_2,4096,,,GGUF,Mishima,Nexesenex,
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1952-iMat-c32_ch2500-Q3_K_XS.gguf,-,Truthful-QA,39.65728274,19.8590,817,2024-01-23 05:40:00,PEC2.5,70b,Llama_2,4096,,,GGUF,Mishima,Nexesenex,
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1952-iMat-c32_ch2500-Q3_K_XS.gguf,-,Winogrande,77.8216,,1267,2024-01-23 05:40:00,PEC2.5,70b,Llama_2,4096,,,GGUF,Mishima,Nexesenex,
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1952-iMat-c32_ch2500-Q3_K_XS.gguf,-,wikitext,4.2327,512,512,2024-01-23 01:40:00,PEC2.5,70b,Llama_2,4096,,,GGUF,Mishima,Nexesenex,

Draw your own conclusions!