Requantizations of a Q5_K_M quant of a trending 70b model for which no better quant or FP16 is available, made through a Q8_0 intermediary step.
Q3_K_M, Q2_K_S, and IQ3_XXS are already available. IQ2_XS and Q3_K_S are on the way. Miqudev provided the Q5_K_M, Q4_K_M, and Q2_K.
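For those curious about the process, here is a minimal sketch of that requantization chain (Q5_K_M -> Q8_0 -> target quant) driven from Python. The `./quantize` binary path, its `--allow-requantize` flag, and the file names are assumptions about a typical llama.cpp build of that period, not the exact commands used here.

```python
# Hedged sketch of the Q5_K_M -> Q8_0 -> lower-quant chain described above.
# Assumes a llama.cpp "quantize" binary accepting --allow-requantize and
# positional <input.gguf> <output.gguf> <type> arguments; adjust to your build.
import subprocess

QUANTIZE = "./quantize"                  # llama.cpp quantization tool (assumed path)
SRC      = "miqu-1-70b.q5_K_M.gguf"      # original quant published by Miqudev
INTERMED = "miqu-1-70b.q8_0.gguf"        # near-lossless intermediary
TARGET   = "miqu-1-70b.Q3_K_M.gguf"      # one of the lower target quants

# Step 1: re-expand the Q5_K_M into a Q8_0, which is close to FP16 quality.
subprocess.run([QUANTIZE, "--allow-requantize", SRC, INTERMED, "Q8_0"], check=True)

# Step 2: quantize the Q8_0 down to the target type (Q3_K_M, Q2_K_S, IQ3_XXS, ...).
# The published lower quants also use an importance matrix; see the sketch further below.
subprocess.run([QUANTIZE, "--allow-requantize", INTERMED, TARGET, "Q3_K_M"], check=True)
```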
Miqu 70b has a rope theta of 1,000,000, like CodeLlama, and not 10,000 like Llama 2 models usually have.
To my knowledge, that feature sets it apart from ALL other Llama 2 models, beside the CodeLlamas, which also have a theta of 1,000,000.
-> So, no Alpha or Rope Base Frequency change is needed up to its native 32k context, if it works as intended.
And if it does, no linear/YaRN rope scaling is necessary either to reach that 32k context.
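As a worked illustration of why the larger theta matters, here is a minimal sketch of the standard RoPE inverse-frequency formula (not Miqu's actual code; the head dimension of 128 is the usual Llama 2 70b value and is an assumption here):

```python
# Minimal sketch of the standard RoPE inverse-frequency formula, comparing the
# usual Llama 2 base (10,000) with Miqu's base (1,000,000).
# head_dim = 128 is the usual Llama 2 70b head size (assumption).
import math

def slowest_wavelength(base: float, head_dim: int = 128) -> float:
    """Positions needed for the slowest-rotating RoPE pair to complete one full turn."""
    # inv_freq_i = base ** (-2i / head_dim); the last pair (i = head_dim/2 - 1) rotates slowest.
    slowest_inv_freq = base ** (-(head_dim - 2) / head_dim)
    return 2 * math.pi / slowest_inv_freq

for base in (10_000, 1_000_000):
    print(f"theta={base:>9,}: slowest RoPE pair wraps after ~{slowest_wavelength(base):,.0f} positions")
# theta=10,000    -> roughly 54k positions
# theta=1,000,000 -> roughly 5 million positions
```

In short, a larger base slows the rotations down across the board, so a model trained that way natively covers its 32k window without extra Alpha/linear/YaRN tricks.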
BUT Miqu is NOT a CodeLlama 70b (which was released only a few days after Miqu 70b), because:
- While the theta of CodeLlama 70b is claimed to be 1,000,000, its actual rope base seems to be 10,000 (see the benchmarks below).
- Which means that CodeLlama 70b might be context-limited like Llama 2 is, instead of having the 100,000 ctx baseline of the other CodeLlamas.
- Meanwhile, Miqu's max context is 32k, not 4k like CodeLlama 70b, nor 100,000 like the other CodeLlamas.
- Also, Miqu's perplexity is close to a Llama 2 70b's (less than 4 at 512 ctx), while CodeLlama 70b's is at least around 5.5.
- Beyond perplexity, the benchmarks less sensitive to quantization (Hellaswag and Winogrande, but others as well) confirm this.
So, CodeLlama 70b is nerfed on general benchmarks like the other CodeLlamas, while Miqu matches what one would expect from a FINETUNED Llama 2.
Benchmarks I made with the original Q2_K quant of Miqu 70b, made from the FP16 and published by Miqudev:
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6451b24dc5d273f95482bfa4/wiDlIl1FMrVQo0fAcr3YO.png)
A graph, courtesy of Ipechman, with the TruthfulQA score of WinterGoddess 32k at 39.65728274, and not 20.
Data:
- miqu-1-70b.q2_K.gguf,-,Hellaswag,87.75,,400,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,miqudev,
- miqu-1-70b.q2_K.gguf,-,Hellaswag,86.5,,1000,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,miqudev,
- miqu-1-70b.q2_K.gguf,-,Hellaswag,86,,2000,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,miqudev,
- miqu-1-70b.q2_K.gguf,-,Hellaswag_Bin,81.5,,400,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,miqudev,
- miqu-1-70b.q2_K.gguf,-,Hellaswag_Bin,83.7,,1000,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,miqudev,
- miqu-1-70b.q2_K.gguf,-,Hellaswag_Bin,84,,2000,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,miqudev,
- miqu-1-70b.q2_K.gguf,-,Arc-Challenge,56.18729097,,299,2024-01-29 05:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,miqudev,
- miqu-1-70b.q2_K.gguf,-,Arc-Easy,75.78947368,,570,2024-01-29 05:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,miqudev,
- miqu-1-70b.q2_K.gguf,-,MMLU,46.96485623,,313,2024-01-29 05:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,miqudev,
- miqu-1-70b.q2_K.gguf,-,Truthful-QA,41.49326805,,817,2024-01-29 05:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,miqudev,
- miqu-1-70b.q2_K.gguf,-,Winogrande,78.2163,,1267,2024-01-29 05:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,miqudev,
- miqu-1-70b.q2_K.gguf,-,wikitext,4.6476,512,512,2024-01-29 01:40:00,RBF1000000,70b,Mistral_Medium,32768,,,GGUF,miqudev,miqudev,81
- miqu-1-70b.q2_K.gguf,-,wikitext,4.3063,512,512,2024-01-29 01:40:00,RBF1000000,70b,Mistral_Medium,32768,,,GGUF,miqudev,miqudev,655
- miqu-1-70b.q2_K.gguf,-,wikitext,4.6576,512,512,2024-01-29 01:40:00,RBF500000,70b,Mistral_Medium,32768,,,GGUF,miqudev,miqudev,81
- miqu-1-70b.q2_K.gguf,-,wikitext,4.7762,512,512,2024-01-29 01:40:00,RBF100000,70b,Mistral_Medium,32768,,,GGUF,miqudev,miqudev,81
- miqu-1-70b.q2_K.gguf,-,wikitext,4.8766,512,512,2024-01-29 01:40:00,RBF50000,70b,Mistral_Medium,32768,,,GGUF,miqudev,miqudev,81
- miqu-1-70b.q2_K.gguf,-,wikitext,5.3367,512,512,2024-01-29 01:40:00,RBF10000,70b,Mistral_Medium,32768,,,GGUF,miqudev,miqudev,81
- miqu-1-70b.q2_K.gguf,-,wikitext,3.8606,4096,4096,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,miqudev,
- miqu-1-70b.q2_K.gguf,-,wikitext,3.6864,6144,6144,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,miqudev,
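The RBF rows above come from a rope base frequency sweep, confirming that perplexity is lowest at 1,000,000. Below is a minimal sketch of how such a sweep can be scripted around llama.cpp's perplexity tool; the binary path and the exact flags are assumptions about a typical llama.cpp build, not the exact commands used for these rows.

```python
# Hedged sketch of a rope base frequency (RBF) sweep like the one behind the
# wikitext rows above: run llama.cpp's perplexity tool once per candidate base.
# Binary path, file names, and flags are assumptions about a typical llama.cpp build.
import subprocess

PERPLEXITY = "./perplexity"          # llama.cpp perplexity tool (assumed path)
MODEL      = "miqu-1-70b.q2_K.gguf"
CORPUS     = "wiki.test.raw"         # wikitext evaluation text (assumed file name)

for rope_base in (10_000, 50_000, 100_000, 500_000, 1_000_000):
    print(f"--- rope_freq_base = {rope_base} ---")
    subprocess.run(
        [PERPLEXITY, "-m", MODEL, "-f", CORPUS,
         "-c", "512",                # 512-token evaluation context, as in the rows above
         "--rope-freq-base", str(rope_base)],
        check=True,
    )
```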
Benchmarks I made with the Q3_K_M I quantized from Miqudev's Q5_K_M with an intermediary Q8_0 step, and an iMatrix of 12,800 tokens from wiki.train.raw (a sketch of that step follows after the data below):
- miqu-1-70b.Q3_K_M.gguf,-,Hellaswag,88.75,,400,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,
- miqu-1-70b.Q3_K_M.gguf,-,Hellaswag,88.1,,1000,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,
- miqu-1-70b.Q3_K_M.gguf,-,Hellaswag,87.3,,2000,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,
- miqu-1-70b.Q3_K_M.gguf,-,Hellaswag_Bin,82,,400,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,
- miqu-1-70b.Q3_K_M.gguf,-,Hellaswag_Bin,85.1,,1000,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,
- miqu-1-70b.Q3_K_M.gguf,-,Hellaswag_Bin,84.85,,2000,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,
- miqu-1-70b.Q3_K_M.gguf,-,Arc-Challenge,57.19063545,,299,2024-01-29 05:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,
- miqu-1-70b.Q3_K_M.gguf,-,Arc-Easy,77.19298246,,570,2024-01-29 05:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,
- miqu-1-70b.Q3_K_M.gguf,-,MMLU,50.15974441,,313,2024-01-29 05:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,
- miqu-1-70b.Q3_K_M.gguf,-,Truthful-QA,41.49326805,,817,2024-01-29 05:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,
- miqu-1-70b.Q3_K_M.gguf,-,Winogrande,78.8477,,1267,2024-01-29 05:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,
- miqu-1-70b.Q3_K_M.gguf,-,wikitext,4.2957,512,512,2024-01-29 01:40:00,RBF1000000,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,81
- miqu-1-70b.Q3_K_M.gguf,-,wikitext,3.8380,512,512,2024-01-29 01:40:00,RBF1000000,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,655
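As mentioned above, these quants were guided by an iMatrix computed on wiki.train.raw. A minimal sketch of that step follows, under the assumption of a typical llama.cpp build: the `imatrix` binary, its flags, and the 512 × 25 chunking used to reach 12,800 tokens are illustrative assumptions, not the exact settings used here.

```python
# Hedged sketch of the iMatrix step: gather activation statistics on
# wiki.train.raw, then pass them to quantize. Binary names, flags, and the
# 512 x 25 chunking (= 12,800 tokens) are illustrative assumptions.
import subprocess

IMATRIX  = "./imatrix"               # llama.cpp imatrix tool (assumed path)
QUANTIZE = "./quantize"              # llama.cpp quantization tool (assumed path)

# 1) Collect importance statistics over the calibration text.
subprocess.run(
    [IMATRIX,
     "-m", "miqu-1-70b.q8_0.gguf",   # the Q8_0 intermediary from the requant chain
     "-f", "wiki.train.raw",         # calibration corpus
     "-o", "imatrix-miqu.dat",
     "-c", "512", "--chunks", "25"], # 512 x 25 = 12,800 tokens (illustrative split)
    check=True,
)

# 2) Quantize with the importance matrix guiding the low-bit quant.
subprocess.run(
    [QUANTIZE, "--allow-requantize", "--imatrix", "imatrix-miqu.dat",
     "miqu-1-70b.q8_0.gguf", "miqu-1-70b.Q3_K_M.gguf", "Q3_K_M"],
    check=True,
)
```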
And now, the IQ3_XXS, the new SOTA 3-bit quant from LlamaCPP, which I made in the same way:
- miqu-1-70b.IQ3_XXS.gguf,-,Hellaswag,89,,400,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,
- miqu-1-70b.IQ3_XXS.gguf,-,Hellaswag,88.3,,1000,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,
- miqu-1-70b.IQ3_XXS.gguf,-,Hellaswag_Bin,82.75,,400,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,
- miqu-1-70b.IQ3_XXS.gguf,-,Hellaswag_Bin,85,,1000,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,
- miqu-1-70b.IQ3_XXS.gguf,-,Arc-Challenge,55.85284281,,299,2024-01-29 05:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,
- miqu-1-70b.IQ3_XXS.gguf,-,Arc-Easy,78.59649123,,570,2024-01-29 05:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,
- miqu-1-70b.IQ3_XXS.gguf,-,MMLU,48.88178914,,313,2024-01-29 05:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,
- miqu-1-70b.IQ3_XXS.gguf,-,Truthful-QA,41.73806610,,817,2024-01-29 05:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,
- miqu-1-70b.IQ3_XXS.gguf,-,Winogrande,78.3741,,1267,2024-01-29 05:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,
- miqu-1-70b.IQ3_XXS.gguf,-,wikitext,4.4319,512,512,2024-01-29 01:40:00,RBF1000000,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,81
- miqu-1-70b.IQ3_XXS.gguf,-,wikitext,4.0309,512,512,2024-01-29 01:40:00,RBF1000000,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,655
- miqu-1-70b.IQ3_XXS.gguf,-,wikitext,3.5141,4096,4096,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,
---
Meanwhile, CodeLlama 70b Q2_K benches as follows, to compare with the Miqu 70b Q2_K originally quantized from FP16 by Miqudev:
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,Hellaswag,76.5,,400,2024-01-30 01:40:00,,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,Hellaswag,76.2,,1000,2024-01-30 01:40:00,,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,Hellaswag_Bin,69.75,,400,2024-01-30 01:40:00,,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,Hellaswag_Bin,72.5,,1000,2024-01-30 01:40:00,,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,Arc-Challenge,35.11705686,,299,2024-01-30 05:40:00,,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,Arc-Easy,58.77192982,,570,2024-01-30 05:40:00,,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,MMLU,36.10223642,,313,2024-01-30 05:40:00,,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,Truthful-QA,31.08935129,,817,2024-01-30 05:40:00,,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,Winogrande,70.3236,,1267,2024-01-30 05:40:00,,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,wikitext,6.4634,512,512,2024-01-30 01:40:00,RBF10000,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,655
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,wikitext,9.7866,512,512,2024-01-30 01:40:00,RBF1000000,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,81
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,wikitext,8.5822,512,512,2024-01-30 01:40:00,RBF500000,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,81
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,wikitext,7.1098,512,512,2024-01-30 01:40:00,RBF100000,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,81
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,wikitext,6.8224,512,512,2024-01-30 01:40:00,RBF50000,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,81
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,wikitext,6.5705,512,512,2024-01-30 01:40:00,RBF10000,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,81
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,wikitext,5.6064,4096,4096,2024-01-30 01:40:00,,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,
- CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,wikitext,153.5606,6144,6144,2024-01-30 01:40:00,,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,
---
And, for information, a comparable base Llama 2 70b finetuned by NousResearch for 32k context (YaRN):
- Yarn-Llama-2-70b-32k-Q3_K_S.gguf,-,Hellaswag,87,,400,2024-01-23 01:40:00,PEC8,70b,Llama_2,4096,,,GGUF,Meta,Artefact2,
- Yarn-Llama-2-70b-32k-Q3_K_S.gguf,-,Hellaswag_Bin,81.25,,400,2024-01-23 01:40:00,PEC8,70b,Llama_2,4096,,,GGUF,Meta,Artefact2,
- Yarn-Llama-2-70b-32k-Q3_K_S.gguf,-,Arc-Challenge,43.81270903,,299,2024-01-23 05:40:00,PEC8,70b,Llama_2,4096,,,GGUF,Meta,Artefact2,
- Yarn-Llama-2-70b-32k-Q3_K_S.gguf,-,Arc-Easy,65.6140,24.9890,570,2024-01-23 05:40:00,PEC8,70b,Llama_2,4096,,,GGUF,Meta,Artefact2,
- Yarn-Llama-2-70b-32k-IQ2_XS.gguf,-,MMLU,36.06557377,,1159,2024-01-24 05:40:00,PEC8,70b,Llama_2,4096,,,GGUF,Meta,Artefact2,
- Yarn-Llama-2-70b-32k-Q3_K_S.gguf,-,Truthful-QA,30.72215422,19.8590,817,2024-01-23 05:40:00,PEC8,70b,Llama_2,4096,,,GGUF,Meta,Artefact2,
- Yarn-Llama-2-70b-32k-Q3_K_S.gguf,-,Winogrande,78.1373,,1267,2024-01-23 05:40:00,PEC8,70b,Llama_2,4096,,,GGUF,Meta,Artefact2,
- Yarn-Llama-2-70b-32k-Q3_K_S.gguf,-,wikitext,3.6948,512,512,2024-01-23 01:40:00,PEC8,70b,Llama_2,4096,,,GGUF,Meta,Artefact2,
This YaRN version performs close to a Llama 2 70b (but with a 32k max context), and much more poorly than Miqu 70b.
---
Also, for information, another requant from an orphan Q4_K_S of a 32k finetune of Sao10K's WinterGoddess 70b, benched at linear rope 2.5 (4096 × 2.5, so roughly a 10k context):
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1952-iMat-c32_ch2500-Q3_K_XS.gguf,-,Hellaswag,89.25,,400,2024-01-23 01:40:00,PEC2.5,70b,Llama_2,4096,,,GGUF,Mishima,Nexesenex,
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1952-iMat-c32_ch2500-Q3_K_XS.gguf,-,Hellaswag_Bin,84,,400,2024-01-23 01:40:00,PEC2.5,70b,Llama_2,4096,,,GGUF,Mishima,Nexesenex,
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1952-iMat-c32_ch2500-Q3_K_XS.gguf,-,Arc-Challenge,54.84949833,,299,2024-01-23 05:40:00,PEC2.5,70b,Llama_2,4096,,,GGUF,Mishima,Nexesenex,
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1952-iMat-c32_ch2500-Q3_K_XS.gguf,-,Arc-Easy,74.03508772,,570,2024-01-23 05:40:00,PEC2.5,70b,Llama_2,4096,,,GGUF,Mishima,Nexesenex,
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1952-iMat-c32_ch2500-Q3_K_XS.gguf,-,Truthful-QA,39.65728274,19.8590,817,2024-01-23 05:40:00,PEC2.5,70b,Llama_2,4096,,,GGUF,Mishima,Nexesenex,
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1952-iMat-c32_ch2500-Q3_K_XS.gguf,-,Winogrande,77.8216,,1267,2024-01-23 05:40:00,PEC2.5,70b,Llama_2,4096,,,GGUF,Mishima,Nexesenex,
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1952-iMat-c32_ch2500-Q3_K_XS.gguf,-,wikitext,4.2327,512,512,2024-01-23 01:40:00,PEC2.5,70b,Llama_2,4096,,,GGUF,Mishima,Nexesenex,
Draw your own conclusions as well!