- German EQ Bench: Score (v2_de): 62.59 (Parseable: 171.0).
- English EQ Bench: Score (v2): 76.43 (Parseable: 171.0).

[Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard): detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_cstr__Spaetzle-v69-7b).

| Metric                           | Value |
|----------------------------------|------:|
| Avg.                             | 72.87 |
| AI2 Reasoning Challenge (25-Shot)| 69.54 |
| HellaSwag (10-Shot)              | 86.77 |
| MMLU (5-Shot)                    | 64.63 |
| TruthfulQA (0-shot)              | 65.61 |
| Winogrande (5-shot)              | 81.93 |
| GSM8k (5-shot)                   | 68.76 |

Nous benchmark results:

| Model                                                          | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|----------------------------------------------------------------|--------:|--------:|-----------:|---------:|--------:|
| [Spaetzle-v69-7b](https://huggingface.co/cstr/Spaetzle-v69-7b) |   44.48 |   75.84 |      66.15 |    46.59 |   58.27 |
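The "Avg." / "Average" columns in the tables above appear to be the plain arithmetic means of the per-benchmark scores (reported to two decimals, so expect small rounding differences). A quick sanity check, assuming that interpretation:

```python
# Sanity-check the reported averages as simple arithmetic means.
# Scores are taken verbatim from the tables above; the "mean" assumption
# (unweighted average, rounded to two decimals) is ours.

leaderboard = [69.54, 86.77, 64.63, 65.61, 81.93, 68.76]  # ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8k
nous = [44.48, 75.84, 66.15, 46.59]                        # AGIEval, GPT4All, TruthfulQA, Bigbench

leaderboard_avg = sum(leaderboard) / len(leaderboard)
nous_avg = sum(nous) / len(nous)

# Both means land within rounding distance of the reported values.
print(f"Open LLM Leaderboard mean: {leaderboard_avg:.2f} (reported 72.87)")
print(f"Nous benchmark mean:       {nous_avg:.2f} (reported 58.27)")
```

Both computed means match the reported 72.87 and 58.27 to within 0.01, consistent with the unweighted-mean assumption.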