leaderboard-pr-bot
committed on
Adding Evaluation Results
This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr
The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.
If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions
README.md
CHANGED
@@ -1,4 +1,6 @@
 ---
 license: mit
 datasets:
 - pints-ai/Expository-Prose-V1
@@ -10,40 +12,21 @@ datasets:
 - togethercomputer/llama-instruct
 - LDJnr/Capybara
 - HuggingFaceH4/ultrafeedback_binarized
-language:
-- en
-model-index:
-- name: 1.5-Pints
-  results:
-  - task:
-      type: text-generation
-    dataset:
-      name: MTBench
-      type: ai2_arc
-    metrics:
-    - name: MTBench
-      type: LLM-as-a-Judge
-      value: 3.73
-    source:
-      name: MTBench
-      url: https://huggingface.co/spaces/lmsys/mt-bench
 pipeline_tag: text-generation
-extra_gated_prompt:
-
-
-
-
-
-
-
-
-of
-
-of
-
-
-the 'fair use' clause of Copyright Law, in hopes that this will aid the
-research community in bringing LLMs to the next frontier.
 extra_gated_fields:
   Company: text
   Country: country
@@ -56,6 +39,113 @@ extra_gated_fields:
     - label: Other
       value: other
   I agree to use this model for in accordance to the afore-mentioned Terms of Use: checkbox
 ---
 
 # 1.5-Pints -- A model pretrained in 9 days by using high quality data
@@ -257,4 +347,17 @@ Though best efforts has been made to ensure, as much as possible, that all texts
 
 Additionally, the **user agrees to bear any damages** arising as a direct cause (or otherwise) of using any artifacts released by the pints research team, as well as full responsibility for the consequences of his / her usage (or implementation) of any such released artifacts. The user also indemnifies Pints Research Team (and any of its members or agents) of any damage, related or unrelated, to the release or subsequent usage of any findings, artifacts or code by the team.
 
-For the avoidance of doubt, **any artifacts released by the Pints Research team are done so in accordance with the "fair use"** clause of Copyright Law, in hopes that this will aid the research community in bringing LLMs to the next frontier.
@@ -1,4 +1,6 @@
 ---
+language:
+- en
 license: mit
 datasets:
 - pints-ai/Expository-Prose-V1
@@ -10,40 +12,21 @@ datasets:
 - togethercomputer/llama-instruct
 - LDJnr/Capybara
 - HuggingFaceH4/ultrafeedback_binarized
 pipeline_tag: text-generation
+extra_gated_prompt: Though best efforts has been made to ensure, as much as possible,
+  that all texts in the training corpora are royalty free, this does not constitute
+  a legal guarantee that such is the case. **By using any of the models, corpora or
+  part thereof, the user agrees to bear full responsibility to do the necessary due
+  diligence to ensure that he / she is in compliance with their local copyright laws.
+  Additionally, the user agrees to bear any damages arising as a direct cause (or
+  otherwise) of using any artifacts released by the pints research team, as well as
+  full responsibility for the consequences of his / her usage (or implementation)
+  of any such released artifacts. The user also indemnifies Pints Research Team (and
+  any of its members or agents) of any damage, related or unrelated, to the release
+  or subsequent usage of any findings, artifacts or code by the team. For the avoidance
+  of doubt, any artifacts released by the Pints Research team are done so in accordance
+  with the 'fair use' clause of Copyright Law, in hopes that this will aid the research
+  community in bringing LLMs to the next frontier.
 extra_gated_fields:
   Company: text
   Country: country
@@ -56,6 +39,113 @@ extra_gated_fields:
     - label: Other
       value: other
   I agree to use this model for in accordance to the afore-mentioned Terms of Use: checkbox
+model-index:
+- name: 1.5-Pints
+  results:
+  - task:
+      type: text-generation
+    dataset:
+      name: MTBench
+      type: ai2_arc
+    metrics:
+    - type: LLM-as-a-Judge
+      value: 3.73
+      name: MTBench
+    source:
+      url: https://huggingface.co/spaces/lmsys/mt-bench
+      name: MTBench
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 17.62
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pints-ai/1.5-Pints-2K-v0.1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 2.37
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pints-ai/1.5-Pints-2K-v0.1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 0.0
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pints-ai/1.5-Pints-2K-v0.1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 0.0
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pints-ai/1.5-Pints-2K-v0.1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 1.84
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pints-ai/1.5-Pints-2K-v0.1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 1.15
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pints-ai/1.5-Pints-2K-v0.1
+      name: Open LLM Leaderboard
 ---
 
 # 1.5-Pints -- A model pretrained in 9 days by using high quality data
@@ -257,4 +347,17 @@ Though best efforts has been made to ensure, as much as possible, that all texts
 
 Additionally, the **user agrees to bear any damages** arising as a direct cause (or otherwise) of using any artifacts released by the pints research team, as well as full responsibility for the consequences of his / her usage (or implementation) of any such released artifacts. The user also indemnifies Pints Research Team (and any of its members or agents) of any damage, related or unrelated, to the release or subsequent usage of any findings, artifacts or code by the team.
 
+For the avoidance of doubt, **any artifacts released by the Pints Research team are done so in accordance with the "fair use"** clause of Copyright Law, in hopes that this will aid the research community in bringing LLMs to the next frontier.
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_pints-ai__1.5-Pints-2K-v0.1)
+
+|      Metric       |Value|
+|-------------------|----:|
+|Avg.               | 3.83|
+|IFEval (0-Shot)    |17.62|
+|BBH (3-Shot)       | 2.37|
+|MATH Lvl 5 (4-Shot)| 0.00|
+|GPQA (0-shot)      | 0.00|
+|MuSR (0-shot)      | 1.84|
+|MMLU-PRO (5-shot)  | 1.15|
+
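The `Avg.` row in the results table is consistent with an unweighted mean of the six benchmark scores. This is inferred from the numbers and is not stated in the PR. A minimal Python sketch of that check, with the scores copied from the table and a hypothetical helper name `mean_score`:

```python
# Scores copied from the results table added by this PR.
scores = {
    "IFEval (0-Shot)": 17.62,
    "BBH (3-Shot)": 2.37,
    "MATH Lvl 5 (4-Shot)": 0.00,
    "GPQA (0-shot)": 0.00,
    "MuSR (0-shot)": 1.84,
    "MMLU-PRO (5-shot)": 1.15,
}


def mean_score(values):
    """Return the unweighted arithmetic mean, rounded to two decimals.

    Assumption: the leaderboard average weights every benchmark equally.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


average = mean_score(scores.values())
print(f"Avg. = {average:.2f}")
```

The six scores sum to 22.98, and 22.98 / 6 rounds to 3.83, which matches the `Avg.` row.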