leaderboard-pr-bot committed · Commit a8e7a6d · verified · 1 Parent(s): b308187

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +189 -76
README.md CHANGED
@@ -1,85 +1,185 @@
  ---
+ language:
+ - en
+ license: llama2
  tags:
  - generated_from_trainer
  - finance
  model-index:
  - name: completed-model
    results:
-   - task:
-       type: text-generation
-     dataset:
-       name: ai2_arc
-       type: ai2_arc
-     metrics:
-     - name: AI2 Reasoning Challenge (25-Shot)
-       type: AI2 Reasoning Challenge (25-Shot)
-       value: 71.93
-     source:
-       name: Open LLM Leaderboard
-       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
-   - task:
-       type: text-generation
-     dataset:
-       name: hellaswag
-       type: hellaswag
-     metrics:
-     - name: HellaSwag (10-shot)
-       type: HellaSwag (10-shot)
-       value: 86.82
-     source:
-       name: Open LLM Leaderboard
-       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
-   - task:
-       type: text-generation
-     dataset:
-       name: multiple
-       type: miltiple
-     metrics:
-     - name: MMLU (5-shot)
-       type: MMLU (5-shot)
-       value: 70.38
-     source:
-       name: Open LLM Leaderboard
-       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
-   - task:
-       type: text-generation
-     dataset:
-       name: truthful_qa
-       type: truthful_qa
-     metrics:
-     - name: TruthfulQA (0-shot)
-       type: TruthfulQA (0-shot)
-       value: 65.21
-     source:
-       name: Open LLM Leaderboard
-       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
-   - task:
-       type: text-generation
-     dataset:
-       name: winogrande
-       type: winogrande
-     metrics:
-     - name: Winogrande (5-shot)
-       type: Winogrande (5-shot)
-       value: 83.58
-     source:
-       name: Open LLM Leaderboard
-       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
-   - task:
-       type: text-generation
-     dataset:
-       name: gsm8k
-       type: gsm8k
-     metrics:
-     - name: GSM8k (5-shot)
-       type: GSM8k (5-shot)
-       value: 61.79
-     source:
-       name: Open LLM Leaderboard
-       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
- license: llama2
- language:
- - en
+   - task:
+       type: text-generation
+     dataset:
+       name: ai2_arc
+       type: ai2_arc
+     metrics:
+     - type: AI2 Reasoning Challenge (25-Shot)
+       value: 71.93
+       name: AI2 Reasoning Challenge (25-Shot)
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+     dataset:
+       name: hellaswag
+       type: hellaswag
+     metrics:
+     - type: HellaSwag (10-shot)
+       value: 86.82
+       name: HellaSwag (10-shot)
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+     dataset:
+       name: multiple
+       type: miltiple
+     metrics:
+     - type: MMLU (5-shot)
+       value: 70.38
+       name: MMLU (5-shot)
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+     dataset:
+       name: truthful_qa
+       type: truthful_qa
+     metrics:
+     - type: TruthfulQA (0-shot)
+       value: 65.21
+       name: TruthfulQA (0-shot)
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+     dataset:
+       name: winogrande
+       type: winogrande
+     metrics:
+     - type: Winogrande (5-shot)
+       value: 83.58
+       name: Winogrande (5-shot)
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+     dataset:
+       name: gsm8k
+       type: gsm8k
+     metrics:
+     - type: GSM8k (5-shot)
+       value: 61.79
+       name: GSM8k (5-shot)
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 71.93
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=gradientai/v-alpha-tross
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 86.82
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=gradientai/v-alpha-tross
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 70.38
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=gradientai/v-alpha-tross
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 65.21
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=gradientai/v-alpha-tross
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 83.58
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=gradientai/v-alpha-tross
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 61.79
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=gradientai/v-alpha-tross
+       name: Open LLM Leaderboard
  ---

  **Albatross** is a collection of domain-specific language models for finance applications developed by [Gradient](https://gradient.ai/).
@@ -209,4 +309,17 @@ Gradient is accelerating AI transformation across industries. https://gradient.a

  ## Contact Us

- Drop an email to [[email protected]](mailto:[email protected])
+ Drop an email to [[email protected]](mailto:[email protected])
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_gradientai__v-alpha-tross)
+
+ | Metric |Value|
+ |---------------------------------|----:|
+ |Avg. |73.28|
+ |AI2 Reasoning Challenge (25-Shot)|71.93|
+ |HellaSwag (10-Shot) |86.82|
+ |MMLU (5-Shot) |70.38|
+ |TruthfulQA (0-shot) |65.21|
+ |Winogrande (5-shot) |83.58|
+ |GSM8k (5-shot) |61.79|
+
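
The `Avg.` row in the added table appears to be the arithmetic mean of the six benchmark scores. A minimal sketch for checking that, using only the values shown in the diff (the `scores` dictionary and `mean_score` helper are illustrative names, not part of the PR or of any leaderboard tooling):

```python
# Scores copied from the results table added by this PR.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 71.93,
    "HellaSwag (10-Shot)": 86.82,
    "MMLU (5-Shot)": 70.38,
    "TruthfulQA (0-shot)": 65.21,
    "Winogrande (5-shot)": 83.58,
    "GSM8k (5-shot)": 61.79,
}


def mean_score(values):
    """Return the arithmetic mean of an iterable of benchmark scores."""
    values = list(values)
    return sum(values) / len(values)


average = mean_score(scores.values())

# The rounded inputs give a mean very close to the reported 73.28.
# Assumption: the leaderboard averages unrounded scores, which would
# explain any difference in the last digit.
print(f"mean of {len(scores)} scores: {average:.3f}")
```

The mean of the rounded scores agrees with the reported 73.28 to within 0.01, which is consistent with the table, but the exact averaging procedure used by the leaderboard is not stated in this PR.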