Files changed (1)
  1. README.md +111 -2
README.md CHANGED
@@ -3,8 +3,6 @@ language:
3
  - en
4
  license: other
5
  library_name: transformers
6
- license_name: tongyi-qianwen
7
- license_link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE
8
  tags:
9
  - chat
10
  - qwen
@@ -16,9 +14,106 @@ base_model: Qwen/Qwen2.5-72B
16
  datasets:
17
  - MaziyarPanahi/truthy-dpo-v0.1-axolotl
18
  model_name: calme-2.1-qwen2.5-72b
19
  pipeline_tag: text-generation
20
  inference: false
21
  model_creator: MaziyarPanahi
22
  ---
23
 
24
  <img src="./calme-2.webp" alt="Calme-2 Models" width="800" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
@@ -91,3 +186,17 @@ model = AutoModelForCausalLM.from_pretrained("MaziyarPanahi/calme-2.1-qwen2.5-72
91
  # Ethical Considerations
92
 
93
  As with any large language model, users should be aware of potential biases and limitations. We recommend implementing appropriate safeguards and human oversight when deploying this model in production environments.
3
  - en
4
  license: other
5
  library_name: transformers
6
  tags:
7
  - chat
8
  - qwen
 
14
  datasets:
15
  - MaziyarPanahi/truthy-dpo-v0.1-axolotl
16
  model_name: calme-2.1-qwen2.5-72b
17
+ license_name: tongyi-qianwen
18
+ license_link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE
19
  pipeline_tag: text-generation
20
  inference: false
21
  model_creator: MaziyarPanahi
22
+ model-index:
23
+ - name: calme-2.1-qwen2.5-72b
24
+   results:
25
+   - task:
26
+       type: text-generation
27
+       name: Text Generation
28
+     dataset:
29
+       name: IFEval (0-Shot)
30
+       type: HuggingFaceH4/ifeval
31
+       args:
32
+         num_few_shot: 0
33
+     metrics:
34
+     - type: inst_level_strict_acc and prompt_level_strict_acc
35
+       value: 86.62
36
+       name: strict accuracy
37
+     source:
38
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-2.1-qwen2.5-72b
39
+       name: Open LLM Leaderboard
40
+   - task:
41
+       type: text-generation
42
+       name: Text Generation
43
+     dataset:
44
+       name: BBH (3-Shot)
45
+       type: BBH
46
+       args:
47
+         num_few_shot: 3
48
+     metrics:
49
+     - type: acc_norm
50
+       value: 61.66
51
+       name: normalized accuracy
52
+     source:
53
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-2.1-qwen2.5-72b
54
+       name: Open LLM Leaderboard
55
+   - task:
56
+       type: text-generation
57
+       name: Text Generation
58
+     dataset:
59
+       name: MATH Lvl 5 (4-Shot)
60
+       type: hendrycks/competition_math
61
+       args:
62
+         num_few_shot: 4
63
+     metrics:
64
+     - type: exact_match
65
+       value: 2.27
66
+       name: exact match
67
+     source:
68
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-2.1-qwen2.5-72b
69
+       name: Open LLM Leaderboard
70
+   - task:
71
+       type: text-generation
72
+       name: Text Generation
73
+     dataset:
74
+       name: GPQA (0-shot)
75
+       type: Idavidrein/gpqa
76
+       args:
77
+         num_few_shot: 0
78
+     metrics:
79
+     - type: acc_norm
80
+       value: 15.1
81
+       name: acc_norm
82
+     source:
83
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-2.1-qwen2.5-72b
84
+       name: Open LLM Leaderboard
85
+   - task:
86
+       type: text-generation
87
+       name: Text Generation
88
+     dataset:
89
+       name: MuSR (0-shot)
90
+       type: TAUR-Lab/MuSR
91
+       args:
92
+         num_few_shot: 0
93
+     metrics:
94
+     - type: acc_norm
95
+       value: 13.3
96
+       name: acc_norm
97
+     source:
98
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-2.1-qwen2.5-72b
99
+       name: Open LLM Leaderboard
100
+   - task:
101
+       type: text-generation
102
+       name: Text Generation
103
+     dataset:
104
+       name: MMLU-PRO (5-shot)
105
+       type: TIGER-Lab/MMLU-Pro
106
+       config: main
107
+       split: test
108
+       args:
109
+         num_few_shot: 5
110
+     metrics:
111
+     - type: acc
112
+       value: 51.32
113
+       name: accuracy
114
+     source:
115
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-2.1-qwen2.5-72b
116
+       name: Open LLM Leaderboard
  ---
118
 
119
  <img src="./calme-2.webp" alt="Calme-2 Models" width="800" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
 
186
  # Ethical Considerations
187
 
188
  As with any large language model, users should be aware of potential biases and limitations. We recommend implementing appropriate safeguards and human oversight when deploying this model in production environments.
189
+
190
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
191
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_MaziyarPanahi__calme-2.1-qwen2.5-72b)
192
+
193
+ |      Metric       |Value|
194
+ |-------------------|----:|
195
+ |Avg.               |38.38|
196
+ |IFEval (0-Shot)    |86.62|
197
+ |BBH (3-Shot)       |61.66|
198
+ |MATH Lvl 5 (4-Shot)| 2.27|
199
+ |GPQA (0-shot)      |15.10|
200
+ |MuSR (0-shot)      |13.30|
201
+ |MMLU-PRO (5-shot)  |51.32|
202
+
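
Reviewer note: the `Avg.` row in the added table can be sanity-checked against the six benchmark rows. The sketch below assumes `Avg.` is the plain unweighted mean of those six scores; the diff does not state how it is computed, so treat this as an assumption that happens to match the numbers. The names `scores` and `mean_score` are illustrative and not part of the README.

```python
# Benchmark scores copied from the table added in this diff.
scores = {
    "IFEval (0-Shot)": 86.62,
    "BBH (3-Shot)": 61.66,
    "MATH Lvl 5 (4-Shot)": 2.27,
    "GPQA (0-shot)": 15.10,
    "MuSR (0-shot)": 13.30,
    "MMLU-PRO (5-shot)": 51.32,
}


def mean_score(values):
    """Return the unweighted arithmetic mean of the given scores."""
    vals = list(values)
    return sum(vals) / len(vals)


# Assumption: "Avg." is the simple mean of the six rows.
average = mean_score(scores.values())
print(f"Avg. = {average:.2f}")  # prints "Avg. = 38.38"
```

Under that assumption the computed mean rounds to 38.38, which agrees with the `Avg.` row, so the table and the `model-index` values are internally consistent.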