FrankC0st1e committed on
Commit d7cca6a · verified · 1 Parent(s): 65aff61

Update README.md

  1. README.md +295 -292
README.md CHANGED
@@ -1,293 +1,296 @@
1
- ---
2
- license: apache-2.0
3
- ---
4
- <div align="center">
5
- <img src="https://github.com/OpenBMB/MiniCPM/tree/main/assets/minicpm_logo.png" width="500em" ></img>
6
- </div>
7
-
8
- <p align="center">
9
- <a href="https://github.com/OpenBMB/MiniCPM/" target="_blank">MiniCPM Repo</a> |
10
- <a href="https://arxiv.org/abs/2404.06395" target="_blank">MiniCPM Paper</a> |
11
- <a href="https://github.com/OpenBMB/MiniCPM-V/" target="_blank">MiniCPM-V Repo</a> |
12
- Join us in <a href="https://discord.gg/3cGQn9b3YM" target="_blank">Discord</a> and <a href="https://github.com/OpenBMB/MiniCPM/blob/main/assets/wechat.jpg" target="_blank">WeChat</a>
13
-
14
- </p>
15
-
16
- ## Introduction
17
- MiniCPM3-4B is the 3rd generation of MiniCPM series. The overall performance of MiniCPM3-4B surpasses Phi-3.5-mini-Instruct and GPT-3.5-Turbo-0125, being comparable with many recent 7B~9B models.
18
-
19
- Compared to MiniCPM1.0/MiniCPM2.0, MiniCPM3-4B has a more powerful and versatile skill set to enable more general usage. MiniCPM3-4B supports function call, along with code interpreter. Please refer to [Advanced Features](https://github.com/zh-zheng/minicpm?tab=readme-ov-file#%E8%BF%9B%E9%98%B6%E5%8A%9F%E8%83%BD) for usage guidelines.
20
-
21
- MiniCPM3-4B has a 32k context window. Equipped with LLMxMapReduce, MiniCPM3-4B can handle infinite context theoretically, without requiring huge amount of memory.
22
-
23
- ## Usage
24
- ### Inference with Transformers
25
- ```python
26
- from transformers import AutoModelForCausalLM, AutoTokenizer
27
- import torch
28
-
29
- path = "openbmb/MiniCPM3-4B"
30
- device = "cuda"
31
-
32
- tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
33
- model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map=device, trust_remote_code=True)
34
-
35
- messages = [
36
- {"role": "user", "content": "推荐5个北京的景点。"},
37
- ]
38
- model_inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(device)
39
-
40
- model_outputs = model.generate(
41
- model_inputs,
42
- max_new_tokens=1024,
43
- top_p=0.7,
44
- temperature=0.7,
45
- repetition_penalty=1.02
46
- )
47
-
48
- output_token_ids = [
49
- model_outputs[i][len(model_inputs[i]):] for i in range(len(model_inputs))
50
- ]
51
-
52
- responses = tokenizer.batch_decode(output_token_ids, skip_special_tokens=True)[0]
53
- print(responses)
54
- ```
55
-
56
- ### Inference with [vLLM](https://github.com/vllm-project/vllm)
57
- ```python
58
- from transformers import AutoTokenizer
59
- from vllm import LLM, SamplingParams
60
-
61
- model_name = "openbmb/MiniCPM3-4B"
62
- prompt = [{"role": "user", "content": "推荐5个北京的景点。"}]
63
-
64
- tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
65
- input_text = tokenizer.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True)
66
-
67
- llm = LLM(
68
- model=model_name,
69
- trust_remote_code=True,
70
- tensor_parallel_size=1
71
- )
72
- sampling_params = SamplingParams(top_p=0.7, temperature=0.7, max_tokens=1024, repetition_penalty=1.02)
73
-
74
- outputs = llm.generate(prompts=input_text, sampling_params=sampling_params)
75
-
76
- print(outputs[0].outputs[0].text)
77
- ```
78
-
79
- ## Evaluation Results
80
-
81
- <table>
82
- <tr>
83
- <td>Benchmark</td>
84
- <td>Qwen2-7B-Instruct</td>
85
- <td>GLM-4-9B-Chat</td>
86
- <td>Gemma2-9B-it</td>
87
- <td>Llama3.1-8B-Instruct</td>
88
- <td>GPT-3.5-Turbo-0125</td>
89
- <td>Phi-3.5-mini-Instruct(3.8B)</td>
90
- <td>MiniCPM3-4B </td>
91
- </tr>
92
- <tr>
93
- <td colspan="15" align="left"><strong>English</strong></td>
94
- </tr>
95
- <tr>
96
- <td>MMLU</td>
97
- <td>70.5</td>
98
- <td>72.4</td>
99
- <td>72.6</td>
100
- <td>69.4</td>
101
- <td>69.2</td>
102
- <td>68.4</td>
103
- <td>67.2 </td>
104
- </tr>
105
- <tr>
106
- <td>BBH</td>
107
- <td>64.9</td>
108
- <td>76.3</td>
109
- <td>65.2</td>
110
- <td>67.8</td>
111
- <td>70.3</td>
112
- <td>68.6</td>
113
- <td>70.2 </td>
114
- </tr>
115
- <tr>
116
- <td>MT-Bench</td>
117
- <td>8.41</td>
118
- <td>8.35</td>
119
- <td>7.88</td>
120
- <td>8.28</td>
121
- <td>8.17</td>
122
- <td>8.60</td>
123
- <td>8.41 </td>
124
- </tr>
125
- <tr>
126
- <td>IFEVAL (Prompt Strict-Acc.)</td>
127
- <td>51.0</td>
128
- <td>64.5</td>
129
- <td>71.9</td>
130
- <td>71.5</td>
131
- <td>58.8</td>
132
- <td>49.4</td>
133
- <td>68.4 </td>
134
- </tr>
135
- <tr>
136
- <td colspan="15" align="left"><strong>Chinese</strong></td>
137
- </tr>
138
- <tr>
139
- <td>CMMLU</td>
140
- <td>80.9</td>
141
- <td>71.5</td>
142
- <td>59.5</td>
143
- <td>55.8</td>
144
- <td>54.5</td>
145
- <td>46.9</td>
146
- <td>73.3 </td>
147
- </tr>
148
- <tr>
149
- <td>CEVAL</td>
150
- <td>77.2</td>
151
- <td>75.6</td>
152
- <td>56.7</td>
153
- <td>55.2</td>
154
- <td>52.8</td>
155
- <td>46.1</td>
156
- <td>73.6 </td>
157
- </tr>
158
- <tr>
159
- <td>AlignBench v1.1</td>
160
- <td>7.10</td>
161
- <td>6.61</td>
162
- <td>7.10</td>
163
- <td>5.68</td>
164
- <td>5.82</td>
165
- <td>5.73</td>
166
- <td>6.74 </td>
167
- </tr>
168
- <tr>
169
- <td>FollowBench-zh (SSR)</td>
170
- <td>63.0</td>
171
- <td>56.4</td>
172
- <td>57.0</td>
173
- <td>50.6</td>
174
- <td>64.6</td>
175
- <td>58.1</td>
176
- <td>66.8 </td>
177
- </tr>
178
- <tr>
179
- <td colspan="15" align="left"><strong>Math</strong></td>
180
- </tr>
181
- <tr>
182
- <td>MATH</td>
183
- <td>49.6</td>
184
- <td>50.6</td>
185
- <td>46.0</td>
186
- <td>51.9</td>
187
- <td>41.8</td>
188
- <td>46.4</td>
189
- <td>46.6 </td>
190
- </tr>
191
- <tr>
192
- <td>GSM8K</td>
193
- <td>82.3</td>
194
- <td>79.6</td>
195
- <td>79.7</td>
196
- <td>84.5</td>
197
- <td>76.4</td>
198
- <td>82.7</td>
199
- <td>81.1 </td>
200
- </tr>
201
- <tr>
202
- <td>MathBench</td>
203
- <td>63.4</td>
204
- <td>59.4</td>
205
- <td>45.8</td>
206
- <td>54.3</td>
207
- <td>48.9</td>
208
- <td>54.9</td>
209
- <td>65.6 </td>
210
- </tr>
211
- <tr>
212
- <td colspan="15" align="left"><strong>Code</strong></td>
213
- </tr>
214
- <tr>
215
- <td>HumanEval+</td>
216
- <td>70.1</td>
217
- <td>67.1</td>
218
- <td>61.6</td>
219
- <td>62.8</td>
220
- <td>66.5</td>
221
- <td>68.9</td>
222
- <td>68.3 </td>
223
- </tr>
224
- <tr>
225
- <td>MBPP+</td>
226
- <td>57.1</td>
227
- <td>62.2</td>
228
- <td>64.3</td>
229
- <td>55.3</td>
230
- <td>71.4</td>
231
- <td>55.8</td>
232
- <td>63.2 </td>
233
- </tr>
234
- <tr>
235
- <td>LiveCodeBench</td>
236
- <td>22.2</td>
237
- <td>20.2</td>
238
- <td>19.2</td>
239
- <td>20.4</td>
240
- <td>24.0</td>
241
- <td>19.6</td>
242
- <td>22.6 </td>
243
- </tr>
244
- <tr>
245
- <td colspan="15" align="left"><strong>Function Call</strong></td>
246
- </tr>
247
- <tr>
248
- <td>BFCL</td>
249
- <td>71.6</td>
250
- <td>70.1</td>
251
- <td>19.2</td>
252
- <td>73.3</td>
253
- <td>75.4</td>
254
- <td>48.4</td>
255
- <td>76.0 </td>
256
- </tr>
257
- <tr>
258
- <td colspan="15" align="left"><strong>Overall</strong></td>
259
- </tr>
260
- <tr>
261
- <td>Average</td>
262
- <td>65.3</td>
263
- <td>65.0</td>
264
- <td>57.9</td>
265
- <td>60.8</td>
266
- <td>61.0</td>
267
- <td>57.2</td>
268
- <td><strong>66.3</strong></td>
269
- </tr>
270
- </table>
271
-
272
-
273
- ## Statement
274
- * As a language model, MiniCPM3-4B generates content by learning from a vast amount of text.
275
- * However, it does not possess the ability to comprehend or express personal opinions or value judgments.
276
- * Any content generated by MiniCPM3-4B does not represent the viewpoints or positions of the model developers.
277
- * Therefore, when using content generated by MiniCPM3-4B, users should take full responsibility for evaluating and verifying it on their own.
278
-
279
- ## LICENSE
280
- * This repository is released under the [Apache-2.0](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE) License.
281
- * The usage of MiniCPM3-4B model weights must strictly follow [MiniCPM Model License.md](https://github.com/OpenBMB/MiniCPM/blob/main/MiniCPM%20Model%20License.md).
282
- * The models and weights of MiniCPM3-4B are completely free for academic research. after filling out a ["questionnaire"](https://modelbest.feishu.cn/share/base/form/shrcnpV5ZT9EJ6xYjh3Kx0J6v8g) for registration, are also available for free commercial use.
283
-
284
- ## Citation
285
-
286
- ```
287
- @article{hu2024minicpm,
288
- title={MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies},
289
- author={Hu, Shengding and Tu, Yuge and Han, Xu and He, Chaoqun and Cui, Ganqu and Long, Xiang and Zheng, Zhi and Fang, Yewei and Huang, Yuxiang and Zhao, Weilin and others},
290
- journal={arXiv preprint arXiv:2404.06395},
291
- year={2024}
292
- }
 
 
 
293
  ```
 
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - zh
5
+ - en
6
+ ---
7
+ <div align="center">
8
+ <img src="https://github.com/OpenBMB/MiniCPM/tree/main/assets/minicpm_logo.png" width="500em" ></img>
9
+ </div>
10
+
11
+ <p align="center">
12
+ <a href="https://github.com/OpenBMB/MiniCPM/" target="_blank">MiniCPM Repo</a> |
13
+ <a href="https://arxiv.org/abs/2404.06395" target="_blank">MiniCPM Paper</a> |
14
+ <a href="https://github.com/OpenBMB/MiniCPM-V/" target="_blank">MiniCPM-V Repo</a> |
15
+ Join us in <a href="https://discord.gg/3cGQn9b3YM" target="_blank">Discord</a> and <a href="https://github.com/OpenBMB/MiniCPM/blob/main/assets/wechat.jpg" target="_blank">WeChat</a>
16
+
17
+ </p>
18
+
19
+ ## Introduction
20
+ MiniCPM3-4B is the third generation of the MiniCPM series. Its overall performance surpasses Phi-3.5-mini-Instruct and GPT-3.5-Turbo-0125 and is comparable to many recent 7B~9B models.
21
+
22
+ Compared to MiniCPM1.0/MiniCPM2.0, MiniCPM3-4B has a more powerful and versatile skill set that enables more general usage. MiniCPM3-4B supports function calling as well as a code interpreter. Please refer to [Advanced Features](https://github.com/zh-zheng/minicpm?tab=readme-ov-file#%E8%BF%9B%E9%98%B6%E5%8A%9F%E8%83%BD) for usage guidelines.
23
+
24
+ MiniCPM3-4B has a 32k context window. Equipped with LLMxMapReduce, MiniCPM3-4B can in theory handle unlimited context without requiring a huge amount of memory.
25
+
26
+ ## Usage
27
+ ### Inference with Transformers
28
+ ```python
29
+ from transformers import AutoModelForCausalLM, AutoTokenizer
30
+ import torch
31
+
32
+ path = "openbmb/MiniCPM3-4B"
33
+ device = "cuda"
34
+
35
+ tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
36
+ model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map=device, trust_remote_code=True)
37
+
38
+ messages = [
39
+ {"role": "user", "content": "推荐5个北京的景点。"},
40
+ ]
41
+ model_inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(device)
42
+
43
+ model_outputs = model.generate(
44
+ model_inputs,
45
+ max_new_tokens=1024,
46
+ top_p=0.7,
47
+ temperature=0.7,
48
+ repetition_penalty=1.02
49
+ )
50
+
51
+ output_token_ids = [
52
+ model_outputs[i][len(model_inputs[i]):] for i in range(len(model_inputs))
53
+ ]
54
+
55
+ responses = tokenizer.batch_decode(output_token_ids, skip_special_tokens=True)[0]
56
+ print(responses)
57
+ ```
58
+
59
+ ### Inference with [vLLM](https://github.com/vllm-project/vllm)
60
+ ```python
61
+ from transformers import AutoTokenizer
62
+ from vllm import LLM, SamplingParams
63
+
64
+ model_name = "openbmb/MiniCPM3-4B"
65
+ prompt = [{"role": "user", "content": "推荐5个北京的景点。"}]
66
+
67
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
68
+ input_text = tokenizer.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True)
69
+
70
+ llm = LLM(
71
+ model=model_name,
72
+ trust_remote_code=True,
73
+ tensor_parallel_size=1
74
+ )
75
+ sampling_params = SamplingParams(top_p=0.7, temperature=0.7, max_tokens=1024, repetition_penalty=1.02)
76
+
77
+ outputs = llm.generate(prompts=input_text, sampling_params=sampling_params)
78
+
79
+ print(outputs[0].outputs[0].text)
80
+ ```
81
+
82
+ ## Evaluation Results
83
+
84
+ <table>
85
+ <tr>
86
+ <td>Benchmark</td>
87
+ <td>Qwen2-7B-Instruct</td>
88
+ <td>GLM-4-9B-Chat</td>
89
+ <td>Gemma2-9B-it</td>
90
+ <td>Llama3.1-8B-Instruct</td>
91
+ <td>GPT-3.5-Turbo-0125</td>
92
+ <td>Phi-3.5-mini-Instruct(3.8B)</td>
93
+ <td>MiniCPM3-4B </td>
94
+ </tr>
95
+ <tr>
96
+ <td colspan="15" align="left"><strong>English</strong></td>
97
+ </tr>
98
+ <tr>
99
+ <td>MMLU</td>
100
+ <td>70.5</td>
101
+ <td>72.4</td>
102
+ <td>72.6</td>
103
+ <td>69.4</td>
104
+ <td>69.2</td>
105
+ <td>68.4</td>
106
+ <td>67.2 </td>
107
+ </tr>
108
+ <tr>
109
+ <td>BBH</td>
110
+ <td>64.9</td>
111
+ <td>76.3</td>
112
+ <td>65.2</td>
113
+ <td>67.8</td>
114
+ <td>70.3</td>
115
+ <td>68.6</td>
116
+ <td>70.2 </td>
117
+ </tr>
118
+ <tr>
119
+ <td>MT-Bench</td>
120
+ <td>8.41</td>
121
+ <td>8.35</td>
122
+ <td>7.88</td>
123
+ <td>8.28</td>
124
+ <td>8.17</td>
125
+ <td>8.60</td>
126
+ <td>8.41 </td>
127
+ </tr>
128
+ <tr>
129
+ <td>IFEVAL (Prompt Strict-Acc.)</td>
130
+ <td>51.0</td>
131
+ <td>64.5</td>
132
+ <td>71.9</td>
133
+ <td>71.5</td>
134
+ <td>58.8</td>
135
+ <td>49.4</td>
136
+ <td>68.4 </td>
137
+ </tr>
138
+ <tr>
139
+ <td colspan="15" align="left"><strong>Chinese</strong></td>
140
+ </tr>
141
+ <tr>
142
+ <td>CMMLU</td>
143
+ <td>80.9</td>
144
+ <td>71.5</td>
145
+ <td>59.5</td>
146
+ <td>55.8</td>
147
+ <td>54.5</td>
148
+ <td>46.9</td>
149
+ <td>73.3 </td>
150
+ </tr>
151
+ <tr>
152
+ <td>CEVAL</td>
153
+ <td>77.2</td>
154
+ <td>75.6</td>
155
+ <td>56.7</td>
156
+ <td>55.2</td>
157
+ <td>52.8</td>
158
+ <td>46.1</td>
159
+ <td>73.6 </td>
160
+ </tr>
161
+ <tr>
162
+ <td>AlignBench v1.1</td>
163
+ <td>7.10</td>
164
+ <td>6.61</td>
165
+ <td>7.10</td>
166
+ <td>5.68</td>
167
+ <td>5.82</td>
168
+ <td>5.73</td>
169
+ <td>6.74 </td>
170
+ </tr>
171
+ <tr>
172
+ <td>FollowBench-zh (SSR)</td>
173
+ <td>63.0</td>
174
+ <td>56.4</td>
175
+ <td>57.0</td>
176
+ <td>50.6</td>
177
+ <td>64.6</td>
178
+ <td>58.1</td>
179
+ <td>66.8 </td>
180
+ </tr>
181
+ <tr>
182
+ <td colspan="15" align="left"><strong>Math</strong></td>
183
+ </tr>
184
+ <tr>
185
+ <td>MATH</td>
186
+ <td>49.6</td>
187
+ <td>50.6</td>
188
+ <td>46.0</td>
189
+ <td>51.9</td>
190
+ <td>41.8</td>
191
+ <td>46.4</td>
192
+ <td>46.6 </td>
193
+ </tr>
194
+ <tr>
195
+ <td>GSM8K</td>
196
+ <td>82.3</td>
197
+ <td>79.6</td>
198
+ <td>79.7</td>
199
+ <td>84.5</td>
200
+ <td>76.4</td>
201
+ <td>82.7</td>
202
+ <td>81.1 </td>
203
+ </tr>
204
+ <tr>
205
+ <td>MathBench</td>
206
+ <td>63.4</td>
207
+ <td>59.4</td>
208
+ <td>45.8</td>
209
+ <td>54.3</td>
210
+ <td>48.9</td>
211
+ <td>54.9</td>
212
+ <td>65.6 </td>
213
+ </tr>
214
+ <tr>
215
+ <td colspan="15" align="left"><strong>Code</strong></td>
216
+ </tr>
217
+ <tr>
218
+ <td>HumanEval+</td>
219
+ <td>70.1</td>
220
+ <td>67.1</td>
221
+ <td>61.6</td>
222
+ <td>62.8</td>
223
+ <td>66.5</td>
224
+ <td>68.9</td>
225
+ <td>68.3 </td>
226
+ </tr>
227
+ <tr>
228
+ <td>MBPP+</td>
229
+ <td>57.1</td>
230
+ <td>62.2</td>
231
+ <td>64.3</td>
232
+ <td>55.3</td>
233
+ <td>71.4</td>
234
+ <td>55.8</td>
235
+ <td>63.2 </td>
236
+ </tr>
237
+ <tr>
238
+ <td>LiveCodeBench</td>
239
+ <td>22.2</td>
240
+ <td>20.2</td>
241
+ <td>19.2</td>
242
+ <td>20.4</td>
243
+ <td>24.0</td>
244
+ <td>19.6</td>
245
+ <td>22.6 </td>
246
+ </tr>
247
+ <tr>
248
+ <td colspan="15" align="left"><strong>Function Call</strong></td>
249
+ </tr>
250
+ <tr>
251
+ <td>BFCL</td>
252
+ <td>71.6</td>
253
+ <td>70.1</td>
254
+ <td>19.2</td>
255
+ <td>73.3</td>
256
+ <td>75.4</td>
257
+ <td>48.4</td>
258
+ <td>76.0 </td>
259
+ </tr>
260
+ <tr>
261
+ <td colspan="15" align="left"><strong>Overall</strong></td>
262
+ </tr>
263
+ <tr>
264
+ <td>Average</td>
265
+ <td>65.3</td>
266
+ <td>65.0</td>
267
+ <td>57.9</td>
268
+ <td>60.8</td>
269
+ <td>61.0</td>
270
+ <td>57.2</td>
271
+ <td><strong>66.3</strong></td>
272
+ </tr>
273
+ </table>
274
+
275
+
276
+ ## Statement
277
+ * As a language model, MiniCPM3-4B generates content by learning from a vast amount of text.
278
+ * However, it does not possess the ability to comprehend or express personal opinions or value judgments.
279
+ * Any content generated by MiniCPM3-4B does not represent the viewpoints or positions of the model developers.
280
+ * Therefore, when using content generated by MiniCPM3-4B, users should take full responsibility for evaluating and verifying it on their own.
281
+
282
+ ## LICENSE
283
+ * This repository is released under the [Apache-2.0](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE) License.
284
+ * The usage of MiniCPM3-4B model weights must strictly follow [MiniCPM Model License.md](https://github.com/OpenBMB/MiniCPM/blob/main/MiniCPM%20Model%20License.md).
285
+ * The models and weights of MiniCPM3-4B are completely free for academic research. After filling out a [questionnaire](https://modelbest.feishu.cn/share/base/form/shrcnpV5ZT9EJ6xYjh3Kx0J6v8g) for registration, they are also available for free commercial use.
286
+
287
+ ## Citation
288
+
289
+ ```
290
+ @article{hu2024minicpm,
291
+ title={MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies},
292
+ author={Hu, Shengding and Tu, Yuge and Han, Xu and He, Chaoqun and Cui, Ganqu and Long, Xiang and Zheng, Zhi and Fang, Yewei and Huang, Yuxiang and Zhao, Weilin and others},
293
+ journal={arXiv preprint arXiv:2404.06395},
294
+ year={2024}
295
+ }
296
  ```
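
The README does not say how the Average row of the evaluation table is computed. The sketch below tests one plausible reading: an unweighted mean over the 15 benchmark rows, with the two 10-point scores (MT-Bench and AlignBench v1.1) multiplied by 10 to put them on a 0-100 scale. That rule is an assumption inferred from the numbers, not something the authors state, and the names `BENCHMARKS`, `TEN_POINT`, and `overall_average` are made up for illustration. The scores are copied from the table.

```python
# Benchmark rows in table order (the Average row itself is excluded).
BENCHMARKS = [
    "MMLU", "BBH", "MT-Bench", "IFEVAL",
    "CMMLU", "CEVAL", "AlignBench v1.1", "FollowBench-zh",
    "MATH", "GSM8K", "MathBench",
    "HumanEval+", "MBPP+", "LiveCodeBench",
    "BFCL",
]

# Assumption: these two benchmarks are scored out of 10 and are rescaled to 0-100.
TEN_POINT = {"MT-Bench", "AlignBench v1.1"}

# Per-model scores copied from the table, in the same order as BENCHMARKS.
MINICPM3_4B = [67.2, 70.2, 8.41, 68.4, 73.3, 73.6, 6.74, 66.8,
               46.6, 81.1, 65.6, 68.3, 63.2, 22.6, 76.0]
QWEN2_7B = [70.5, 64.9, 8.41, 51.0, 80.9, 77.2, 7.10, 63.0,
            49.6, 82.3, 63.4, 70.1, 57.1, 22.2, 71.6]


def overall_average(values):
    """Unweighted mean over all benchmarks after rescaling 10-point scores."""
    rescaled = [
        value * 10 if name in TEN_POINT else value
        for name, value in zip(BENCHMARKS, values)
    ]
    return round(sum(rescaled) / len(rescaled), 1)


print("MiniCPM3-4B:", overall_average(MINICPM3_4B))
print("Qwen2-7B-Instruct:", overall_average(QWEN2_7B))
```

Under this assumption the two results agree with the Average row for both models (66.3 and 65.3), which suggests the overall figure weights every benchmark equally rather than averaging per category.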