Triangle104 commited on
Commit
25ddab7
·
verified ·
1 Parent(s): f198a1e

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +245 -0
README.md CHANGED
@@ -12,6 +12,251 @@ tags:
12
  This model was converted to GGUF format from [`AIDC-AI/Marco-o1`](https://huggingface.co/AIDC-AI/Marco-o1) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
13
  Refer to the [original model card](https://huggingface.co/AIDC-AI/Marco-o1) for more details on the model.
14
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
15
  ## Use with llama.cpp
16
  Install llama.cpp through brew (works on Mac and Linux)
17
 
 
12
  This model was converted to GGUF format from [`AIDC-AI/Marco-o1`](https://huggingface.co/AIDC-AI/Marco-o1) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
13
  Refer to the [original model card](https://huggingface.co/AIDC-AI/Marco-o1) for more details on the model.
14
 
15
+ ---
16
+ Model details:
17
+ -
18
+ Marco-o1 not only focuses on disciplines with
19
+ standard answers, such as mathematics, physics, and coding—which are
20
+ well-suited for reinforcement learning (RL)—but also places greater
21
+ emphasis on open-ended resolutions. We aim to address the question: "Can
22
+ the o1 model effectively generalize to broader domains where clear
23
+ standards are absent and rewards are challenging to quantify?"
24
+
25
+ Currently, Marco-o1 Large Language Model (LLM) is powered by Chain-of-Thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS), reflection mechanisms, and _innovative reasoning strategies_—optimized for complex real-world problem-solving tasks.
26
+
27
+
28
+ ⚠️ Limitations: We would like to emphasize that
29
+ this research work is inspired by OpenAI's o1 (from which the name is
30
+ also derived). This work aims to explore potential approaches to shed
31
+ light on the currently unclear technical roadmap for large reasoning
32
+ models. Besides, our focus is on open-ended questions, and we have
33
+ observed interesting phenomena in multilingual applications. However, we
34
+ must acknowledge that the current model primarily exhibits o1-like
35
+ reasoning characteristics and its performance still fall short of a
36
+ fully realized "o1" model. This is not a one-time effort, and we remain
37
+ committed to continuous optimization and ongoing improvement.
38
+
39
+
40
+
41
+
42
+
43
+
44
+
45
+
46
+
47
+
48
+ 🚀 Highlights
49
+
50
+
51
+
52
+
53
+ Currently, our work is distinguished by the following highlights:
54
+
55
+
56
+ 🍀 Fine-Tuning with CoT Data: We develop Marco-o1-CoT by performing
57
+ full-parameter fine-tuning on the base model using open-source CoT
58
+ dataset combined with our self-developed synthetic data.
59
+ 🍀 Solution Space Expansion via MCTS: We integrate LLMs with MCTS
60
+ (Marco-o1-MCTS), using the model's output confidence to guide the search
61
+ and expand the solution space.
62
+ 🍀 Reasoning Action Strategy: We implement novel reasoning action
63
+ strategies and a reflection mechanism (Marco-o1-MCTS Mini-Step),
64
+ including exploring different action granularities within the MCTS
65
+ framework and prompting the model to self-reflect, thereby significantly
66
+ enhancing the model's ability to solve complex problems.
67
+ 🍀 Application in Translation Tasks: We are the first to apply Large
68
+ Reasoning Models (LRM) to Machine Translation task, exploring inference
69
+ time scaling laws in the multilingual and translation domain.
70
+
71
+
72
+ OpenAI recently introduced the groundbreaking o1 model, renowned for
73
+ its exceptional reasoning capabilities. This model has demonstrated
74
+ outstanding performance on platforms such as AIME, CodeForces,
75
+ surpassing other leading models. Inspired by this success, we aimed to
76
+ push the boundaries of LLMs even further, enhancing their reasoning
77
+ abilities to tackle complex, real-world challenges.
78
+
79
+
80
+ 🌍 Marco-o1 leverages advanced techniques like CoT fine-tuning, MCTS,
81
+ and Reasoning Action Strategies to enhance its reasoning power. As
82
+ shown in Figure 2, by fine-tuning Qwen2-7B-Instruct with a combination
83
+ of the filtered Open-O1 CoT dataset, Marco-o1 CoT dataset, and Marco-o1
84
+ Instruction dataset, Marco-o1 improved its handling of complex tasks.
85
+ MCTS allows exploration of multiple reasoning paths using confidence
86
+ scores derived from softmax-applied log probabilities of the top-k
87
+ alternative tokens, guiding the model to optimal solutions. Moreover,
88
+ our reasoning action strategy involves varying the granularity of
89
+ actions within steps and mini-steps to optimize search efficiency and
90
+ accuracy.
91
+
92
+
93
+
94
+
95
+
96
+ Figure 2: The overview of Marco-o1.
97
+
98
+
99
+
100
+
101
+
102
+ 🌏 As shown in Figure 3, Marco-o1 achieved accuracy improvements of
103
+ +6.17% on the MGSM (English) dataset and +5.60% on the MGSM (Chinese)
104
+ dataset, showcasing enhanced reasoning capabilities.
105
+
106
+
107
+
108
+
109
+
110
+ Figure 3: The main results of Marco-o1.
111
+
112
+
113
+
114
+
115
+
116
+ 🌎 Additionally, in translation tasks, we demonstrate that Marco-o1
117
+ excels in translating slang expressions, such as translating "这个鞋拥有踩屎感"
118
+ (literal translation: "This shoe offers a stepping-on-poop sensation.")
119
+ to "This shoe has a comfortable sole," demonstrating its superior grasp
120
+ of colloquial nuances.
121
+
122
+
123
+
124
+
125
+
126
+ Figure 4: The demostration of translation task using Marco-o1.
127
+
128
+
129
+
130
+
131
+
132
+ For more information,please visit our Github.
133
+
134
+
135
+
136
+
137
+
138
+
139
+
140
+ Usage
141
+
142
+
143
+
144
+
145
+ Load Marco-o1-CoT model:
146
+
147
+
148
+ # Load model directly
149
+ from transformers import AutoTokenizer, AutoModelForCausalLM
150
+
151
+ tokenizer = AutoTokenizer.from_pretrained("AIDC-AI/Marco-o1")
152
+ model = AutoModelForCausalLM.from_pretrained("AIDC-AI/Marco-o1")
153
+
154
+
155
+
156
+
157
+
158
+
159
+ Inference:
160
+
161
+
162
+ Execute the inference script (you can give any customized inputs inside):
163
+
164
+
165
+ ./src/talk_with_model.py
166
+
167
+ # Use vLLM
168
+ ./src/talk_with_model_vllm.py
169
+
170
+
171
+
172
+
173
+
174
+
175
+
176
+
177
+
178
+
179
+ 👨🏻‍💻 Acknowledgement
180
+
181
+
182
+
183
+
184
+
185
+
186
+
187
+
188
+
189
+ Main Contributors
190
+
191
+
192
+
193
+
194
+ From MarcoPolo Team, AI Business, Alibaba International Digital Commerce:
195
+
196
+
197
+ Yu Zhao
198
+ Huifeng Yin
199
+ Hao Wang
200
+ Longyue Wang
201
+
202
+
203
+
204
+
205
+
206
+
207
+
208
+ Citation
209
+
210
+
211
+
212
+
213
+ If you find Marco-o1 useful for your research and applications, please cite:
214
+
215
+
216
+ @misc{zhao2024marcoo1openreasoningmodels,
217
+ title={Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions},
218
+ author={Yu Zhao and Huifeng Yin and Bo Zeng and Hao Wang and Tianqi Shi and Chenyang Lyu and Longyue Wang and Weihua Luo and Kaifu Zhang},
219
+ year={2024},
220
+ eprint={2411.14405},
221
+ archivePrefix={arXiv},
222
+ primaryClass={cs.CL},
223
+ url={https://arxiv.org/abs/2411.14405},
224
+ }
225
+
226
+
227
+
228
+
229
+
230
+
231
+
232
+
233
+ LICENSE
234
+
235
+
236
+
237
+
238
+ This project is licensed under Apache License Version 2 (SPDX-License-identifier: Apache-2.0).
239
+
240
+
241
+
242
+
243
+
244
+
245
+
246
+ DISCLAIMER
247
+
248
+
249
+
250
+
251
+ We used compliance checking algorithms during the training process,
252
+ to ensure the compliance of the trained model and dataset to the best of
253
+ our ability. Due to complex data and the diversity of language model
254
+ usage scenarios, we cannot guarantee that the model is completely free
255
+ of copyright issues or improper content. If you believe anything
256
+ infringes on your rights or generates improper content, please contact
257
+ us, and we will promptly address the matter.
258
+
259
+ ---
260
  ## Use with llama.cpp
261
  Install llama.cpp through brew (works on Mac and Linux)
262