|
--- |
|
license: apache-2.0 |
|
library_name: transformers |
|
inference: false |
|
base_model: AIDC-AI/Marco-o1 |
|
tags: |
|
- llama-cpp |
|
- gguf-my-repo |
|
--- |
|
|
|
# Triangle104/Marco-o1-Q6_K-GGUF |
|
This model was converted to GGUF format from [`AIDC-AI/Marco-o1`](https://huggingface.co/AIDC-AI/Marco-o1) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space. |
|
Refer to the [original model card](https://huggingface.co/AIDC-AI/Marco-o1) for more details on the model. |
|
|
|
--- |
|
Model details: |
|
- |
|
Marco-o1 not only focuses on disciplines with |
|
standard answers, such as mathematics, physics, and coding—which are |
|
well-suited for reinforcement learning (RL)—but also places greater |
|
emphasis on open-ended resolutions. We aim to address the question: "Can |
|
the o1 model effectively generalize to broader domains where clear |
|
standards are absent and rewards are challenging to quantify?" |
|
|
|
Currently, Marco-o1 Large Language Model (LLM) is powered by Chain-of-Thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS), reflection mechanisms, and _innovative reasoning strategies_—optimized for complex real-world problem-solving tasks. |
|
|
|
|
|
⚠️ Limitations: We would like to emphasize that |
|
this research work is inspired by OpenAI's o1 (from which the name is |
|
also derived). This work aims to explore potential approaches to shed |
|
light on the currently unclear technical roadmap for large reasoning |
|
models. Besides, our focus is on open-ended questions, and we have |
|
observed interesting phenomena in multilingual applications. However, we |
|
must acknowledge that the current model primarily exhibits o1-like |
|
reasoning characteristics and its performance still fall short of a |
|
fully realized "o1" model. This is not a one-time effort, and we remain |
|
committed to continuous optimization and ongoing improvement. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
🚀 Highlights |
|
|
|
|
|
|
|
|
|
Currently, our work is distinguished by the following highlights: |
|
|
|
|
|
🍀 Fine-Tuning with CoT Data: We develop Marco-o1-CoT by performing |
|
full-parameter fine-tuning on the base model using open-source CoT |
|
dataset combined with our self-developed synthetic data. |
|
🍀 Solution Space Expansion via MCTS: We integrate LLMs with MCTS |
|
(Marco-o1-MCTS), using the model's output confidence to guide the search |
|
and expand the solution space. |
|
🍀 Reasoning Action Strategy: We implement novel reasoning action |
|
strategies and a reflection mechanism (Marco-o1-MCTS Mini-Step), |
|
including exploring different action granularities within the MCTS |
|
framework and prompting the model to self-reflect, thereby significantly |
|
enhancing the model's ability to solve complex problems. |
|
🍀 Application in Translation Tasks: We are the first to apply Large |
|
Reasoning Models (LRM) to Machine Translation task, exploring inference |
|
time scaling laws in the multilingual and translation domain. |
|
|
|
|
|
OpenAI recently introduced the groundbreaking o1 model, renowned for |
|
its exceptional reasoning capabilities. This model has demonstrated |
|
outstanding performance on platforms such as AIME, CodeForces, |
|
surpassing other leading models. Inspired by this success, we aimed to |
|
push the boundaries of LLMs even further, enhancing their reasoning |
|
abilities to tackle complex, real-world challenges. |
|
|
|
|
|
🌍 Marco-o1 leverages advanced techniques like CoT fine-tuning, MCTS, |
|
and Reasoning Action Strategies to enhance its reasoning power. As |
|
shown in Figure 2, by fine-tuning Qwen2-7B-Instruct with a combination |
|
of the filtered Open-O1 CoT dataset, Marco-o1 CoT dataset, and Marco-o1 |
|
Instruction dataset, Marco-o1 improved its handling of complex tasks. |
|
MCTS allows exploration of multiple reasoning paths using confidence |
|
scores derived from softmax-applied log probabilities of the top-k |
|
alternative tokens, guiding the model to optimal solutions. Moreover, |
|
our reasoning action strategy involves varying the granularity of |
|
actions within steps and mini-steps to optimize search efficiency and |
|
accuracy. |
|
|
|
|
|
|
|
|
|
|
|
Figure 2: The overview of Marco-o1. |
|
|
|
|
|
|
|
|
|
|
|
🌏 As shown in Figure 3, Marco-o1 achieved accuracy improvements of |
|
+6.17% on the MGSM (English) dataset and +5.60% on the MGSM (Chinese) |
|
dataset, showcasing enhanced reasoning capabilities. |
|
|
|
|
|
|
|
|
|
|
|
Figure 3: The main results of Marco-o1. |
|
|
|
|
|
|
|
|
|
|
|
🌎 Additionally, in translation tasks, we demonstrate that Marco-o1 |
|
excels in translating slang expressions, such as translating "这个鞋拥有踩屎感" |
|
(literal translation: "This shoe offers a stepping-on-poop sensation.") |
|
to "This shoe has a comfortable sole," demonstrating its superior grasp |
|
of colloquial nuances. |
|
|
|
|
|
|
|
|
|
|
|
Figure 4: The demostration of translation task using Marco-o1. |
|
|
|
|
|
|
|
|
|
|
|
For more information,please visit our Github. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Usage |
|
|
|
|
|
|
|
|
|
Load Marco-o1-CoT model: |
|
|
|
|
|
# Load model directly |
|
from transformers import AutoTokenizer, AutoModelForCausalLM |
|
|
|
tokenizer = AutoTokenizer.from_pretrained("AIDC-AI/Marco-o1") |
|
model = AutoModelForCausalLM.from_pretrained("AIDC-AI/Marco-o1") |
|
|
|
|
|
|
|
|
|
|
|
|
|
Inference: |
|
|
|
|
|
Execute the inference script (you can give any customized inputs inside): |
|
|
|
|
|
./src/talk_with_model.py |
|
|
|
# Use vLLM |
|
./src/talk_with_model_vllm.py |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
👨🏻💻 Acknowledgement |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Main Contributors |
|
|
|
|
|
|
|
|
|
From MarcoPolo Team, AI Business, Alibaba International Digital Commerce: |
|
|
|
|
|
Yu Zhao |
|
Huifeng Yin |
|
Hao Wang |
|
Longyue Wang |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Citation |
|
|
|
|
|
|
|
|
|
If you find Marco-o1 useful for your research and applications, please cite: |
|
|
|
|
|
@misc{zhao2024marcoo1openreasoningmodels, |
|
title={Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions}, |
|
author={Yu Zhao and Huifeng Yin and Bo Zeng and Hao Wang and Tianqi Shi and Chenyang Lyu and Longyue Wang and Weihua Luo and Kaifu Zhang}, |
|
year={2024}, |
|
eprint={2411.14405}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CL}, |
|
url={https://arxiv.org/abs/2411.14405}, |
|
} |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
LICENSE |
|
|
|
|
|
|
|
|
|
This project is licensed under Apache License Version 2 (SPDX-License-identifier: Apache-2.0). |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
DISCLAIMER |
|
|
|
|
|
|
|
|
|
We used compliance checking algorithms during the training process, |
|
to ensure the compliance of the trained model and dataset to the best of |
|
our ability. Due to complex data and the diversity of language model |
|
usage scenarios, we cannot guarantee that the model is completely free |
|
of copyright issues or improper content. If you believe anything |
|
infringes on your rights or generates improper content, please contact |
|
us, and we will promptly address the matter. |
|
|
|
--- |
|
## Use with llama.cpp |
|
Install llama.cpp through brew (works on Mac and Linux) |
|
|
|
```bash |
|
brew install llama.cpp |
|
|
|
``` |
|
Invoke the llama.cpp server or the CLI. |
|
|
|
### CLI: |
|
```bash |
|
llama-cli --hf-repo Triangle104/Marco-o1-Q6_K-GGUF --hf-file marco-o1-q6_k.gguf -p "The meaning to life and the universe is" |
|
``` |
|
|
|
### Server: |
|
```bash |
|
llama-server --hf-repo Triangle104/Marco-o1-Q6_K-GGUF --hf-file marco-o1-q6_k.gguf -c 2048 |
|
``` |
|
|
|
Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the Llama.cpp repo as well. |
|
|
|
Step 1: Clone llama.cpp from GitHub. |
|
``` |
|
git clone https://github.com/ggerganov/llama.cpp |
|
``` |
|
|
|
Step 2: Move into the llama.cpp folder and build it with `LLAMA_CURL=1` flag along with other hardware-specific flags (for ex: LLAMA_CUDA=1 for Nvidia GPUs on Linux). |
|
``` |
|
cd llama.cpp && LLAMA_CURL=1 make |
|
``` |
|
|
|
Step 3: Run inference through the main binary. |
|
``` |
|
./llama-cli --hf-repo Triangle104/Marco-o1-Q6_K-GGUF --hf-file marco-o1-q6_k.gguf -p "The meaning to life and the universe is" |
|
``` |
|
or |
|
``` |
|
./llama-server --hf-repo Triangle104/Marco-o1-Q6_K-GGUF --hf-file marco-o1-q6_k.gguf -c 2048 |
|
``` |
|
|