---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
base_model:
- allenai/OLMo-2-1124-13B-SFT
library_name: transformers
datasets:
- allenai/olmo-2-1124-13b-preference-mix
---

<img src="https://allenai.org/olmo/olmo-7b-animation.gif" alt="OLMo Logo" width="800" style="margin-left:auto; margin-right:auto; display:block"/>

# OLMo-2-1124-13B-DPO

OLMo-2 13B DPO November 2024 is a finetuned variant of the [OLMo-2 13B November 2024](https://huggingface.co/allenai/OLMo2-13B-1124) model. It has undergone supervised finetuning on the [Tülu 3 dataset](https://huggingface.co/datasets/allenai/tulu-3-sft-mixture) and further DPO training on the [olmo-2-1124-13b-preference-mix dataset](https://huggingface.co/datasets/allenai/olmo-2-1124-13b-preference-mix).
Tülu 3 is designed for state-of-the-art performance on a diverse range of tasks in addition to chat, such as MATH, GSM8K, and IFEval.
Check out [the OLMo-2 paper](https://TODO) or [Tülu 3 paper](https://arxiv.org/abs/2411.15124) for more details!

OLMo is a series of **O**pen **L**anguage **Mo**dels designed to enable the science of language models.
These models are trained on the Dolma dataset. We are releasing all code, checkpoints, logs (coming soon), and associated training details.
The core models released in this batch include the following:

| **Stage** | **OLMo-2 7B** | **OLMo-2 13B** |
|---|---|---|
| **Base Model** | [allenai/OLMo2-7B-1124](https://huggingface.co/allenai/OLMo2-7B-1124) | [allenai/OLMo-2-13B-1124](https://huggingface.co/allenai/OLMo-2-13B-1124) |
| **SFT** | [allenai/OLMo-2-1124-7B-SFT](https://huggingface.co/allenai/OLMo-2-1124-7B-SFT) | [allenai/OLMo-2-1124-13B-SFT](https://huggingface.co/allenai/OLMo-2-1124-13B-SFT) |
| **DPO** | [allenai/OLMo-2-1124-7B-DPO](https://huggingface.co/allenai/OLMo-2-1124-7B-DPO) | [allenai/OLMo-2-1124-13B-DPO](https://huggingface.co/allenai/OLMo-2-1124-13B-DPO) |
| **Final Models (RLVR)** | [allenai/OLMo-2-1124-7B-Instruct](https://huggingface.co/allenai/OLMo-2-1124-7B-Instruct) | [allenai/OLMo-2-1124-13B-Instruct](https://huggingface.co/allenai/OLMo-2-1124-13B-Instruct) |
| **Reward Model (RM)** | [allenai/OLMo-2-1124-7B-RM](https://huggingface.co/allenai/OLMo-2-1124-7B-RM) | (Same as 7B) |

## Model description

- **Model type:** A model trained on a mix of publicly available, synthetic and human-created datasets.
- **Language(s) (NLP):** Primarily English
- **License:** Apache 2.0
- **Finetuned from model:** allenai/OLMo-2-1124-13B-SFT

### Model Sources

- **Project Page:** https://allenai.org/olmo
- **Repositories:**
    - Core repo (training, inference, fine-tuning etc.): https://github.com/allenai/OLMo
    - Evaluation code: https://github.com/allenai/olmes
    - Further fine-tuning code: https://github.com/allenai/open-instruct
- **Paper:** Coming soon!
- **Demo:** https://playground.allenai.org/

## Using the model

### Loading with HuggingFace

To load the model with HuggingFace, use the following snippet:
```
from transformers import AutoModelForCausalLM

olmo_model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-2-1124-13B-DPO")
```
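
Once the model is loaded, a single-turn reply could be generated roughly as follows. This is a minimal sketch and not an official recipe: the helper name `generate_reply`, the `bfloat16` dtype, and the `max_new_tokens` default are illustrative assumptions, and the libraries are imported lazily so that defining the helper does not download anything.

```python
def generate_reply(prompt, model_name="allenai/OLMo-2-1124-13B-DPO", max_new_tokens=128):
    """Generate one assistant reply for a single user prompt (illustrative helper)."""
    # Imported lazily: defining this function does not require torch/transformers.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # bfloat16 is an assumption made here to reduce memory use.
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
    model.eval()

    # Let the tokenizer apply its embedded chat template and open the assistant turn.
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,
    )

    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Keep only the newly generated tokens, not the echoed prompt.
    prompt_length = inputs["input_ids"].shape[-1]
    return tokenizer.decode(output_ids[0, prompt_length:], skip_special_tokens=True)
```

Calling `generate_reply("How are you doing?")` downloads the checkpoint on first use, so a machine with enough memory for a 13B model is required.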

### Chat template

The chat template for our models is formatted as:
```
<|endoftext|><|user|>\nHow are you doing?\n<|assistant|>\nI'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
```
Or with new lines expanded:
```
<|endoftext|><|user|>
How are you doing?
<|assistant|>
I'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
```
The template is also embedded in the tokenizer, so `tokenizer.apply_chat_template` produces this format for you.
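
To make the format concrete, here is a small sketch that rebuilds the string above by hand. The function `format_chat` is hypothetical and written only for illustration. The single-turn output follows the example shown above, while the newline inserted between an assistant turn and a following turn is an assumption, since the card only shows one exchange. In practice, prefer `tokenizer.apply_chat_template`.

```python
def format_chat(messages, add_generation_prompt=False):
    """Render {"role", "content"} dicts in the chat format shown above."""
    text = "<|endoftext|>"  # the rendered conversation starts with this token
    last_index = len(messages) - 1
    for index, message in enumerate(messages):
        role, content = message["role"], message["content"]
        if role == "user":
            text += "<|user|>\n" + content + "\n"
        elif role == "assistant":
            text += "<|assistant|>\n" + content + "<|endoftext|>"
            if index != last_index:
                text += "\n"  # assumption: turns are separated by a newline
        else:
            raise ValueError(f"unsupported role in this sketch: {role!r}")
    if add_generation_prompt:
        text += "<|assistant|>\n"  # open the assistant turn for generation
    return text


conversation = [
    {"role": "user", "content": "How are you doing?"},
    {"role": "assistant", "content": "I'm just a computer program, so I don't have feelings, "
                                     "but I'm functioning as expected. How can I assist you today?"},
]
rendered = format_chat(conversation)
```

Passing only the user message with `add_generation_prompt=True` yields the prompt that the model would be asked to continue.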

### System prompt

In Ai2 demos, we use this system prompt by default:
```
You are OLMo 2, a helpful and harmless AI Assistant built by the Allen Institute for AI.
```
The model has not been trained with a specific system prompt in mind.
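
If you want to reproduce the demo behaviour, one option is to prepend the prompt as a `system` message before applying the chat template. The sketch below only builds the message list. `build_messages` is a hypothetical helper, and it assumes that the tokenizer's template accepts a `system` role, which this card does not show explicitly.

```python
DEFAULT_SYSTEM_PROMPT = (
    "You are OLMo 2, a helpful and harmless AI Assistant "
    "built by the Allen Institute for AI."
)


def build_messages(user_text, system_prompt=DEFAULT_SYSTEM_PROMPT):
    """Return a message list, optionally starting with a system prompt."""
    messages = []
    if system_prompt:  # pass None or "" to omit the system message entirely
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_text})
    return messages


with_prompt = build_messages("How are you doing?")
without_prompt = build_messages("How are you doing?", system_prompt=None)
```

Since the model was not trained with a specific system prompt in mind, omitting it (the second call) is an equally reasonable default.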

### Bias, Risks, and Limitations

The OLMo-2 models have had limited safety training and, unlike ChatGPT, are not deployed with automatic in-the-loop filtering of responses, so they can produce problematic outputs (especially when prompted to do so).
See the Falcon 180B model card for an example of this.

## Performance

TODO

## Hyperparameters

Note that we use a length-normalized variant of DPO for training.
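
As a rough illustration of what length normalization changes, the sketch below computes a per-example loss in which each sequence's policy-versus-reference log-probability difference is divided by that sequence's token count before the usual DPO sigmoid loss is applied. This is an assumed formulation written in plain Python for readability, with hypothetical argument names. The actual training code lives in the open-instruct repository and may differ in detail.

```python
import math


def length_normalized_dpo_loss(
    policy_chosen_logp,    # summed log-prob of the chosen response under the policy
    policy_rejected_logp,  # summed log-prob of the rejected response under the policy
    ref_chosen_logp,       # same quantities under the frozen reference model
    ref_rejected_logp,
    chosen_len,            # number of tokens in the chosen response
    rejected_len,          # number of tokens in the rejected response
    beta=5.0,              # matches the Beta value listed for DPO in this card
):
    """Per-example DPO loss with length-normalized log-ratios (illustrative)."""
    chosen_term = (policy_chosen_logp - ref_chosen_logp) / chosen_len
    rejected_term = (policy_rejected_logp - ref_rejected_logp) / rejected_len
    margin = beta * (chosen_term - rejected_term)
    # -log(sigmoid(margin)) equals softplus(-margin); computed in a stable way.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# With the policy equal to the reference, the margin is zero and the loss is log(2).
baseline_loss = length_normalized_dpo_loss(-40.0, -60.0, -40.0, -60.0, 20, 30)
# When the policy raises the chosen response relative to the reference, the loss drops.
improved_loss = length_normalized_dpo_loss(-38.0, -63.0, -40.0, -60.0, 20, 30)
```

Dividing by length keeps long responses from dominating the margin simply because they contain more tokens.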

DPO:
- **Learning Rate:** 8E-7 (7B, 13B)
- **Beta:** 5
- **Effective Batch Size:** 128 (7B, 13B)
- **Max. Sequence Length:** 2048
- **Learning Rate Schedule:** Linear
- **LR Warmup Ratio:** 0.1
- **Num. Epochs:** 1
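
The learning-rate settings above can be read as the following schedule. This is a sketch under an assumption: "Linear" is taken to mean a linear warmup from zero to the peak rate over the first 10% of steps, followed by a linear decay back to zero, which is a common convention. The function name and the `total_steps` value are hypothetical.

```python
def linear_lr(step, total_steps, peak_lr=8e-7, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: ramp linearly from peak_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


total_steps = 1000  # hypothetical number of optimizer steps in the single epoch
schedule = [linear_lr(step, total_steps) for step in range(total_steps + 1)]
```

With these values the rate peaks at step 100 and reaches zero at step 1000.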

## License and use

OLMo-2 is licensed under the Apache 2.0 license.
OLMo-2 is intended for research and educational use.
For more information, please see our [Responsible Use Guidelines](https://allenai.org/responsible-use).

## Citation

If OLMo-2 or any of the related materials were helpful to your work, please cite:
```
TODO
```