---
datasets:
- jondurbin/gutenberg-dpo-v0.1
- HuggingFaceH4/ultrafeedback_binarized
base_model:
- Qwen/Qwen2.5-14B-Instruct
- v000000/Qwen2.5-14B-Gutenberg-1e-Delta
- tanliboy/lambda-qwen2.5-14b-dpo-test
library_name: transformers
tags:
- qwen
- qwen2.5
- finetune
- dpo
- orpo
- qwen2
- chat
- conversational
- instruct
- storywriting
- roleplay
license: apache-2.0
language:
- en
pipeline_tag: text-generation
---
# Qwen2.5-Lumen-14B

A Qwen2.5 preference finetune targeting prompt adherence, storywriting, and roleplay, trained with direct preference optimization (DPO) for ~3 epochs.
## Training Notes
- Trained Qwen2.5-14B-Instruct for 2 epochs on jondurbin/gutenberg-dpo-v0.1 on an NVIDIA A100, saving several checkpoints along the way (completely separate runs at varying epoch counts and learning rates); a training sketch follows this list.
- Tanliboy trained Qwen2.5-14B-Instruct for 1 epoch on HuggingFaceH4/ultrafeedback_binarized (credit to Tanliboy; the model is tanliboy/lambda-qwen2.5-14b-dpo-test).
- The resulting checkpoints were then mass-merged, based on Qwen2.5-14B-Instruct as the base model.
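For orientation, here is a minimal sketch of what one of the Gutenberg DPO runs could look like using TRL's `DPOTrainer`. The actual tooling and hyperparameters are not documented on this card, so the learning rate, `beta`, batch sizes, and output path below are assumptions, not the recorded setup:

```python
# Illustrative sketch only -- tooling and hyperparameters are assumptions,
# not the setup actually used for the checkpoints listed on this card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen2.5-14B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# gutenberg-dpo provides prompt/chosen/rejected triples, the format DPOTrainer expects.
train_dataset = load_dataset("jondurbin/gutenberg-dpo-v0.1", split="train")

args = DPOConfig(
    output_dir="qwen2.5-14b-gutenberg-dpo",  # hypothetical path
    num_train_epochs=2,                      # the card describes runs at varying epoch counts
    learning_rate=5e-7,                      # assumed; the actual runs used varying learning rates
    beta=0.1,                                # assumed DPO temperature
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    bf16=True,
)

trainer = DPOTrainer(model=model, args=args, train_dataset=train_dataset, processing_class=tokenizer)
trainer.train()
```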
## Merge
- Merged the "Ultrafeedback-Binarized DPO" and "Gutenberg DPO" checkpoints using sophosympatheia's SLERP gradient.
- Merged "Qwen2.5-14B-Instruct" and "Gutenberg DPO" using the same SLERP gradient (a sketch of the SLERP operation follows this list).
- Merged all DPO checkpoints and SLERP variations with Model Stock, which uses the geometric properties of the weights to capture the most performant aspects of all runs/merges. Model Stock was chosen because the merged models are closely similar, and because evaluation for ORPO is unclear, making it hard to tell which runs are best.
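As a rough illustration (not mergekit's exact implementation), SLERP interpolates two checkpoints tensor-by-tensor along the arc between them rather than a straight line, and a gradient merge varies the interpolation factor `t` from layer to layer:

```python
import torch

def slerp(t: float, v0: torch.Tensor, v1: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors."""
    a, b = v0.flatten().float(), v1.flatten().float()
    # Angle between the two parameter vectors.
    cos_omega = torch.clamp(torch.dot(a / (a.norm() + eps), b / (b.norm() + eps)), -1.0, 1.0)
    omega = torch.acos(cos_omega)
    sin_omega = torch.sin(omega)
    if sin_omega.abs() < eps:
        # Nearly colinear vectors: fall back to plain linear interpolation.
        return (1.0 - t) * v0 + t * v1
    return (torch.sin((1.0 - t) * omega) / sin_omega) * v0 + (torch.sin(t * omega) / sin_omega) * v1

# A gradient merge sweeps t across the network, e.g. favoring one parent in
# early layers and the other in later layers (schedule values are illustrative):
layer_schedule = [0.1, 0.3, 0.5, 0.7, 0.9]
```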
## Recipe

```yaml
models:
- model: v000000/Qwen2.5-14B-Gutenberg-1e-Delta
- model: v000000/Qwen2.5-14B-Gutenberg-0.6e-Sequential
- model: v000000/Qwen2.5-14B-Gutenberg-0.25e-Early
- model: v000000/Qwen2.5-14B-Gutenberg-2e-Sequential
- model: v000000/Qwen2.5-14B-Gutenberg-0.37e-Early
- model: v000000/Qwen2.5-14B-Gutenberg-2e-Zeta
- model: v000000/Qwen2.5-14B-Gutenberg-1e-Theta
- model: tanliboy/lambda-qwen2.5-14b-dpo-test
- model: v000000/Qwen2.5-14B-Gutenberg-1e-Delta
- model: tanliboy/lambda-qwen2.5-14b-dpo-test
- model: v000000/Qwen2.5-14B-Gutenberg-UltraLambda-Slerpeno
- model: v000000/Qwen2.5-14B-Gutenberg-Instruct-Slerpeno
base_model: v000000/Qwen2.5-14B-Gutenberg-1e-Delta
merge_method: model_stock
dtype: bfloat16
```
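A config like this can be run with mergekit's `mergekit-yaml` entry point, e.g. `mergekit-yaml recipe.yaml ./output-model` (the file and output names here are illustrative).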
## Finetune and merge
This is a merge and finetune of pre-trained language models.
### Models Merged
The following models were included in the merge:
- v000000/Qwen2.5-14B-Gutenberg-1e-Delta
- v000000/Qwen2.5-14B-Gutenberg-0.6e-Sequential
- v000000/Qwen2.5-14B-Gutenberg-0.25e-Early
- v000000/Qwen2.5-14B-Gutenberg-2e-Sequential
- v000000/Qwen2.5-14B-Gutenberg-0.37e-Early
- v000000/Qwen2.5-14B-Gutenberg-2e-Zeta
- v000000/Qwen2.5-14B-Gutenberg-1e-Theta
- v000000/Qwen2.5-14B-Gutenberg-UltraLambda-Slerpeno
- v000000/Qwen2.5-14B-Gutenberg-Instruct-Slerpeno
- tanliboy/lambda-qwen2.5-14b-dpo-test
- Context length: 131,072 tokens (full), with generation up to 8,192 tokens
- Prompt format: Qwen2 (ChatML)
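A minimal inference sketch with `transformers`; the repository id below is assumed from this card's title and uploader, and `apply_chat_template` renders the Qwen2/ChatML prompt format from the bundled tokenizer config:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "v000000/Qwen2.5-Lumen-14B"  # assumed repo id for this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a creative storywriting assistant."},
    {"role": "user", "content": "Write the opening paragraph of a gothic short story."},
]
# Renders the ChatML-style Qwen2 template and appends the assistant header.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```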