---
datasets:
- jondurbin/gutenberg-dpo-v0.1
- HuggingFaceH4/ultrafeedback_binarized
base_model:
- Qwen/Qwen2.5-14B-Instruct
- v000000/Qwen2.5-14B-Gutenberg-1e-Delta
- tanliboy/lambda-qwen2.5-14b-dpo-test
library_name: transformers
tags:
- qwen
- qwen2.5
- finetune
- dpo
- orpo
- qwen2
- chat
- conversational
- instruct
- storywriting
- roleplay
license: apache-2.0
language:
- en
pipeline_tag: text-generation
---
# Qwen2.5-Lumen-14B

A Qwen2.5 preference finetune targeting prompt adherence, storywriting, and roleplay, trained with direct preference optimization (DPO) for ~3 epochs.
## Training Notes
- Trained Qwen2.5-14B-Instruct for 2 epochs on jondurbin/gutenberg-dpo-v0.1 on an NVIDIA A100, saving several checkpoints along the way (completely separate runs at varying epoch counts and learning rates); a training sketch follows this list.
- Tanliboy trained Qwen2.5-14B-Instruct for 1 epoch on HuggingFaceH4/ultrafeedback_binarized (credit to Tanliboy; the model is tanliboy/lambda-qwen2.5-14b-dpo-test).
- The resulting checkpoints were then mass-merged, based on Qwen2.5-14B-Instruct as the base model.
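For orientation, here is a minimal sketch of what one of the Gutenberg DPO runs could look like using TRL's `DPOTrainer`. The actual tooling and hyperparameters are not documented on this card, so the learning rate, `beta`, batch sizes, and output path below are assumptions, not the recorded setup:

```python
# Illustrative sketch only -- tooling and hyperparameters are assumptions,
# not the setup actually used for the checkpoints listed on this card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen2.5-14B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# gutenberg-dpo provides prompt/chosen/rejected triples, the format DPOTrainer expects.
train_dataset = load_dataset("jondurbin/gutenberg-dpo-v0.1", split="train")

args = DPOConfig(
    output_dir="qwen2.5-14b-gutenberg-dpo",  # hypothetical path
    num_train_epochs=2,                      # the card describes runs at varying epoch counts
    learning_rate=5e-7,                      # assumed; the actual runs used varying learning rates
    beta=0.1,                                # assumed DPO temperature
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    bf16=True,
)

trainer = DPOTrainer(model=model, args=args, train_dataset=train_dataset, processing_class=tokenizer)
trainer.train()
```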
## Merge
- Merged the "Ultrafeedback-Binarized DPO" and "Gutenberg DPO" checkpoints using sophosympatheia's SLERP gradient.
- Merged "Qwen2.5-14B-Instruct" and "Gutenberg DPO" using the same SLERP gradient (a sketch of the SLERP operation follows this list).
- Merged all DPO checkpoints and SLERP variations with Model Stock, which uses the geometric properties of the weights to capture the most performant aspects of all runs/merges. Model Stock was chosen because the merged models are closely similar, and because evaluation for ORPO is unclear, making it hard to tell which runs are best.
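As a rough illustration (not mergekit's exact implementation), SLERP interpolates two checkpoints tensor-by-tensor along the arc between them rather than a straight line, and a gradient merge varies the interpolation factor `t` from layer to layer:

```python
import torch

def slerp(t: float, v0: torch.Tensor, v1: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors."""
    a, b = v0.flatten().float(), v1.flatten().float()
    # Angle between the two parameter vectors.
    cos_omega = torch.clamp(torch.dot(a / (a.norm() + eps), b / (b.norm() + eps)), -1.0, 1.0)
    omega = torch.acos(cos_omega)
    sin_omega = torch.sin(omega)
    if sin_omega.abs() < eps:
        # Nearly colinear vectors: fall back to plain linear interpolation.
        return (1.0 - t) * v0 + t * v1
    return (torch.sin((1.0 - t) * omega) / sin_omega) * v0 + (torch.sin(t * omega) / sin_omega) * v1

# A gradient merge sweeps t across the network, e.g. favoring one parent in
# early layers and the other in later layers (schedule values are illustrative):
layer_schedule = [0.1, 0.3, 0.5, 0.7, 0.9]
```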
## Recipe

```yaml
models:
- model: v000000/Qwen2.5-14B-Gutenberg-1e-Delta
- model: v000000/Qwen2.5-14B-Gutenberg-0.6e-Sequential
- model: v000000/Qwen2.5-14B-Gutenberg-0.25e-Early
- model: v000000/Qwen2.5-14B-Gutenberg-2e-Sequential
- model: v000000/Qwen2.5-14B-Gutenberg-0.37e-Early
- model: v000000/Qwen2.5-14B-Gutenberg-2e-Zeta
- model: v000000/Qwen2.5-14B-Gutenberg-1e-Theta
- model: tanliboy/lambda-qwen2.5-14b-dpo-test
- model: v000000/Qwen2.5-14B-Gutenberg-1e-Delta
- model: tanliboy/lambda-qwen2.5-14b-dpo-test
- model: v000000/Qwen2.5-14B-Gutenberg-UltraLambda-Slerpeno
- model: v000000/Qwen2.5-14B-Gutenberg-Instruct-Slerpeno
base_model: v000000/Qwen2.5-14B-Gutenberg-1e-Delta
merge_method: model_stock
dtype: bfloat16
```
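A config like this can be run with mergekit's `mergekit-yaml` entry point, e.g. `mergekit-yaml recipe.yaml ./output-model` (the file and output names here are illustrative).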
## Finetune and merge
This is a merge and finetune of pre-trained language models.
### Models Merged
The following models were included in the merge:
- v000000/Qwen2.5-14B-Gutenberg-1e-Delta
- v000000/Qwen2.5-14B-Gutenberg-0.6e-Sequential
- v000000/Qwen2.5-14B-Gutenberg-0.25e-Early
- v000000/Qwen2.5-14B-Gutenberg-2e-Sequential
- v000000/Qwen2.5-14B-Gutenberg-0.37e-Early
- v000000/Qwen2.5-14B-Gutenberg-2e-Zeta
- v000000/Qwen2.5-14B-Gutenberg-1e-Theta
- v000000/Qwen2.5-14B-Gutenberg-UltraLambda-Slerpeno
- v000000/Qwen2.5-14B-Gutenberg-Instruct-Slerpeno
- tanliboy/lambda-qwen2.5-14b-dpo-test
- Context length: 131,072 tokens (full), with generation up to 8,192 tokens
- Prompt format: Qwen2 (ChatML)
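A minimal inference sketch with `transformers`; the repository id below is assumed from this card's title and uploader, and `apply_chat_template` renders the Qwen2/ChatML prompt format from the bundled tokenizer config:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "v000000/Qwen2.5-Lumen-14B"  # assumed repo id for this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a creative storywriting assistant."},
    {"role": "user", "content": "Write the opening paragraph of a gothic short story."},
]
# Renders the ChatML-style Qwen2 template and appends the assistant header.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```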