Qwen1.5-0.5B-Chat with EPFL DPO fine-tuning

Qwen1.5-0.5B-Chat DPO fine-tuned on the Orca Math dataset that consists of ~200K grade school math word problems and open-ended and multiple choice questions from different EPFL courses.

Model Details

Model Description

The model was developed during the course Modern Natural Language Processing (CS-552). Its aim is to fine-tune the base model (Qwen/Qwen1.5-0.5B-Chat) to accurately answer open-ended and multiple-choice questions from Orca Math dataset and various EPFL courses.

  • Developed by: Emma Lise Boehly, Ahmed Aziz Ben Haj Hmida and Jan Kokla
  • Finetuned from model: Qwen/Qwen1.5-0.5B-Chat

Training Details

Training Data

HuggingFace dataset : microsoft/orca-math-word-problems-200k The EPFL dataset is not publicly available.

Training Procedure

Training Hyperparameters

  • Training regime: The model is trained on EPFL dataset with cDPO with bf16 mixed precision, $\beta=0.2$, $lr=3 \times 10^{-6}$, and $label_smoothing=0.2$. It is then trained on Orca dataset but without label_smoothing and thus original DPO.

  • PEFT 0.10.0

Downloads last month
0
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for emmabhl/Qwen1.5-0.5B-Chat-EPFL-ORCA-DPO

Adapter
(212)
this model