---
license: apache-2.0
datasets:
- argilla/ultrafeedback-binarized-preferences-cleaned
language:
- en
base_model:
- mistralai/Mistral-7B-v0.1
library_name: transformers
tags:
- transformers
- ORPO
- RLHF
- notus
- argilla
---

# Model Overview
**Model Name:** ElEmperador
## Model Description

ElEmperador is an ORPO finetune of the Mistral-7B-v0.1 base model.
## Evals

BLEU: 0.209
## Results

Firstly, ORPO is a viable RLHF algorithm for improving model performance alongside SFT finetuning. Secondly, it helps align the model's outputs more closely with human preferences, leading to more user-friendly and acceptable results.