---
license: apache-2.0
datasets:
  - argilla/ultrafeedback-binarized-preferences-cleaned
language:
  - en
base_model:
  - mistralai/Mistral-7B-v0.1
library_name: transformers
tags:
  - transformers
  - ORPO
  - RLHF
  - notus
  - argilla
---

# Model Overview

𝐌𝐨𝐝𝐞π₯ 𝐍𝐚𝐦𝐞:ElEmperador


## Model Description

ElEmperador is an ORPO-based fine-tune of the Mistral-7B-v0.1 base model.
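A minimal inference sketch using the `transformers` library named in the card's metadata. The repo id `AINovice2005/ElEmperador` is assumed from the model page and may differ; model loading is wrapped in a function so the sketch can be read without triggering a 7B-checkpoint download.

```python
# Assumed Hub repo id for this model card; adjust if the checkpoint
# is hosted under a different name.
MODEL_ID = "AINovice2005/ElEmperador"


def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Generate a completion for `prompt` with the ElEmperador checkpoint."""
    # Imports are kept local so merely importing this file does not
    # require transformers or download the model weights.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens so only the new completion is returned.
    return tokenizer.decode(
        output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

Calling `generate("...")` downloads the checkpoint on first use, which is why it is left as a function rather than executed at import time.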

## Evals

BLEU: 0.209
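For reference, the BLEU metric reported above can be illustrated with a small plain-Python sketch (modified n-gram precision with add-one smoothing plus a brevity penalty). This is an illustration of the metric, not the evaluation harness actually used for this card.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of all contiguous n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU with uniform weights over 1..max_n grams.

    `candidate` and `reference` are lists of tokens.  Add-one smoothing
    keeps a single empty n-gram order from zeroing out the whole score."""
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum((cand & ref).values())  # clipped n-gram counts
        total = max(sum(cand.values()), 1)
        precisions.append((overlap + 1) / (total + 1))
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: punish candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / max(len(candidate), 1)))
    return bp * geo_mean
```

A perfect match scores 1.0, while truncated or divergent candidates score lower, which is the behavior the reported corpus score summarizes over a test set.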

## Results

First, ORPO is a viable RLHF algorithm for improving model performance alongside SFT fine-tuning. Second, it helps align the model's outputs more closely with human preferences, leading to more user-friendly and acceptable results.
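For intuition, ORPO's preference term compares the odds the model assigns to a chosen versus a rejected completion and adds that penalty to the ordinary SFT loss. A scalar sketch of the objective from the ORPO paper (Hong et al., 2024), with `lam` as an assumed weighting hyperparameter and the probabilities standing in for the model's likelihoods of each response:

```python
import math


def odds(p: float) -> float:
    """Odds of a probability: p / (1 - p)."""
    return p / (1.0 - p)


def orpo_loss(p_chosen: float, p_rejected: float, lam: float = 0.1) -> float:
    """Scalar sketch of the ORPO objective for one preference pair.

    The total loss is the SFT negative log-likelihood of the chosen
    response plus a weighted odds-ratio penalty that pushes the odds of
    the chosen response above those of the rejected one."""
    nll = -math.log(p_chosen)  # standard SFT term
    log_odds_ratio = math.log(odds(p_chosen) / odds(p_rejected))
    # -log sigmoid(log odds ratio): small when chosen is clearly preferred.
    or_term = -math.log(1.0 / (1.0 + math.exp(-log_odds_ratio)))
    return nll + lam * or_term
```

Because the penalty shrinks as the gap between chosen and rejected odds grows, minimizing this loss performs supervised fine-tuning and preference alignment in a single pass, which is what lets ORPO skip the separate reward model and reference model used by classic RLHF pipelines.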