|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- argilla/ultrafeedback-binarized-preferences-cleaned |
|
language: |
|
- en |
|
base_model: |
|
- mistralai/Mistral-7B-v0.1 |
|
library_name: transformers |
|
tags: |
|
- transformers |
|
- ORPO |
|
- RLHF |
|
- notus |
|
- argilla |
|
--- |
|
|
|
# Model Overview |
|
|
|
# Model Name: ElEmperador
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e8ea3892d9db9a93580fe3/gkDcpIxRCjBlmknN_jzWN.png) |
|
|
|
|
|
## Model Description: |
|
|
|
ElEmperador is a finetune of the Mistral-7B-v0.1 base model, trained with ORPO (Odds Ratio Preference Optimization).
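
A minimal usage sketch with the `transformers` library is shown below. The repository id is a placeholder and should be replaced with this model's actual Hub path.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- replace with this model's actual Hugging Face Hub path.
model_id = "your-org/ElEmperador"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Explain ORPO in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```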
|
|
|
|
|
## Evals: |
|
- BLEU: 0.209
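
The exact evaluation setup is not described in this card. As a rough sketch, a BLEU score like the one above could be computed with the Hugging Face `evaluate` library; the prediction and reference lists here are hypothetical placeholders.

```python
import evaluate

bleu = evaluate.load("bleu")

# Hypothetical placeholders: model generations and reference answers
# from a held-out evaluation split.
predictions = ["The model generated this answer."]
references = [["The reference answer for the same prompt."]]

results = bleu.compute(predictions=predictions, references=references)
print(results["bleu"])
```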
|
|
|
|
|
|
|
## Results |
|
|
|
First, ORPO is a viable RLHF-style preference-alignment algorithm for improving model performance alongside SFT finetuning. Second, it helps align the model's outputs more closely with human preferences, leading to more user-friendly and acceptable results.
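
The training code is not included in this card. Below is a minimal sketch of how an ORPO finetune like this could be reproduced with the TRL library's `ORPOTrainer`, using the preference dataset listed in the card metadata; the hyperparameters are hypothetical and not taken from the actual training run.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base_model = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# Preference dataset listed in the card metadata; it provides
# "prompt", "chosen", and "rejected" columns.
dataset = load_dataset(
    "argilla/ultrafeedback-binarized-preferences-cleaned", split="train"
)

# Hypothetical hyperparameters -- not taken from the actual training run.
config = ORPOConfig(
    output_dir="elemperador-orpo",
    beta=0.1,                      # weight of the odds-ratio penalty term
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,    # older TRL versions use tokenizer= instead
)
trainer.train()
```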