Commit by AINovice2005: Update README.md
ElEmperador is an ORPO-based finetune derived from the Mistral-7B-v0.1 base model.
The argilla/ultrafeedback-binarized-preferences-cleaned dataset was used, although only a small portion of it was used due to GPU constraints.
## Citation
Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). QLoRA: Efficient Finetuning of Quantized LLMs. https://arxiv.org/abs/2305.14314
# Evals:
BLEU: 0.209
# Conclusion
ORPO is a viable RLHF algorithm that can improve model performance beyond SFT finetuning alone. It also helps align the model's outputs more closely with human preferences, leading to more user-friendly and acceptable results.
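The core idea behind ORPO is an odds-ratio penalty added on top of the usual SFT loss. A minimal numeric sketch of that penalty (for illustration only — the actual training followed the recipe linked in this card):

```python
import math

def log_sigmoid(x: float) -> float:
    # log(sigmoid(x)) = -log(1 + e^{-x}); simple form, fine for illustration
    return -math.log1p(math.exp(-x))

def odds_ratio_loss(logp_chosen: float, logp_rejected: float) -> float:
    """ORPO's odds-ratio term: -log sigmoid(log odds(chosen) - log odds(rejected)).

    Inputs are log-probabilities of the chosen and rejected responses
    under the model being trained.
    """
    def log_odds(logp: float) -> float:
        # odds(y) = p / (1 - p), so log odds = log p - log(1 - p)
        return logp - math.log(1.0 - math.exp(logp))

    return -log_sigmoid(log_odds(logp_chosen) - log_odds(logp_rejected))
```

When the model already assigns much higher probability to the chosen response, the penalty is near zero; when the preference is reversed, the penalty grows, pushing probability mass toward the preferred answer.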
The model recipe: [El-Emperador_ModelRecipe](https://github.com/ParagEkbote/El-Emperador_ModelRecipe)
## Inference Script:
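A minimal inference sketch using the Hugging Face `transformers` library. The repository id `AINovice2005/ElEmperador` is an assumption based on the author's username — replace it with the actual model id if it differs:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate(prompt: str, model_id: str = "AINovice2005/ElEmperador") -> str:
    """Load the finetuned model and return a completion for `prompt`.

    NOTE: the default `model_id` is an assumed Hub repository id, not
    confirmed by this card.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Usage: `print(generate("What does ORPO optimize?"))`.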