AINovice2005 committed: Update README.md
ElEmperador is an ORPO-based finetune derived from the Mistral-7B-v0.1 base model.

It was trained on the argilla/ultrafeedback-binarized-preferences-cleaned dataset, although only a small portion of it was used due to GPU constraints.
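The card does not include the training code itself; a minimal sketch of what an ORPO run over this dataset could look like, assuming TRL's `ORPOTrainer`/`ORPOConfig` (the 5% data slice, the hyperparameters, and the output directory are illustrative choices, not the actual recipe):

```python
# Sketch of an ORPO training configuration; values are illustrative assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer


def main():
    model_id = "mistralai/Mistral-7B-v0.1"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Only a slice of the preference dataset, mirroring the GPU-constrained run
    # described in the card (the actual fraction used is not stated).
    train_ds = load_dataset(
        "argilla/ultrafeedback-binarized-preferences-cleaned", split="train[:5%]"
    )

    args = ORPOConfig(
        output_dir="elemperador-orpo",
        beta=0.1,  # weight of the odds-ratio term relative to the NLL term
        max_length=1024,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
    )
    # Newer TRL versions take `processing_class=`; older ones use `tokenizer=`.
    trainer = ORPOTrainer(
        model=model, args=args, train_dataset=train_ds, processing_class=tokenizer
    )
    trainer.train()


if __name__ == "__main__":
    main()
```

This is a configuration sketch rather than the published recipe; the repository linked below holds the actual training code.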
## Citation

Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). QLoRA: Efficient Finetuning of Quantized LLMs. https://arxiv.org/abs/2305.14314
## Evals

BLEU: 0.0209
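The evaluation setup behind this score is not described in the card. For intuition, a score in this range can be reproduced in spirit with a small self-contained BLEU implementation (a simplified sentence-level variant with clipped n-gram precisions and a brevity penalty; real evaluations typically use a library such as sacrebleu):

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))


def bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    """Simplified sentence-level BLEU: geometric mean of clipped n-gram
    precisions times a brevity penalty. Returns 0.0 if any order has no match."""
    hyp, ref = hypothesis.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngrams(hyp, n), ngrams(ref, n)
        # Clip each hypothesis n-gram count by its count in the reference.
        matches = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        total = max(1, len(hyp) - n + 1)
        if matches == 0:
            return 0.0
        log_prec += math.log(matches / total) / max_n
    # Brevity penalty: penalize hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(1, len(hyp)))
    return bp * math.exp(log_prec)
```

A perfect match scores 1.0, a fully disjoint output scores 0.0, and partial overlap lands in between, which is the scale the 0.0209 figure above is on.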
## Conclusion and Model Recipe

ORPO is a viable preference-optimization algorithm that can improve model performance over SFT finetuning alone. It also helps align the model's outputs more closely with human preferences, leading to more user-friendly and acceptable results.

The model recipe: https://github.com/ParagEkbote/El-Emperador_ModelRecipe
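The odds-ratio mechanism behind this claim can be sketched numerically. Assuming the standard ORPO objective (NLL on the chosen response plus a weighted odds-ratio term over the chosen and rejected responses), with average per-token log-probabilities as inputs (the function names and the default `lam` are illustrative):

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def log_odds(avg_logp: float) -> float:
    # odds(y | x) = p / (1 - p), from the average log-probability of a response
    # (must be strictly negative so that p < 1).
    p = math.exp(avg_logp)
    return math.log(p / (1.0 - p))


def orpo_loss(
    logp_chosen: float, logp_rejected: float, nll_chosen: float, lam: float = 0.1
) -> float:
    # L = L_SFT + lam * L_OR, where
    # L_OR = -log sigmoid(log odds(chosen) - log odds(rejected)).
    ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    return nll_chosen + lam * -math.log(sigmoid(ratio))
```

The odds-ratio term shrinks when the model assigns higher probability to the chosen response than to the rejected one, so a single loss both fits the chosen answers (the SFT part) and pushes the two responses apart (the preference part), without a separate reference model.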
## Inference Script:
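This section is empty in this revision; a minimal generation sketch with Hugging Face transformers might look like the following (the Hub repository id and the prompt template are assumptions, not taken from the card):

```python
def build_prompt(instruction: str) -> str:
    # Plain instruction/response template; the template actually used in
    # training is not documented in the card, so this is an assumption.
    return f"### Instruction:\n{instruction}\n\n### Response:\n"


def generate(instruction: str, max_new_tokens: int = 256) -> str:
    # Heavy dependencies are imported lazily so the prompt helper above
    # stays usable without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "AINovice2005/ElEmperador"  # assumed Hub repo id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    inputs = tokenizer(build_prompt(instruction), return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=True, temperature=0.7
        )
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1] :], skip_special_tokens=True
    )


if __name__ == "__main__":
    print(generate("Summarize what ORPO finetuning does."))
```

The slice in the final `decode` call strips the echoed prompt so only the model's continuation is returned.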