AINovice2005
committed on
Update README.md
README.md
CHANGED
@@ -23,7 +23,6 @@ tags:
 ElEmperador is an ORPO-based finetune derived from the Mistral-7B-v0.1 base model.
 
-
 The 'ultrafeedback-binarized-preferences-cleaned' dataset was used for training, albeit a small portion was used due to GPU constraints.
 
 ## Evals:
 BLEU: 0.209
@@ -62,5 +61,6 @@ if __name__ == "__main__":
 
 ## Results
 
-ORPO is a viable RLHF algorithm to improve the performance of your models along with SFT finetuning.
+Firstly, ORPO is a viable RLHF algorithm to improve the performance of your models along with SFT finetuning. Secondly, it also helps in aligning the model's outputs more closely with human preferences,
 leading to more user-friendly and acceptable results.
+
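The Results section describes ORPO as a preference-optimization method used alongside SFT. As a rough illustration only (not this repository's training code), the odds-ratio penalty that ORPO adds on top of the SFT loss can be sketched in plain Python; `p_chosen` and `p_rejected` are hypothetical stand-ins for the model's sequence-level probabilities of the preferred and rejected completions:

```python
import math


def _odds(p: float) -> float:
    # Odds of generating a sequence with probability p in (0, 1): p / (1 - p).
    return p / (1.0 - p)


def orpo_penalty(p_chosen: float, p_rejected: float) -> float:
    """Sketch of ORPO's odds-ratio term: -log sigmoid(log OR(chosen, rejected)).

    Assumes p_chosen / p_rejected are sequence probabilities in (0, 1); the
    SFT cross-entropy term and its weighting factor are omitted here.
    """
    log_odds_ratio = math.log(_odds(p_chosen)) - math.log(_odds(p_rejected))
    # -log(sigmoid(x)) == log(1 + exp(-x)); the penalty shrinks as the
    # chosen completion becomes more likely than the rejected one.
    return math.log1p(math.exp(-log_odds_ratio))
```

During fine-tuning, minimizing this term pushes probability mass toward preferred completions while the accompanying SFT term keeps the model close to the supervised data, which is what allows ORPO to run as a single-stage alternative to separate SFT + RLHF phases.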
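The Evals section reports BLEU = 0.209. For readers unfamiliar with the metric, here is a minimal, self-contained sketch of sentence-level BLEU (uniform 4-gram weights, crude add-epsilon smoothing, whitespace tokenization); it is illustrative only and is not the evaluation script behind the reported number:

```python
import math
from collections import Counter


def _ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    # All contiguous n-grams of the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sentence_bleu(reference: str, hypothesis: str, max_n: int = 4) -> float:
    """Minimal BLEU: geometric mean of clipped n-gram precisions times a
    brevity penalty. Whitespace tokenization; epsilon smoothing avoids
    log(0) when a higher-order n-gram has no match."""
    ref, hyp = reference.split(), hypothesis.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(_ngrams(hyp, n))
        ref_counts = Counter(_ngrams(ref, n))
        overlap = sum((hyp_counts & ref_counts).values())  # clipped matches
        total = max(sum(hyp_counts.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)
    log_precision = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: 1 for hypotheses at least as long as the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(log_precision)
```

A score of 0.209 therefore means the model's outputs share roughly a fifth of their (length-penalized, geometric-mean) n-gram overlap with the references; production evaluations typically use a maintained implementation such as sacreBLEU rather than a hand-rolled one.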