AINovice2005 committed: Update README.md

README.md CHANGED
@@ -12,7 +12,7 @@ tags:
 
 ---
 
-<h1 style="font-size: 2em;">
+<h1 style="font-size: 2em;">ElEmperador.</h1>
 
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e8ea3892d9db9a93580fe3/gkDcpIxRCjBlmknN_jzWN.png)
@@ -29,11 +29,11 @@ The argilla/ultrafeedback-binarized-preferences-cleaned dataset was used, albeit
 
 
 # Evals:
-BLEU:0.
+BLEU: 0.209
 
 # Conclusion and Model Recipe.
 ORPO is a viable RLHF algorithm for improving model performance beyond SFT fine-tuning alone. It also helps align the model's outputs more closely with human preferences,
 leading to more user-friendly and acceptable results.
 
 The model recipe: [El-Emperador_ModelRecipe](https://github.com/ParagEkbote/El-Emperador_ModelRecipe)
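Since the README recommends ORPO over plain SFT fine-tuning, here is a minimal, hedged training sketch using TRL's `ORPOTrainer` on the preference dataset named in the diff context. The base checkpoint, `beta`, and other hyperparameters are illustrative assumptions, not the recipe's actual values; see the linked model recipe for the real configuration.

```python
# Minimal ORPO training sketch with TRL. The checkpoint and hyperparameters
# below are illustrative assumptions; the linked model recipe has the real values.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_name = "mistralai/Mistral-7B-v0.1"  # assumed base model, not confirmed by the README
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Preference dataset with "prompt"/"chosen"/"rejected" columns, as ORPO expects.
dataset = load_dataset("argilla/ultrafeedback-binarized-preferences-cleaned", split="train")

config = ORPOConfig(
    output_dir="el-emperador-orpo",
    beta=0.1,                        # weight of the odds-ratio preference term (assumed)
    learning_rate=8e-6,              # assumed
    per_device_train_batch_size=2,   # assumed
    num_train_epochs=1,              # assumed
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL releases take `tokenizer=` instead
)
trainer.train()
```

A note on the design choice: ORPO folds the preference signal into a single SFT-style pass by adding an odds-ratio penalty on rejected completions, so unlike DPO it needs no separate frozen reference model, keeping the memory footprint close to that of ordinary fine-tuning.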