AINovice2005 committed: Update README.md
ElEmperador is an ORPO-based finetune derived from the Mistral-7B-v0.1 base model.
The argilla/ultrafeedback-binarized-preferences-cleaned dataset was used for training, though only a small portion of it was used due to GPU constraints.
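For context, ORPO preference tuning of this kind is commonly wired up with TRL's `ORPOTrainer`. The sketch below is illustrative only: the card does not show the actual training script, data fraction, or hyperparameters, so the 5% slice, `beta` value, and output directory are assumptions.

```python
# Illustrative sketch only -- not the actual ElEmperador training script.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "mistralai/Mistral-7B-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# A split slice keeps only a small portion of the preference pairs,
# mirroring the GPU-constrained subset described above (5% is a guess).
dataset = load_dataset(
    "argilla/ultrafeedback-binarized-preferences-cleaned", split="train[:5%]"
)

# beta weighs the odds-ratio term against the NLL term (lambda in the ORPO paper).
args = ORPOConfig(output_dir="elemperador-orpo", beta=0.1, per_device_train_batch_size=1)
trainer = ORPOTrainer(model=model, args=args, train_dataset=dataset, tokenizer=tokenizer)
trainer.train()
```

Note that recent TRL releases pass the tokenizer as `processing_class` rather than `tokenizer`.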
# Evals:
BLEU: 0.209
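The card does not say how BLEU was computed (a standard tool such as sacrebleu is typical). As an illustration of what the score measures, here is a minimal single-reference, sentence-level BLEU sketch with uniform 4-gram weights:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(candidate, reference, max_n=4):
    """BLEU with uniform n-gram weights, one reference, and a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clipped counts: each candidate n-gram scores at most its reference count.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            return 0.0  # any empty n-gram level zeroes the geometric mean
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)
```

Production evaluations use corpus-level BLEU with standardized tokenization, so numbers from this sketch are not directly comparable to the 0.209 reported above.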
## Inference Script:
```python
if __name__ == "__main__":
    ...  # rest of the script omitted in this excerpt
    print(f"Input: {input_text}")
    print(f"Output: {output}")
```
# Results

ORPO is a viable RLHF algorithm for improving model performance alongside SFT finetuning. It also helps align the model's outputs more closely with human preferences, leading to more user-friendly and acceptable results.
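To make the mechanism concrete, here is a minimal sketch of the ORPO objective: the usual NLL loss on the chosen response plus a weighted log-odds-ratio penalty that pushes the chosen response's odds above the rejected one's. The length-normalized log-probabilities below are illustrative scalars, not values taken from ElEmperador.

```python
import math

def log_odds(avg_logprob):
    """Log odds of a response given its length-normalized log-probability."""
    p = math.exp(avg_logprob)  # assumes avg_logprob < 0, so 0 < p < 1
    return math.log(p / (1 - p))

def orpo_loss(chosen_avg_logprob, rejected_avg_logprob, lam=0.1):
    """NLL on the chosen response plus the weighted odds-ratio term."""
    nll = -chosen_avg_logprob
    ratio = log_odds(chosen_avg_logprob) - log_odds(rejected_avg_logprob)
    # -log sigmoid(ratio): shrinks as the chosen response becomes
    # much more likely than the rejected one.
    or_term = -math.log(1 / (1 + math.exp(-ratio)))
    return nll + lam * or_term
```

Because the penalty is added to the SFT loss rather than replacing it, a single ORPO pass both finetunes and aligns the model, with no separate reward model or reference model.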
|