AINovice2005 committed (verified)
Commit bf10ea8 · 1 Parent(s): 024f82b

Update README.md

Files changed (1): README.md (+7 -8)
README.md CHANGED
@@ -21,17 +21,11 @@ tags:
 
 ElEmperador is an ORPO-based finetune derived from the Mistral-7B-v0.1 base model.
 
-The argilla/ultrafeedback-binarized-preferences-cleaned dataset was used, albeit a small portion was used due to GPU constraints.
+The argilla/ultrafeedback-binarized-preferences-cleaned dataset was used for training, though only a small portion was used due to GPU constraints.
 
 # Evals:
 BLEU: 0.209
 
-# Conclusion
-
-ORPO is a viable RLHF algorithm to improve the performance of your models than SFT finetuning. It also helps in aligning the model's outputs more closely with human preferences,
-leading to more user-friendly and acceptable results.
-
-
 ## Inference Script:
 
 ```python
@@ -62,4 +56,9 @@ if __name__ == "__main__":
 
     print(f"Input: {input_text}")
    print(f"Output: {output}")
-```
+```
+
+# Results
+
+ORPO is a viable RLHF algorithm for improving model performance alongside SFT finetuning. It also helps align the model's outputs more closely with human preferences,
+leading to more user-friendly and acceptable results.
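The README describes ORPO training on a slice of argilla/ultrafeedback-binarized-preferences-cleaned, but the commit does not include the training code. The block below is a minimal sketch of such a setup, assuming TRL's ORPOTrainer; the 1% split, hyperparameters, and output directory are illustrative placeholders rather than values from the README, and depending on the TRL version the conversational chosen/rejected columns may need to be flattened to plain strings first.

```python
# Minimal ORPO training sketch (not the author's actual script).
# Assumes TRL's ORPOTrainer/ORPOConfig; split size and hyperparameters
# are illustrative placeholders, not values from the README.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base_model = "mistralai/Mistral-7B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # Mistral has no pad token by default

model = AutoModelForCausalLM.from_pretrained(base_model)

# Only a small slice of the preference dataset, mirroring the GPU-constrained setup.
# Depending on the TRL version, the conversational chosen/rejected columns may need
# to be converted to plain strings before training.
dataset = load_dataset(
    "argilla/ultrafeedback-binarized-preferences-cleaned",
    split="train[:1%]",
)

config = ORPOConfig(
    output_dir="elemperador-orpo",
    beta=0.1,                      # weight of the odds-ratio penalty term
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=8e-6,
    num_train_epochs=1,
    max_length=1024,
    max_prompt_length=512,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,  # newer TRL versions name this argument `processing_class`
)
trainer.train()
```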
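The diff shows only the tail of the README's inference script. A self-contained sketch that ends in the same print statements, assuming the standard transformers generation API and a hypothetical AINovice2005/ElEmperador hub id, could look like this:

```python
# Self-contained inference sketch; the hub id below is an assumption,
# not confirmed by the commit.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AINovice2005/ElEmperador"  # hypothetical repo id, replace as needed

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

def generate(input_text: str, max_new_tokens: int = 128) -> str:
    # Tokenize the prompt and move it to the model's device.
    inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

if __name__ == "__main__":
    input_text = "What is ORPO fine-tuning?"
    output = generate(input_text)

    print(f"Input: {input_text}")
    print(f"Output: {output}")
```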