AINovice2005 committed (verified)
Commit 024f82b · 1 Parent(s): 5a23691

Update README.md

Files changed (1): README.md (+1 -7)
README.md CHANGED
@@ -23,20 +23,14 @@ ElEmperador is an ORPO-based finetune derived from the Mistral-7B-v0.1 base mod
 
 The argilla/ultrafeedback-binarized-preferences-cleaned dataset was used, although only a small portion was used due to GPU constraints.
 
-## Citation
-
-[Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). QLoRA: Efficient Finetuning of Quantized LLMs.](https://arxiv.org/abs/2305.14314)
-
-
 # Evals:
 BLEU: 0.209
 
-# Conclusion and Model Recipe
+# Conclusion
 
 ORPO is a viable RLHF algorithm that improves model performance over SFT finetuning alone. It also helps align the model's outputs more closely with human preferences,
 leading to more user-friendly and acceptable results.
 
-The model recipe: [https://github.com/ParagEkbote/El-Emperador_ModelRecipe]
 
 ## Inference Script:
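
For reference, a minimal sketch of what an ORPO fine-tuning run of this kind might look like, assuming TRL's ORPOTrainer, the Mistral-7B-v0.1 base model, and a small slice of the argilla/ultrafeedback-binarized-preferences-cleaned dataset. The output directory, dataset slice, and hyperparameters are illustrative assumptions, not values taken from this commit.

```python
# Sketch of an ORPO fine-tune (assumed setup, not the author's actual training script).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base_model = "mistralai/Mistral-7B-v0.1"
model = AutoModelForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # Mistral has no pad token by default

# Only a small slice of the preference data, mirroring the GPU constraints noted above.
train_dataset = load_dataset(
    "argilla/ultrafeedback-binarized-preferences-cleaned", split="train[:1%]"
)

config = ORPOConfig(
    output_dir="elemperador-orpo",   # hypothetical output directory
    beta=0.1,                        # ORPO odds-ratio loss weighting (illustrative)
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    learning_rate=8e-6,
    max_length=1024,
    max_prompt_length=512,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,     # expects prompt / chosen / rejected columns
    processing_class=tokenizer,      # `tokenizer=` in older TRL releases
)
trainer.train()
```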
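The hunk cuts off at the Inference Script heading, so the script itself is not visible in this diff. A hedged sketch of a basic generation script with the transformers library follows; the repo id is a placeholder assumption, not confirmed by this commit.

```python
# Sketch of an inference script (placeholder repo id -- substitute the published
# ElEmperador checkpoint).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AINovice2005/ElEmperador"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Explain ORPO fine-tuning in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Short greedy generation; decoding settings are illustrative.
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```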