Update README.md
README.md CHANGED
@@ -1,59 +1,61 @@
---
language:
- en
license: apache-2.0
library_name: transformers
---

# **ORPO**

- [X] **Mistral-ORPO-⍺**: <a class="link" href="https://wandb.ai/jiwooya1000/PREF/reports/Mistral-ORPO-7B-Training-Log--Vmlldzo3MTE1NzE0?accessToken=rms6o4mg5vo3feu1bvbpk632m4cspe19l0u1p4he3othx5bgean82chn9neiile6">Wandb Report for Mistral-ORPO-⍺</a>
- [X] **Mistral-ORPO-β**: <a class="link" href="https://wandb.ai/jiwooya1000/PREF/reports/Mistral-ORPO-7B-Training-Log--Vmlldzo3MTE3MzMy?accessToken=dij4qbp6dcrofsanzbgobjsne9el8a2zkly2u5z82rxisd4wiwv1rhp0s2dub11e">Wandb Report for Mistral-ORPO-β</a>

<figure>
  <img class="png" src="/assets/img/alpaca_blog.png" alt="Description of the image">
  <figcaption><b>Figure 1.</b> AlpacaEval 2.0 score for the models trained with different alignment methods.</figcaption>
</figure>
37 |
|
38 |
-
|
|
|
39 |
|
40 |
-
|
|
|
|
|
|
|
41 |
|
42 |
-
|
43 |
-
|
44 |
-
|
45 |
-
</figure>
|
46 |

### **`IFEval`**

| **Model Type** | **Prompt-Strict** | **Prompt-Loose** | **Inst-Strict** | **Inst-Loose** |
|--------------------|:-----------------:|:----------------:|:---------------:|----------------|
| **Llama-2-Chat (70B)** | 0.4436 | 0.5342 | 0.5468 | 0.6319 |
| **Zephyr-β (7B)** | 0.4233 | 0.4547 | 0.5492 | 0.5767 |
| **Mixtral-8X7B-Instruct-v0.1** | 0.5213 | **0.5712** | 0.6343 | **0.6823** |
| **Mistral-ORPO-⍺ (7B)** | 0.5009 | 0.5083 | 0.5995 | 0.6163 |
| **Mistral-ORPO-β (7B)** | **0.5287** | 0.5564 | **0.6355** | 0.6619 |

---
library_name: transformers
license: apache-2.0
datasets:
- argilla/dpo-mix-7k
language:
- en
---

# Phi2-PRO

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/64e380b2e12618b261fa6ba0/QEQjVaXVqAjw4eSCAMnkv.jpeg)

*phi2-pro* is a fine-tuned version of **[microsoft/phi-2](https://huggingface.co/microsoft/phi-2)** on the **[argilla/dpo-mix-7k](https://huggingface.co/datasets/argilla/dpo-mix-7k)** preference dataset using *Odds Ratio Preference Optimization (ORPO)*. The model was trained for one epoch.

## LazyORPO

This model was trained using **[LazyORPO](https://colab.research.google.com/drive/19ci5XIcJDxDVPY2xC1ftZ5z1kc2ah_rx?usp=sharing)**, a Colab notebook that makes the training process much easier. It is based on the [ORPO paper](https://huggingface.co/papers/2403.07691).
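
For readers who prefer a plain script over the notebook, the sketch below shows what an equivalent ORPO fine-tune could look like with TRL's `ORPOTrainer`. The hyperparameters and dataset handling are illustrative assumptions, not the exact recipe used for *phi2-pro*, and the `ORPOTrainer` keyword names can vary across TRL versions.

```python
# Minimal ORPO fine-tuning sketch (assumed setup, not the exact phi2-pro recipe).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token  # phi-2 ships without a pad token

# argilla/dpo-mix-7k provides chosen/rejected preference pairs; depending on
# the TRL version it may need mapping to plain prompt/chosen/rejected text.
dataset = load_dataset("argilla/dpo-mix-7k", split="train")

config = ORPOConfig(
    output_dir="./phi2-orpo",
    beta=0.1,                       # weight of the odds-ratio term (lambda in the paper)
    num_train_epochs=1,             # the card reports training for one epoch
    per_device_train_batch_size=2,  # illustrative; tune to your GPU memory
    learning_rate=5e-6,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```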

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e380b2e12618b261fa6ba0/2h3guPdFocisjFClFr0Kh.png)

#### What is ORPO?

Odds Ratio Preference Optimization (ORPO) is a new method for training LLMs that combines SFT and alignment into a single objective (loss function), achieving state-of-the-art results. Some highlights of this technique are listed below; a sketch of the combined objective follows the list:

* 🧠 Reference model-free → memory friendly
* 🔄 Replaces SFT+DPO/PPO with a single method (ORPO)
* 🏆 ORPO outperforms SFT and SFT+DPO on Phi-2, Llama 2, and Mistral
* 📊 Mistral-ORPO achieves 12.20% on AlpacaEval 2.0, 66.19% on IFEval, and 7.32 on MT-Bench, outperforming Hugging Face's Zephyr Beta
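
Concretely, the ORPO paper combines the standard SFT loss on the chosen response with an odds-ratio penalty on the rejected one (notation lightly adapted here):

```latex
% ORPO objective: SFT term plus a lambda-weighted odds-ratio term.
\mathcal{L}_{\mathrm{ORPO}}
  = \mathbb{E}_{(x,\, y_w,\, y_l)}
    \left[ \mathcal{L}_{\mathrm{SFT}} + \lambda \cdot \mathcal{L}_{\mathrm{OR}} \right]

% The odds-ratio term raises the odds of the chosen response y_w
% relative to the rejected response y_l under the same model:
\mathcal{L}_{\mathrm{OR}}
  = -\log \sigma\!\left( \log
      \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)}
    \right),
\qquad
\mathrm{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}
```

Because both terms are computed with the policy model itself, no frozen reference model is needed, which is where the memory savings in the first bullet come from.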

#### Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.set_default_device("cuda")

# torch_dtype="auto" loads the weights in the precision they were saved in
model = AutoModelForCausalLM.from_pretrained("abideen/phi2-pro", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("abideen/phi2-pro", trust_remote_code=True)

inputs = tokenizer('''
"""
Write a detailed analogy between mathematics and a lighthouse.
"""''', return_tensors="pt", return_attention_mask=False)

# max_length counts the prompt tokens plus the generated tokens
outputs = model.generate(**inputs, max_length=200)
text = tokenizer.batch_decode(outputs)[0]
print(text)
```
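
Note that `trust_remote_code=True` mirrors the base microsoft/phi-2 card; recent versions of `transformers` include the Phi architecture natively, so the flag may no longer be strictly required.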

## Evaluation

### COMING SOON