abideen committed on
Commit 78b8357 · verified · 1 Parent(s): 970b275

Update README.md

  1. README.md +40 -38
README.md CHANGED
@@ -1,59 +1,61 @@
1
  ---
 
 
 
 
2
  language:
3
  - en
4
- license: apache-2.0
5
- library_name: transformers
6
  ---
7
- # **ORPO**
8
 
9
- This is the official repository for <a class="link" href="https://arxiv.org/abs/2403.07691">**Reference-free Monolithic Preference Optimization with Odds Ratio**</a>. The detailed results in the paper can be found in:
10
- - [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kaist-ai%2Fmistral-orpo-beta)
11
- - [AlpacaEval](#alpacaeval)
12
- - [MT-Bench](#mt-bench)
13
- - [IFEval](#ifeval)
 
 
 
 
 
 
 
14
 
15
- &nbsp;
16
 
17
- ### **`Model Checkpoints`**
18
 
19
- Our models trained with ORPO can be found in:
20
 
21
- - [X] **Mistral-ORPO-⍺**: 🤗 <a class="link" href="https://huggingface.co/kaist-ai/mistral-orpo-alpha">kaist-ai/mistral-orpo-alpha</a>
22
- - [X] **Mistral-ORPO-β**: 🤗 <a class="link" href="https://huggingface.co/kaist-ai/mistral-orpo-beta">kaist-ai/mistral-orpo-beta</a>
23
 
24
- And the corresponding logs for the average log probabilities of chosen/rejected responses during training are reported in:
 
 
 
25
 
26
- - [X] **Mistral-ORPO-⍺**: <a class="link" href="https://wandb.ai/jiwooya1000/PREF/reports/Mistral-ORPO-7B-Training-Log--Vmlldzo3MTE1NzE0?accessToken=rms6o4mg5vo3feu1bvbpk632m4cspe19l0u1p4he3othx5bgean82chn9neiile6">Wandb Report for Mistral-ORPO-⍺</a>
27
- - [X] **Mistral-ORPO-β**: <a class="link" href="https://wandb.ai/jiwooya1000/PREF/reports/Mistral-ORPO-7B-Training-Log--Vmlldzo3MTE3MzMy?accessToken=dij4qbp6dcrofsanzbgobjsne9el8a2zkly2u5z82rxisd4wiwv1rhp0s2dub11e">Wandb Report for Mistral-ORPO-β</a>
28
 
29
- &nbsp;
30
 
31
- ### **`AlpacaEval`**
 
 
32
 
33
- <figure>
34
- <img class="png" src="/assets/img/alpaca_blog.png" alt="Description of the image">
35
- <figcaption><b>Figure 1.</b> AlpacaEval 2.0 score for the models trained with different alignment methods.</figcaption>
36
- </figure>
37
 
38
- &nbsp;
 
39
 
40
- ### **`MT-Bench`**
 
 
 
41
 
42
- <figure>
43
- <img class="png" src="/assets/img/mtbench_hf.png" alt="Description of the image">
44
- <figcaption><b>Figure 2.</b> MT-Bench result by category.</figcaption>
45
- </figure>
46
 
47
- &nbsp;
48
 
49
- ### **`IFEval`**
50
 
51
- IFEval scores are measured with <a class="link" href="https://github.com/EleutherAI/lm-evaluation-harness">EleutherAI/lm-evaluation-harness</a> by applying the chat template. The scores for Llama-2-Chat (70B), Zephyr-β (7B), and Mixtral-8X7B-Instruct-v0.1 are originally reported in <a class="link" href="https://twitter.com/wiskojo/status/1739767758462877823">this tweet</a>.
52
 
53
- | **Model Type** | **Prompt-Strict** | **Prompt-Loose** | **Inst-Strict** | **Inst-Loose** |
54
- |--------------------|:-----------------:|:----------------:|:---------------:|----------------|
55
- | **Llama-2-Chat (70B)** | 0.4436 | 0.5342 | 0.5468 | 0.6319 |
56
- | **Zephyr-β (7B)** | 0.4233 | 0.4547 | 0.5492 | 0.5767 |
57
- | **Mixtral-8X7B-Instruct-v0.1** | 0.5213 | **0.5712** | 0.6343 | **0.6823** |
58
- | **Mistral-ORPO-⍺ (7B)** | 0.5009 | 0.5083 | 0.5995 | 0.6163 |
59
- | **Mistral-ORPO-β (7B)** | **0.5287** | 0.5564 | **0.6355** | 0.6619 |
 
1
  ---
2
+ library_name: transformers
3
+ license: apache-2.0
4
+ datasets:
5
+ - argilla/dpo-mix-7k
6
  language:
7
  - en
 
 
8
  ---
 
9
 
10
+ # Phi2-PRO
11
+
12
+
13
+ ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/64e380b2e12618b261fa6ba0/QEQjVaXVqAjw4eSCAMnkv.jpeg)
14
+
15
+ *phi2-pro* is a fine-tuned version of **[microsoft/phi-2](https://huggingface.co/microsoft/phi-2)**, trained on the **[argilla/dpo-mix-7k](https://huggingface.co/datasets/argilla/dpo-mix-7k)**
16
+ preference dataset using *Odds Ratio Preference Optimization (ORPO)*. The model was trained for 1 epoch.
17
+
18
+ ## LazyORPO
19
+
20
+ This model was trained using **[LazyORPO](https://colab.research.google.com/drive/19ci5XIcJDxDVPY2xC1ftZ5z1kc2ah_rx?usp=sharing)**, a Colab notebook that makes the training
21
+ process much easier. It is based on the [ORPO paper](https://colab.research.google.com/corgiredirector?site=https%3A%2F%2Fhuggingface.co%2Fpapers%2F2403.07691).
22
 
 
23
 
24
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e380b2e12618b261fa6ba0/2h3guPdFocisjFClFr0Kh.png)
25
 
26
+ #### What is ORPO?
27
 
28
+ Odds Ratio Preference Optimization (ORPO) is a method for training LLMs that combines supervised fine-tuning (SFT) and preference alignment into a single objective (loss function); its authors report state-of-the-art results.
29
+ Some highlights of this technique are:
30
 
31
+ * 🧠 Reference-model-free, and therefore memory friendly
32
+ * 🔄 Replaces SFT+DPO/PPO with a single method (ORPO)
33
+ * 🏆 ORPO outperforms SFT and SFT+DPO on Phi-2, Llama 2, and Mistral
34
+ * 📊 Mistral ORPO achieves 12.20% on AlpacaEval 2.0, 66.19% on IFEval, and 7.32 on MT-Bench, outperforming Hugging Face Zephyr Beta
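
The single objective mentioned above can be sketched in a few lines of plain Python. This is a minimal illustration of the loss as described in the ORPO paper (an SFT term plus a weighted odds-ratio term), not the code used to train this model. The function names, the scalar inputs, and the default weight `lam=0.1` are assumptions chosen for the example; a real implementation would work on batched tensors of length-averaged token log-probabilities.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logp); requires avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_logp: float, rejected_logp: float,
              sft_nll: float, lam: float = 0.1) -> float:
    """Hypothetical helper: SFT loss plus a weighted odds-ratio penalty.

    chosen_logp / rejected_logp: length-averaged log-probabilities the model
    assigns to the chosen and rejected responses.
    sft_nll: negative log-likelihood of the chosen response (the SFT term).
    lam: weight of the odds-ratio term (illustrative value, an assumption).
    """
    # Log odds ratio between the chosen and the rejected response.
    z = log_odds(chosen_logp) - log_odds(rejected_logp)
    # -log(sigmoid(z)) written as a numerically stable softplus(-z).
    odds_ratio_term = max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
    return sft_nll + lam * odds_ratio_term


if __name__ == "__main__":
    # The penalty shrinks as the chosen response becomes more likely
    # than the rejected one.
    print(orpo_loss(chosen_logp=-0.2, rejected_logp=-1.5, sft_nll=0.2))
    print(orpo_loss(chosen_logp=-1.5, rejected_logp=-0.2, sft_nll=1.5))
```

Because both terms are computed from the policy model's own probabilities, no frozen reference model is needed, which is where the memory saving in the first highlight comes from.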
35
 
 
 
36
 
37
+ #### Usage
38
 
39
+ ```python
40
+ import torch
41
+ from transformers import AutoModelForCausalLM, AutoTokenizer
42
 
43
+ torch.set_default_device("cuda")
 
 
 
44
 
45
+ model = AutoModelForCausalLM.from_pretrained("abideen/phi2-pro", torch_dtype="auto", trust_remote_code=True)
46
+ tokenizer = AutoTokenizer.from_pretrained("abideen/phi2-pro", trust_remote_code=True)
47
 
48
+ inputs = tokenizer('''
49
+ """
50
+ Write a detailed analogy between mathematics and a lighthouse.
51
+ """''', return_tensors="pt", return_attention_mask=False)
52
 
53
+ outputs = model.generate(**inputs, max_length=200)
54
+ text = tokenizer.batch_decode(outputs)[0]
55
+ print(text)
 
56
+ ```
 
57
 
 
58
 
59
+ ## Evaluation
60
 
61
+ ### COMING SOON