Apel-sin commited on
Commit
bee8222
·
1 Parent(s): 073b368

add measurement.json

Browse files
Files changed (2) hide show
  1. README.md +74 -0
  2. measurement.json +0 -0
README.md ADDED
@@ -0,0 +1,74 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: gemma
3
+ datasets:
4
+ - openbmb/UltraFeedback
5
+ language:
6
+ - en
7
+ pipeline_tag: text-generation
8
+ ---
9
+ Self-Play Preference Optimization for Language Model Alignment (https://arxiv.org/abs/2405.00675)
10
+
11
+ # Gemma-2-9B-It-SPPO-Iter3
12
+
13
+ This model was developed using [Self-Play Preference Optimization](https://arxiv.org/abs/2405.00675) at iteration 3, based on the [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it) architecture as starting point. We utilized the prompt sets from the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset, splited to 3 parts for 3 iterations by [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset). All responses used are synthetic.
14
+
15
+ **Terms of Use**: [Terms](https://www.kaggle.com/models/google/gemma/license/consent/verify/huggingface?returnModelRepoId=google/gemma-2-9b-it)
16
+
17
+
18
+ ## Links to Other Models
19
+ - [Gemma-2-9B-It-SPPO-Iter1](https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter1)
20
+ - [Gemma-2-9B-It-SPPO-Iter2](https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter2)
21
+ - [Gemma-2-9B-It-SPPO-Iter3](https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3)
22
+
23
+ ### Model Description
24
+
25
+ - Model type: A 8B parameter GPT-like model fine-tuned on synthetic datasets.
26
+ - Language(s) (NLP): Primarily English
27
+ - License: Apache-2.0
28
+ - Finetuned from model: google/gemma-2-9b-it
29
+
30
+
31
+ ## [AlpacaEval Leaderboard Evaluation Results](https://tatsu-lab.github.io/alpaca_eval/)
32
+
33
+
34
+ | Model | LC. Win Rate | Win Rate | Avg. Length |
35
+ |-------------------------------------------|:------------:|:--------:|:-----------:|
36
+ |[Gemma-2-9B-SPPO Iter1](https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter1) |48.70 |40.76 | 1669
37
+ |[Gemma-2-9B-SPPO Iter2](https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter2) |50.93 | 44.64 | 1759
38
+ |[Gemma-2-9B-SPPO Iter3](https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3) |**53.27** |**47.74** | 1803
39
+
40
+
41
+
42
+
43
+
44
+
45
+ ### Training hyperparameters
46
+ The following hyperparameters were used during training:
47
+
48
+ - learning_rate: 5e-07
49
+ - eta: 1000
50
+ - per_device_train_batch_size: 8
51
+ - gradient_accumulation_steps: 1
52
+ - seed: 42
53
+ - distributed_type: deepspeed_zero3
54
+ - num_devices: 8
55
+ - optimizer: RMSProp
56
+ - lr_scheduler_type: linear
57
+ - lr_scheduler_warmup_ratio: 0.1
58
+ - num_train_epochs: 1.0
59
+
60
+
61
+
62
+
63
+ ## Citation
64
+ ```
65
+ @misc{wu2024self,
66
+ title={Self-Play Preference Optimization for Language Model Alignment},
67
+ author={Wu, Yue and Sun, Zhiqing and Yuan, Huizhuo and Ji, Kaixuan and Yang, Yiming and Gu, Quanquan},
68
+ year={2024},
69
+ eprint={2405.00675},
70
+ archivePrefix={arXiv},
71
+ primaryClass={cs.LG}
72
+ }
73
+ ```
74
+
measurement.json ADDED
The diff for this file is too large to render. See raw diff