geonmin-kim committed on
Commit 6615799 · verified · 1 Parent(s): c75a62e

Model save
README.md ADDED
@@ -0,0 +1,110 @@
+ ---
+ license: apache-2.0
+ library_name: peft
+ tags:
+ - trl
+ - dpo
+ - generated_from_trainer
+ base_model: mistralai/Mistral-7B-v0.1
+ model-index:
+ - name: zephyr-7b-dpo-qlora
+ results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # zephyr-7b-dpo-qlora
+
+ This model is a DPO fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1); the preference dataset is not recorded in this card.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.4873
+ - Rewards/chosen: -2.9667
+ - Rewards/rejected: -4.1000
+ - Rewards/accuracies: 0.7445
+ - Rewards/margins: 1.1333
+ - Logps/rejected: -654.6072
+ - Logps/chosen: -561.3217
+ - Logits/rejected: -0.9450
+ - Logits/chosen: -1.0724
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-06
+ - train_batch_size: 4
+ - eval_batch_size: 8
+ - seed: 42
+ - distributed_type: multi-GPU
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 16
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 1
+
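The Rewards/* metrics reported throughout this card are the implicit DPO rewards: beta-scaled log-probability ratios between the policy and the frozen reference model. A minimal plain-Python sketch of that arithmetic for a single preference pair (this is not the trl implementation; `beta=0.1` is trl's default and an assumption, since the card does not record it):

```python
import math

def dpo_pair_metrics(policy_logp_chosen, policy_logp_rejected,
                     ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Implicit DPO rewards and loss for one (chosen, rejected) pair.

    Rewards/chosen and Rewards/rejected are beta-scaled log-prob ratios
    against the reference model; Rewards/margins is their difference, and
    Rewards/accuracies is the fraction of pairs with a positive margin.
    """
    reward_chosen = beta * (policy_logp_chosen - ref_logp_chosen)
    reward_rejected = beta * (policy_logp_rejected - ref_logp_rejected)
    margin = reward_chosen - reward_rejected
    # DPO loss: -log(sigmoid(margin)), written as softplus(-margin)
    loss = math.log1p(math.exp(-margin))
    return {"rewards/chosen": reward_chosen,
            "rewards/rejected": reward_rejected,
            "rewards/margins": margin,
            "loss": loss}
```

At initialization the policy equals the reference, so every reward is zero and the loss starts at log(2) ≈ 0.693, which is consistent with the earliest rows of the training results.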
+ ### Training results
61
+
62
+ | Training Loss | Epoch | Step | Logits/chosen | Logits/rejected | Logps/chosen | Logps/rejected | Validation Loss | Rewards/accuracies | Rewards/chosen | Rewards/margins | Rewards/rejected |
63
+ |:-------------:|:-----:|:----:|:-------------:|:---------------:|:------------:|:--------------:|:---------------:|:------------------:|:--------------:|:---------------:|:----------------:|
64
+ | 0.6819 | 0.03 | 100 | -2.0959 | -1.9565 | -259.6472 | -241.9029 | 0.6822 | 0.6545 | 0.0500 | 0.0230 | 0.0271 |
65
+ | 0.6548 | 0.05 | 200 | 0.6500 | -0.1489 | -0.2515 | 0.6780 | 0.1027 | -269.7628 | -279.5373 | -1.9329 | -2.0695 |
66
+ | 0.6084 | 0.08 | 300 | 0.6213 | -0.2956 | -0.4998 | 0.6810 | 0.2042 | -294.5921 | -294.2169 | -1.8771 | -2.0114 |
67
+ | 0.6237 | 0.1 | 400 | 0.6039 | -0.4538 | -0.7401 | 0.6935 | 0.2863 | -318.6170 | -310.0349 | -1.8367 | -1.9656 |
68
+ | 0.5534 | 0.13 | 500 | 0.5692 | -0.9154 | -1.3927 | 0.7050 | 0.4773 | -383.8828 | -356.1946 | -1.5403 | -1.6712 |
69
+ | 0.5613 | 0.16 | 600 | 0.5659 | -0.8123 | -1.3218 | 0.7025 | 0.5095 | -376.7896 | -345.8830 | -1.3701 | -1.5049 |
70
+ | 0.5139 | 0.18 | 700 | 0.5572 | -2.6368 | -3.4670 | 0.7145 | 0.8302 | -591.3087 | -528.3278 | -0.8924 | -1.0174 |
71
+ | 0.5184 | 0.21 | 800 | 0.5374 | -1.4908 | -2.1870 | 0.7160 | 0.6962 | -463.3091 | -413.7339 | -1.1141 | -1.2460 |
72
+ | 0.5211 | 0.24 | 900 | 0.5332 | -2.5430 | -3.3947 | 0.7180 | 0.8518 | -584.0806 | -518.9495 | -0.8116 | -0.9341 |
73
+ | 0.5553 | 0.26 | 1000 | 0.5178 | -2.1745 | -3.0424 | 0.7315 | 0.8679 | -548.8491 | -482.0993 | -0.8557 | -0.9813 |
74
+ | 0.5994 | 0.29 | 1100 | 0.5207 | -2.5002 | -3.3276 | 0.7300 | 0.8275 | -577.3698 | -514.6677 | -0.7615 | -0.8896 |
75
+ | 0.5976 | 0.31 | 1200 | 0.5098 | -2.1833 | -2.9905 | 0.7365 | 0.8072 | -543.6604 | -482.9834 | -0.8350 | -0.9596 |
76
+ | 0.5237 | 0.34 | 1300 | 0.5166 | -3.0973 | -4.1628 | 0.7350 | 1.0654 | -660.8850 | -574.3862 | -0.7072 | -0.8259 |
77
+ | 0.516 | 0.37 | 1400 | 0.5108 | -2.1009 | -3.0663 | 0.7350 | 0.9654 | -551.2367 | -474.7425 | -0.7865 | -0.9128 |
78
+ | 0.4593 | 0.39 | 1500 | 0.5174 | -2.3167 | -3.4254 | 0.7305 | 1.1088 | -587.1506 | -496.3185 | -0.8903 | -1.0211 |
79
+ | 0.5545 | 0.42 | 1600 | 0.5032 | -2.9938 | -4.0820 | 0.7370 | 1.0882 | -652.8123 | -564.0355 | -0.8801 | -1.0082 |
80
+ | 0.5425 | 0.44 | 1700 | 0.4996 | -3.3496 | -4.4061 | 0.7405 | 1.0565 | -685.2187 | -599.6096 | -0.8382 | -0.9686 |
81
+ | 0.4825 | 0.47 | 1800 | 0.5037 | -3.0446 | -4.1288 | 0.7380 | 1.0842 | -657.4884 | -569.1091 | -0.8738 | -1.0006 |
82
+ | 0.4455 | 0.5 | 1900 | 0.4962 | -3.0223 | -4.1482 | 0.7420 | 1.1259 | -659.4305 | -566.8840 | -0.8910 | -1.0214 |
83
+ | 0.4817 | 0.52 | 2000 | 0.4974 | -3.5987 | -4.6648 | 0.7470 | 1.0660 | -711.0853 | -624.5250 | -0.8139 | -0.9428 |
84
+ | 0.5079 | 0.55 | 2100 | 0.4923 | -3.1751 | -4.2293 | 0.7520 | 1.0542 | -667.5426 | -582.1657 | -0.8739 | -1.0031 |
85
+ | 0.477 | 0.58 | 2200 | 0.4897 | -2.6127 | -3.5713 | 0.7410 | 0.9587 | -601.7402 | -525.9182 | -0.9567 | -1.0880 |
86
+ | 0.4829 | 0.6 | 2300 | 0.4887 | -2.9530 | -4.0954 | 0.7485 | 1.1424 | -654.1511 | -559.9558 | -0.9032 | -1.0313 |
87
+ | 0.4752 | 0.63 | 2400 | 0.4909 | -3.1480 | -4.2815 | 0.7445 | 1.1335 | -672.7583 | -579.4506 | -0.8495 | -0.9765 |
88
+ | 0.5249 | 0.65 | 2500 | 0.4891 | -3.0936 | -4.2029 | 0.7445 | 1.1093 | -664.8962 | -574.0093 | -0.9136 | -1.0435 |
89
+ | 0.4596 | 0.68 | 2600 | 0.4939 | -2.9492 | -4.0985 | 0.7400 | 1.1493 | -654.4570 | -559.5698 | -0.9264 | -1.0549 |
90
+ | 0.5152 | 0.71 | 2700 | 0.4922 | -3.0197 | -4.1572 | 0.7440 | 1.1375 | -660.3236 | -566.6193 | -0.9249 | -1.0527 |
91
+ | 0.4518 | 0.73 | 2800 | 0.4908 | -3.0666 | -4.2342 | 0.7415 | 1.1676 | -668.0294 | -571.3138 | -0.9260 | -1.0535 |
92
+ | 0.5018 | 0.76 | 2900 | 0.4877 | -3.0977 | -4.2382 | 0.7465 | 1.1405 | -668.4285 | -574.4260 | -0.9320 | -1.0595 |
93
+ | 0.4592 | 0.79 | 3000 | 0.4873 | -2.9934 | -4.1134 | 0.7460 | 1.1200 | -655.9471 | -563.9877 | -0.9510 | -1.0788 |
94
+ | 0.4905 | 0.81 | 3100 | 0.4878 | -2.9825 | -4.1198 | 0.7430 | 1.1373 | -656.5853 | -562.9043 | -0.9465 | -1.0741 |
95
+ | 0.485 | 0.84 | 3200 | 0.4874 | -2.9459 | -4.0754 | 0.7455 | 1.1296 | -652.1517 | -559.2400 | -0.9531 | -1.0807 |
96
+ | 0.5157 | 0.86 | 3300 | 0.4874 | -2.9550 | -4.0838 | 0.7445 | 1.1289 | -652.9912 | -560.1489 | -0.9481 | -1.0755 |
97
+ | 0.4474 | 0.89 | 3400 | 0.4871 | -2.9699 | -4.1019 | 0.7435 | 1.1321 | -654.8017 | -561.6381 | -0.9499 | -1.0773 |
98
+ | 0.5379 | 0.92 | 3500 | 0.4874 | -2.9663 | -4.0989 | 0.7430 | 1.1326 | -654.5006 | -561.2808 | -0.9468 | -1.0742 |
99
+ | 0.464 | 0.94 | 3600 | 0.4874 | -2.9638 | -4.0967 | 0.7425 | 1.1329 | -654.2791 | -561.0286 | -0.9475 | -1.0748 |
100
+ | 0.4729 | 0.97 | 3700 | 0.4873 | -2.9666 | -4.0999 | 0.7445 | 1.1333 | -654.6014 | -561.3129 | -0.9495 | -1.0770 |
101
+ | 0.5017 | 0.99 | 3800 | 0.4873 | -2.9667 | -4.1000 | 0.7445 | 1.1333 | -654.6072 | -561.3217 | -0.9450 | -1.0724 |
102
+
103
+
104
+ ### Framework versions
+
+ - PEFT 0.7.1
+ - Transformers 4.39.0.dev0
+ - Pytorch 2.2.2+cu121
+ - Datasets 2.14.6
+ - Tokenizers 0.15.2
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:90e264b19e273addbdca5bda3e10f90188e076f171d40c558c8daa4afa824fdd
+ oid sha256:d7262a16f0b092a2057a5c5f1fd27d9762f2c5c70842bf083667ea2f44521f4a
  size 671150064
all_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 1.0,
+ "train_loss": 0.5021860111574015,
+ "train_runtime": 41123.41,
+ "train_samples": 61134,
+ "train_samples_per_second": 1.487,
+ "train_steps_per_second": 0.093
+ }
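The throughput figures above are internally consistent with the hyperparameters section; a quick sanity check (the optimizer-step count of 3,821 is derived from samples / effective batch size, not taken from the file):

```python
import math

# Values from all_results.json and the hyperparameters section.
train_samples = 61134
train_runtime_s = 41123.41
total_train_batch_size = 16  # 4 per device x 4 gradient-accumulation steps

# Reported train_samples_per_second and train_steps_per_second should
# follow directly from runtime and batch size.
samples_per_second = train_samples / train_runtime_s
optimizer_steps = math.ceil(train_samples / total_train_batch_size)
steps_per_second = optimizer_steps / train_runtime_s

print(round(samples_per_second, 3))  # 1.487, as reported
print(round(steps_per_second, 3))    # 0.093, as reported
```

This also explains why the training-results table ends at step 3800 (epoch 0.99): one epoch is roughly 3,821 optimizer steps.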
runs/Apr08_17-22-54_gpu-1/events.out.tfevents.1712564669.gpu-1.407577.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:afca04cade90042ae6cc45223679b79e284ea7695dd0baf710e34110f104e9de
- size 287105
+ oid sha256:04914239c4466595981804c7251c567ce4cd61fcb5eb5fe22ceaedbf9bdd4e66
+ size 288835
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 1.0,
+ "train_loss": 0.5021860111574015,
+ "train_runtime": 41123.41,
+ "train_samples": 61134,
+ "train_samples_per_second": 1.487,
+ "train_steps_per_second": 0.093
+ }
trainer_state.json ADDED
The diff for this file is too large to render. See raw diff