taufiqsyed committed on
Commit 04905b5 · verified · 1 Parent(s): 073b953

End of training

Files changed (5)
  1. README.md +5 -3
  2. all_results.json +15 -0
  3. eval_results.json +9 -0
  4. train_results.json +9 -0
  5. trainer_state.json +1542 -0
README.md CHANGED
@@ -3,6 +3,8 @@ library_name: peft
 license: cc-by-nc-4.0
 base_model: facebook/musicgen-melody
 tags:
+ - text-to-audio
+ - taufiqsyed/salami_cleaned_sampled
 - generated_from_trainer
 model-index:
 - name: salami_truncsplit_model
@@ -14,10 +16,10 @@ should probably proofread and complete it, then remove this comment. -->

 # salami_truncsplit_model

- This model is a fine-tuned version of [facebook/musicgen-melody](https://huggingface.co/facebook/musicgen-melody) on an unknown dataset.
+ This model is a fine-tuned version of [facebook/musicgen-melody](https://huggingface.co/facebook/musicgen-melody) on the TAUFIQSYED/SALAMI_CLEANED_SAMPLED - DEFAULT dataset.
 It achieves the following results on the evaluation set:
- - Loss: 6.1305
- - Clap: 0.1143
+ - Loss: 6.1330
+ - Clap: 0.1080

 ## Model description

all_results.json ADDED
@@ -0,0 +1,15 @@
+ {
+ "epoch": 3.9241877256317688,
+ "eval_clap": 0.10804431885480881,
+ "eval_loss": 6.133026123046875,
+ "eval_runtime": 165.3851,
+ "eval_samples": 16,
+ "eval_samples_per_second": 0.097,
+ "eval_steps_per_second": 0.097,
+ "total_flos": 784195045500888.0,
+ "train_loss": 6.39456293629665,
+ "train_runtime": 14405.0011,
+ "train_samples": 831,
+ "train_samples_per_second": 0.231,
+ "train_steps_per_second": 0.014
+ }
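As a quick sanity check, the throughput figures above can be reproduced from the raw counts and runtimes in the same file. The sketch below assumes the run was configured for 4 epochs (the training arguments are not shown in this commit), since `train_samples_per_second` matches `train_samples * 4 / train_runtime` rather than the 3.92 epochs actually logged:

```python
# Reproduce the rounded throughput numbers from all_results.json.
eval_samples, eval_runtime = 16, 165.3851
train_samples, train_runtime = 831, 14405.0011
num_train_epochs = 4  # assumption: configured epochs, not the 3.92 completed

eval_sps = round(eval_samples / eval_runtime, 3)                        # 0.097
train_sps = round(train_samples * num_train_epochs / train_runtime, 3)  # 0.231

print(eval_sps, train_sps)
```

This matches the reported `eval_samples_per_second` and `train_samples_per_second` to three decimals.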
eval_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+ "epoch": 3.9241877256317688,
+ "eval_clap": 0.10804431885480881,
+ "eval_loss": 6.133026123046875,
+ "eval_runtime": 165.3851,
+ "eval_samples": 16,
+ "eval_samples_per_second": 0.097,
+ "eval_steps_per_second": 0.097
+ }
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+ "epoch": 3.9241877256317688,
+ "total_flos": 784195045500888.0,
+ "train_loss": 6.39456293629665,
+ "train_runtime": 14405.0011,
+ "train_samples": 831,
+ "train_samples_per_second": 0.231,
+ "train_steps_per_second": 0.014
+ }
trainer_state.json ADDED
@@ -0,0 +1,1542 @@
+ {
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 3.9241877256317688,
+ "eval_steps": 25,
+ "global_step": 204,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.019253910950661854,
+ "grad_norm": 23.54662322998047,
+ "learning_rate": 0.00019901960784313727,
+ "loss": 9.4209,
+ "step": 1
+ },
+ {
+ "epoch": 0.03850782190132371,
+ "grad_norm": 22.151025772094727,
+ "learning_rate": 0.00019803921568627454,
+ "loss": 9.3584,
+ "step": 2
+ },
+ {
+ "epoch": 0.05776173285198556,
+ "grad_norm": 32.229759216308594,
+ "learning_rate": 0.00019705882352941177,
+ "loss": 9.1469,
+ "step": 3
+ },
+ {
+ "epoch": 0.07701564380264742,
+ "grad_norm": 42.96324920654297,
+ "learning_rate": 0.000196078431372549,
+ "loss": 8.5595,
+ "step": 4
+ },
+ {
+ "epoch": 0.09626955475330927,
+ "grad_norm": 32.40974044799805,
+ "learning_rate": 0.00019509803921568628,
+ "loss": 8.3043,
+ "step": 5
+ },
+ {
+ "epoch": 0.11552346570397112,
+ "grad_norm": 32.838134765625,
+ "learning_rate": 0.00019411764705882354,
+ "loss": 8.1422,
+ "step": 6
+ },
+ {
+ "epoch": 0.13477737665463296,
+ "grad_norm": 34.38292694091797,
+ "learning_rate": 0.0001931372549019608,
+ "loss": 7.7643,
+ "step": 7
+ },
+ {
+ "epoch": 0.15403128760529483,
+ "grad_norm": 31.947425842285156,
+ "learning_rate": 0.00019215686274509807,
+ "loss": 7.4565,
+ "step": 8
+ },
+ {
+ "epoch": 0.17328519855595667,
+ "grad_norm": 242.39166259765625,
+ "learning_rate": 0.0001911764705882353,
+ "loss": 7.436,
+ "step": 9
+ },
+ {
+ "epoch": 0.19253910950661854,
+ "grad_norm": 25.68425750732422,
+ "learning_rate": 0.00019019607843137254,
+ "loss": 7.1307,
+ "step": 10
+ },
+ {
+ "epoch": 0.21179302045728038,
+ "grad_norm": 24.717641830444336,
+ "learning_rate": 0.0001892156862745098,
+ "loss": 7.1206,
+ "step": 11
+ },
+ {
+ "epoch": 0.23104693140794225,
+ "grad_norm": 36.47980880737305,
+ "learning_rate": 0.00018823529411764707,
+ "loss": 6.6912,
+ "step": 12
+ },
+ {
+ "epoch": 0.2503008423586041,
+ "grad_norm": 28.181612014770508,
+ "learning_rate": 0.00018725490196078433,
+ "loss": 6.6547,
+ "step": 13
+ },
+ {
+ "epoch": 0.2695547533092659,
+ "grad_norm": 24.55516242980957,
+ "learning_rate": 0.00018627450980392157,
+ "loss": 6.9486,
+ "step": 14
+ },
+ {
+ "epoch": 0.2888086642599278,
+ "grad_norm": 32.426963806152344,
+ "learning_rate": 0.00018529411764705883,
+ "loss": 7.1069,
+ "step": 15
+ },
+ {
+ "epoch": 0.30806257521058966,
+ "grad_norm": 20.413976669311523,
+ "learning_rate": 0.00018431372549019607,
+ "loss": 6.6628,
+ "step": 16
+ },
+ {
+ "epoch": 0.32731648616125153,
+ "grad_norm": 28.58907699584961,
+ "learning_rate": 0.00018333333333333334,
+ "loss": 6.5333,
+ "step": 17
+ },
+ {
+ "epoch": 0.34657039711191334,
+ "grad_norm": 24.02996253967285,
+ "learning_rate": 0.0001823529411764706,
+ "loss": 6.5981,
+ "step": 18
+ },
+ {
+ "epoch": 0.3658243080625752,
+ "grad_norm": 23.250669479370117,
+ "learning_rate": 0.00018137254901960786,
+ "loss": 6.4779,
+ "step": 19
+ },
+ {
+ "epoch": 0.3850782190132371,
+ "grad_norm": 15.006091117858887,
+ "learning_rate": 0.0001803921568627451,
+ "loss": 6.6096,
+ "step": 20
+ },
+ {
+ "epoch": 0.4043321299638989,
+ "grad_norm": 16.560985565185547,
+ "learning_rate": 0.00017941176470588236,
+ "loss": 6.6496,
+ "step": 21
+ },
+ {
+ "epoch": 0.42358604091456076,
+ "grad_norm": 31.329875946044922,
+ "learning_rate": 0.00017843137254901963,
+ "loss": 6.9627,
+ "step": 22
+ },
+ {
+ "epoch": 0.4428399518652226,
+ "grad_norm": 12.381958961486816,
+ "learning_rate": 0.00017745098039215687,
+ "loss": 6.398,
+ "step": 23
+ },
+ {
+ "epoch": 0.4620938628158845,
+ "grad_norm": 9.271923065185547,
+ "learning_rate": 0.00017647058823529413,
+ "loss": 6.6,
+ "step": 24
+ },
+ {
+ "epoch": 0.4813477737665463,
+ "grad_norm": 12.544185638427734,
+ "learning_rate": 0.00017549019607843137,
+ "loss": 6.4684,
+ "step": 25
+ },
+ {
+ "epoch": 0.4813477737665463,
+ "eval_clap": 0.09883298724889755,
+ "eval_loss": 6.00625467300415,
+ "eval_runtime": 166.3531,
+ "eval_samples_per_second": 0.096,
+ "eval_steps_per_second": 0.096,
+ "step": 25
+ },
+ {
+ "epoch": 0.5006016847172082,
+ "grad_norm": 11.769013404846191,
+ "learning_rate": 0.00017450980392156863,
+ "loss": 6.5248,
+ "step": 26
+ },
+ {
+ "epoch": 0.51985559566787,
+ "grad_norm": 11.039627075195312,
+ "learning_rate": 0.0001735294117647059,
+ "loss": 6.6403,
+ "step": 27
+ },
+ {
+ "epoch": 0.5391095066185319,
+ "grad_norm": 17.4042911529541,
+ "learning_rate": 0.00017254901960784316,
+ "loss": 6.8092,
+ "step": 28
+ },
+ {
+ "epoch": 0.5583634175691937,
+ "grad_norm": 12.926351547241211,
+ "learning_rate": 0.0001715686274509804,
+ "loss": 6.5886,
+ "step": 29
+ },
+ {
+ "epoch": 0.5776173285198556,
+ "grad_norm": 12.865156173706055,
+ "learning_rate": 0.00017058823529411766,
+ "loss": 6.6176,
+ "step": 30
+ },
+ {
+ "epoch": 0.5968712394705175,
+ "grad_norm": 15.517515182495117,
+ "learning_rate": 0.0001696078431372549,
+ "loss": 6.4096,
+ "step": 31
+ },
+ {
+ "epoch": 0.6161251504211793,
+ "grad_norm": 12.356785774230957,
+ "learning_rate": 0.00016862745098039216,
+ "loss": 6.4528,
+ "step": 32
+ },
+ {
+ "epoch": 0.6353790613718412,
+ "grad_norm": 15.226251602172852,
+ "learning_rate": 0.00016764705882352942,
+ "loss": 6.3188,
+ "step": 33
+ },
+ {
+ "epoch": 0.6546329723225031,
+ "grad_norm": 13.221582412719727,
+ "learning_rate": 0.0001666666666666667,
+ "loss": 6.542,
+ "step": 34
+ },
+ {
+ "epoch": 0.6738868832731648,
+ "grad_norm": 13.414304733276367,
+ "learning_rate": 0.00016568627450980395,
+ "loss": 6.4272,
+ "step": 35
+ },
+ {
+ "epoch": 0.6931407942238267,
+ "grad_norm": 27.81321907043457,
+ "learning_rate": 0.0001647058823529412,
+ "loss": 6.7035,
+ "step": 36
+ },
+ {
+ "epoch": 0.7123947051744886,
+ "grad_norm": 17.882911682128906,
+ "learning_rate": 0.00016372549019607843,
+ "loss": 6.6117,
+ "step": 37
+ },
+ {
+ "epoch": 0.7316486161251504,
+ "grad_norm": 10.675613403320312,
+ "learning_rate": 0.0001627450980392157,
+ "loss": 6.4818,
+ "step": 38
+ },
+ {
+ "epoch": 0.7509025270758123,
+ "grad_norm": 11.32511043548584,
+ "learning_rate": 0.00016176470588235295,
+ "loss": 6.4717,
+ "step": 39
+ },
+ {
+ "epoch": 0.7701564380264742,
+ "grad_norm": 13.292048454284668,
+ "learning_rate": 0.00016078431372549022,
+ "loss": 6.4119,
+ "step": 40
+ },
+ {
+ "epoch": 0.789410348977136,
+ "grad_norm": 9.824177742004395,
+ "learning_rate": 0.00015980392156862746,
+ "loss": 6.6399,
+ "step": 41
+ },
+ {
+ "epoch": 0.8086642599277978,
+ "grad_norm": 18.48476791381836,
+ "learning_rate": 0.0001588235294117647,
+ "loss": 6.4116,
+ "step": 42
+ },
+ {
+ "epoch": 0.8279181708784596,
+ "grad_norm": 10.409250259399414,
+ "learning_rate": 0.00015784313725490196,
+ "loss": 6.4832,
+ "step": 43
+ },
+ {
+ "epoch": 0.8471720818291215,
+ "grad_norm": 18.297466278076172,
+ "learning_rate": 0.00015686274509803922,
+ "loss": 6.308,
+ "step": 44
+ },
+ {
+ "epoch": 0.8664259927797834,
+ "grad_norm": 12.408952713012695,
+ "learning_rate": 0.00015588235294117648,
+ "loss": 6.3373,
+ "step": 45
+ },
+ {
+ "epoch": 0.8856799037304453,
+ "grad_norm": 12.280571937561035,
+ "learning_rate": 0.00015490196078431375,
+ "loss": 6.3173,
+ "step": 46
+ },
+ {
+ "epoch": 0.9049338146811071,
+ "grad_norm": 12.348167419433594,
+ "learning_rate": 0.00015392156862745098,
+ "loss": 6.2873,
+ "step": 47
+ },
+ {
+ "epoch": 0.924187725631769,
+ "grad_norm": 28.005126953125,
+ "learning_rate": 0.00015294117647058822,
+ "loss": 6.7117,
+ "step": 48
+ },
+ {
+ "epoch": 0.9434416365824309,
+ "grad_norm": 16.248571395874023,
+ "learning_rate": 0.00015196078431372549,
+ "loss": 6.3493,
+ "step": 49
+ },
+ {
+ "epoch": 0.9626955475330926,
+ "grad_norm": 19.102869033813477,
+ "learning_rate": 0.00015098039215686275,
+ "loss": 6.4209,
+ "step": 50
+ },
+ {
+ "epoch": 0.9626955475330926,
+ "eval_clap": 0.13957397639751434,
+ "eval_loss": 6.070012092590332,
+ "eval_runtime": 165.6113,
+ "eval_samples_per_second": 0.097,
+ "eval_steps_per_second": 0.097,
+ "step": 50
+ },
+ {
+ "epoch": 0.9819494584837545,
+ "grad_norm": 6.675487995147705,
+ "learning_rate": 0.00015000000000000001,
+ "loss": 6.1695,
+ "step": 51
+ },
+ {
+ "epoch": 1.0,
+ "grad_norm": 14.88092041015625,
+ "learning_rate": 0.00014901960784313728,
+ "loss": 5.6169,
+ "step": 52
+ },
+ {
+ "epoch": 1.0192539109506618,
+ "grad_norm": 19.78269386291504,
+ "learning_rate": 0.00014803921568627451,
+ "loss": 6.5455,
+ "step": 53
+ },
+ {
+ "epoch": 1.0385078219013237,
+ "grad_norm": 7.873740196228027,
+ "learning_rate": 0.00014705882352941178,
+ "loss": 6.3154,
+ "step": 54
+ },
+ {
+ "epoch": 1.0577617328519855,
+ "grad_norm": 10.514632225036621,
+ "learning_rate": 0.00014607843137254902,
+ "loss": 6.5085,
+ "step": 55
+ },
+ {
+ "epoch": 1.0770156438026475,
+ "grad_norm": 10.021757125854492,
+ "learning_rate": 0.00014509803921568628,
+ "loss": 6.5109,
+ "step": 56
+ },
+ {
+ "epoch": 1.0962695547533092,
+ "grad_norm": 8.690667152404785,
+ "learning_rate": 0.00014411764705882354,
+ "loss": 6.5515,
+ "step": 57
+ },
+ {
+ "epoch": 1.1155234657039712,
+ "grad_norm": 12.78662109375,
+ "learning_rate": 0.00014313725490196078,
+ "loss": 6.5425,
+ "step": 58
+ },
+ {
+ "epoch": 1.134777376654633,
+ "grad_norm": 10.592965126037598,
+ "learning_rate": 0.00014215686274509804,
+ "loss": 6.5105,
+ "step": 59
+ },
+ {
+ "epoch": 1.154031287605295,
+ "grad_norm": 7.947122573852539,
+ "learning_rate": 0.0001411764705882353,
+ "loss": 6.6142,
+ "step": 60
+ },
+ {
+ "epoch": 1.1732851985559567,
+ "grad_norm": 6.823319911956787,
+ "learning_rate": 0.00014019607843137255,
+ "loss": 6.5339,
+ "step": 61
+ },
+ {
+ "epoch": 1.1925391095066185,
+ "grad_norm": 16.670989990234375,
+ "learning_rate": 0.0001392156862745098,
+ "loss": 6.3022,
+ "step": 62
+ },
+ {
+ "epoch": 1.2117930204572804,
+ "grad_norm": 20.09317398071289,
+ "learning_rate": 0.00013823529411764707,
+ "loss": 6.0779,
+ "step": 63
+ },
+ {
+ "epoch": 1.2310469314079422,
+ "grad_norm": 8.030014991760254,
+ "learning_rate": 0.0001372549019607843,
+ "loss": 6.3284,
+ "step": 64
+ },
+ {
+ "epoch": 1.2503008423586042,
+ "grad_norm": 10.324827194213867,
+ "learning_rate": 0.00013627450980392157,
+ "loss": 6.4022,
+ "step": 65
+ },
+ {
+ "epoch": 1.269554753309266,
+ "grad_norm": 29.070960998535156,
+ "learning_rate": 0.00013529411764705884,
+ "loss": 6.7835,
+ "step": 66
+ },
+ {
+ "epoch": 1.288808664259928,
+ "grad_norm": 17.838394165039062,
+ "learning_rate": 0.00013431372549019608,
+ "loss": 6.5344,
+ "step": 67
+ },
+ {
+ "epoch": 1.3080625752105897,
+ "grad_norm": 10.388354301452637,
+ "learning_rate": 0.00013333333333333334,
+ "loss": 6.3438,
+ "step": 68
+ },
+ {
+ "epoch": 1.3273164861612514,
+ "grad_norm": 9.607653617858887,
+ "learning_rate": 0.0001323529411764706,
+ "loss": 6.4325,
+ "step": 69
+ },
+ {
+ "epoch": 1.3465703971119134,
+ "grad_norm": 9.639688491821289,
+ "learning_rate": 0.00013137254901960784,
+ "loss": 6.3907,
+ "step": 70
+ },
+ {
+ "epoch": 1.3658243080625752,
+ "grad_norm": 9.424043655395508,
+ "learning_rate": 0.0001303921568627451,
+ "loss": 6.605,
+ "step": 71
+ },
+ {
+ "epoch": 1.3850782190132371,
+ "grad_norm": 8.21303653717041,
+ "learning_rate": 0.00012941176470588237,
+ "loss": 6.6275,
+ "step": 72
+ },
+ {
+ "epoch": 1.404332129963899,
+ "grad_norm": 10.479741096496582,
+ "learning_rate": 0.00012843137254901963,
+ "loss": 6.4801,
+ "step": 73
+ },
+ {
+ "epoch": 1.4235860409145609,
+ "grad_norm": 21.424253463745117,
+ "learning_rate": 0.00012745098039215687,
+ "loss": 6.3391,
+ "step": 74
+ },
+ {
+ "epoch": 1.4428399518652226,
+ "grad_norm": 6.5513224601745605,
+ "learning_rate": 0.0001264705882352941,
+ "loss": 6.7252,
+ "step": 75
+ },
+ {
+ "epoch": 1.4428399518652226,
+ "eval_clap": 0.10309316217899323,
+ "eval_loss": 6.036521911621094,
+ "eval_runtime": 165.4554,
+ "eval_samples_per_second": 0.097,
+ "eval_steps_per_second": 0.097,
+ "step": 75
+ },
+ {
+ "epoch": 1.4620938628158844,
+ "grad_norm": 32.52528762817383,
+ "learning_rate": 0.00012549019607843137,
+ "loss": 6.1922,
+ "step": 76
+ },
+ {
+ "epoch": 1.4813477737665464,
+ "grad_norm": 23.51795196533203,
+ "learning_rate": 0.00012450980392156863,
+ "loss": 6.3506,
+ "step": 77
+ },
+ {
+ "epoch": 1.5006016847172083,
+ "grad_norm": 10.925686836242676,
+ "learning_rate": 0.0001235294117647059,
+ "loss": 6.4783,
+ "step": 78
+ },
+ {
+ "epoch": 1.5198555956678699,
+ "grad_norm": 7.924820899963379,
+ "learning_rate": 0.00012254901960784316,
+ "loss": 6.6288,
+ "step": 79
+ },
+ {
+ "epoch": 1.5391095066185319,
+ "grad_norm": 6.946601390838623,
+ "learning_rate": 0.00012156862745098039,
+ "loss": 6.4085,
+ "step": 80
+ },
+ {
+ "epoch": 1.5583634175691938,
+ "grad_norm": 10.120043754577637,
+ "learning_rate": 0.00012058823529411765,
+ "loss": 6.4667,
+ "step": 81
+ },
+ {
+ "epoch": 1.5776173285198556,
+ "grad_norm": 9.635017395019531,
+ "learning_rate": 0.0001196078431372549,
+ "loss": 6.3742,
+ "step": 82
+ },
+ {
+ "epoch": 1.5968712394705173,
+ "grad_norm": 6.578627586364746,
+ "learning_rate": 0.00011862745098039216,
+ "loss": 6.1956,
+ "step": 83
+ },
+ {
+ "epoch": 1.6161251504211793,
+ "grad_norm": 18.30640983581543,
+ "learning_rate": 0.00011764705882352942,
+ "loss": 6.4804,
+ "step": 84
+ },
+ {
+ "epoch": 1.6353790613718413,
+ "grad_norm": 11.166876792907715,
+ "learning_rate": 0.00011666666666666668,
+ "loss": 6.4495,
+ "step": 85
+ },
+ {
+ "epoch": 1.654632972322503,
+ "grad_norm": 8.15738582611084,
+ "learning_rate": 0.00011568627450980394,
+ "loss": 6.1371,
+ "step": 86
+ },
+ {
+ "epoch": 1.6738868832731648,
+ "grad_norm": 9.473989486694336,
+ "learning_rate": 0.00011470588235294118,
+ "loss": 6.366,
+ "step": 87
+ },
+ {
+ "epoch": 1.6931407942238268,
+ "grad_norm": 16.634380340576172,
+ "learning_rate": 0.00011372549019607843,
+ "loss": 6.1748,
+ "step": 88
+ },
+ {
+ "epoch": 1.7123947051744886,
+ "grad_norm": 20.92518424987793,
+ "learning_rate": 0.0001127450980392157,
+ "loss": 6.0918,
+ "step": 89
+ },
+ {
+ "epoch": 1.7316486161251503,
+ "grad_norm": 10.186667442321777,
+ "learning_rate": 0.00011176470588235294,
+ "loss": 6.1072,
+ "step": 90
+ },
+ {
+ "epoch": 1.7509025270758123,
+ "grad_norm": 21.300180435180664,
+ "learning_rate": 0.00011078431372549021,
+ "loss": 6.724,
+ "step": 91
+ },
+ {
+ "epoch": 1.7701564380264743,
+ "grad_norm": 17.833845138549805,
+ "learning_rate": 0.00010980392156862746,
+ "loss": 6.2231,
+ "step": 92
+ },
+ {
+ "epoch": 1.789410348977136,
+ "grad_norm": 12.850127220153809,
+ "learning_rate": 0.0001088235294117647,
+ "loss": 6.4846,
+ "step": 93
+ },
+ {
+ "epoch": 1.8086642599277978,
+ "grad_norm": 16.229764938354492,
+ "learning_rate": 0.00010784313725490196,
+ "loss": 6.6046,
+ "step": 94
+ },
+ {
+ "epoch": 1.8279181708784598,
+ "grad_norm": 41.6049690246582,
+ "learning_rate": 0.00010686274509803922,
+ "loss": 6.5044,
+ "step": 95
+ },
+ {
+ "epoch": 1.8471720818291215,
+ "grad_norm": 8.0320463180542,
+ "learning_rate": 0.00010588235294117647,
+ "loss": 6.4836,
+ "step": 96
+ },
+ {
+ "epoch": 1.8664259927797833,
+ "grad_norm": 19.129127502441406,
+ "learning_rate": 0.00010490196078431374,
+ "loss": 6.1962,
+ "step": 97
+ },
+ {
+ "epoch": 1.8856799037304453,
+ "grad_norm": 14.464997291564941,
+ "learning_rate": 0.00010392156862745099,
+ "loss": 6.2694,
+ "step": 98
+ },
+ {
+ "epoch": 1.9049338146811072,
+ "grad_norm": 25.245752334594727,
+ "learning_rate": 0.00010294117647058823,
+ "loss": 6.0148,
+ "step": 99
+ },
+ {
+ "epoch": 1.924187725631769,
+ "grad_norm": 12.66399097442627,
+ "learning_rate": 0.00010196078431372549,
+ "loss": 6.1879,
+ "step": 100
+ },
+ {
+ "epoch": 1.924187725631769,
+ "eval_clap": 0.12328307330608368,
+ "eval_loss": 5.896579742431641,
+ "eval_runtime": 165.5834,
+ "eval_samples_per_second": 0.097,
+ "eval_steps_per_second": 0.097,
+ "step": 100
+ },
+ {
+ "epoch": 1.9434416365824307,
+ "grad_norm": 12.162952423095703,
+ "learning_rate": 0.00010098039215686274,
+ "loss": 6.1875,
+ "step": 101
+ },
+ {
+ "epoch": 1.9626955475330927,
+ "grad_norm": 16.754629135131836,
+ "learning_rate": 0.0001,
+ "loss": 6.5483,
+ "step": 102
+ },
+ {
+ "epoch": 1.9819494584837545,
+ "grad_norm": 9.804841995239258,
+ "learning_rate": 9.901960784313727e-05,
+ "loss": 6.0631,
+ "step": 103
+ },
+ {
+ "epoch": 2.0,
+ "grad_norm": 26.169551849365234,
+ "learning_rate": 9.80392156862745e-05,
+ "loss": 6.3384,
+ "step": 104
+ },
+ {
+ "epoch": 2.019253910950662,
+ "grad_norm": 22.054380416870117,
+ "learning_rate": 9.705882352941177e-05,
+ "loss": 6.5192,
+ "step": 105
+ },
+ {
+ "epoch": 2.0385078219013235,
+ "grad_norm": 13.319371223449707,
+ "learning_rate": 9.607843137254903e-05,
+ "loss": 6.1904,
+ "step": 106
+ },
+ {
+ "epoch": 2.0577617328519855,
+ "grad_norm": 13.158707618713379,
+ "learning_rate": 9.509803921568627e-05,
+ "loss": 6.4906,
+ "step": 107
+ },
+ {
+ "epoch": 2.0770156438026475,
+ "grad_norm": 7.972289562225342,
+ "learning_rate": 9.411764705882353e-05,
+ "loss": 6.4551,
+ "step": 108
+ },
+ {
+ "epoch": 2.0962695547533094,
+ "grad_norm": 14.052528381347656,
+ "learning_rate": 9.313725490196079e-05,
+ "loss": 6.2028,
+ "step": 109
+ },
+ {
+ "epoch": 2.115523465703971,
+ "grad_norm": 21.128631591796875,
+ "learning_rate": 9.215686274509804e-05,
+ "loss": 6.121,
+ "step": 110
+ },
+ {
+ "epoch": 2.134777376654633,
+ "grad_norm": 9.11488151550293,
+ "learning_rate": 9.11764705882353e-05,
+ "loss": 6.559,
+ "step": 111
+ },
+ {
+ "epoch": 2.154031287605295,
+ "grad_norm": 10.081767082214355,
+ "learning_rate": 9.019607843137255e-05,
+ "loss": 6.4236,
+ "step": 112
+ },
+ {
+ "epoch": 2.1732851985559565,
+ "grad_norm": 7.397235870361328,
+ "learning_rate": 8.921568627450981e-05,
+ "loss": 6.5415,
+ "step": 113
+ },
+ {
+ "epoch": 2.1925391095066185,
+ "grad_norm": 9.652939796447754,
+ "learning_rate": 8.823529411764706e-05,
+ "loss": 6.3744,
+ "step": 114
+ },
+ {
+ "epoch": 2.2117930204572804,
+ "grad_norm": 12.823005676269531,
+ "learning_rate": 8.725490196078432e-05,
+ "loss": 5.9683,
+ "step": 115
+ },
+ {
+ "epoch": 2.2310469314079424,
+ "grad_norm": 9.981169700622559,
+ "learning_rate": 8.627450980392158e-05,
+ "loss": 6.2714,
+ "step": 116
+ },
+ {
+ "epoch": 2.250300842358604,
+ "grad_norm": 11.026590347290039,
+ "learning_rate": 8.529411764705883e-05,
+ "loss": 6.1287,
+ "step": 117
+ },
+ {
+ "epoch": 2.269554753309266,
+ "grad_norm": 14.469505310058594,
+ "learning_rate": 8.431372549019608e-05,
+ "loss": 6.2634,
+ "step": 118
+ },
+ {
+ "epoch": 2.288808664259928,
+ "grad_norm": 10.639300346374512,
+ "learning_rate": 8.333333333333334e-05,
+ "loss": 6.1014,
+ "step": 119
+ },
+ {
+ "epoch": 2.30806257521059,
+ "grad_norm": 10.407938003540039,
+ "learning_rate": 8.23529411764706e-05,
+ "loss": 6.2487,
+ "step": 120
+ },
+ {
+ "epoch": 2.3273164861612514,
+ "grad_norm": 18.310867309570312,
+ "learning_rate": 8.137254901960785e-05,
+ "loss": 6.025,
+ "step": 121
+ },
+ {
+ "epoch": 2.3465703971119134,
+ "grad_norm": 13.314108848571777,
+ "learning_rate": 8.039215686274511e-05,
+ "loss": 6.1319,
+ "step": 122
+ },
+ {
+ "epoch": 2.3658243080625754,
+ "grad_norm": 12.528412818908691,
+ "learning_rate": 7.941176470588235e-05,
+ "loss": 6.27,
+ "step": 123
+ },
+ {
+ "epoch": 2.385078219013237,
+ "grad_norm": 10.71603775024414,
+ "learning_rate": 7.843137254901961e-05,
+ "loss": 6.4118,
+ "step": 124
+ },
+ {
+ "epoch": 2.404332129963899,
+ "grad_norm": 8.234016418457031,
+ "learning_rate": 7.745098039215687e-05,
+ "loss": 6.3642,
+ "step": 125
+ },
+ {
+ "epoch": 2.404332129963899,
+ "eval_clap": 0.10650094598531723,
+ "eval_loss": 6.806448936462402,
+ "eval_runtime": 165.8182,
+ "eval_samples_per_second": 0.096,
+ "eval_steps_per_second": 0.096,
+ "step": 125
+ },
+ {
+ "epoch": 2.423586040914561,
+ "grad_norm": 13.84628963470459,
+ "learning_rate": 7.647058823529411e-05,
+ "loss": 6.0872,
+ "step": 126
+ },
+ {
+ "epoch": 2.4428399518652224,
+ "grad_norm": 7.576101779937744,
+ "learning_rate": 7.549019607843137e-05,
+ "loss": 6.3515,
+ "step": 127
+ },
+ {
+ "epoch": 2.4620938628158844,
+ "grad_norm": 9.205301284790039,
+ "learning_rate": 7.450980392156864e-05,
+ "loss": 6.0883,
+ "step": 128
+ },
+ {
+ "epoch": 2.4813477737665464,
+ "grad_norm": 8.85059928894043,
+ "learning_rate": 7.352941176470589e-05,
+ "loss": 5.824,
+ "step": 129
+ },
+ {
+ "epoch": 2.5006016847172083,
+ "grad_norm": 6.963297367095947,
+ "learning_rate": 7.254901960784314e-05,
+ "loss": 6.4633,
+ "step": 130
+ },
+ {
+ "epoch": 2.51985559566787,
+ "grad_norm": 6.612102508544922,
+ "learning_rate": 7.156862745098039e-05,
+ "loss": 6.3979,
+ "step": 131
+ },
+ {
+ "epoch": 2.539109506618532,
+ "grad_norm": 11.322911262512207,
+ "learning_rate": 7.058823529411765e-05,
+ "loss": 6.2103,
+ "step": 132
+ },
+ {
+ "epoch": 2.558363417569194,
+ "grad_norm": 21.0396671295166,
+ "learning_rate": 6.96078431372549e-05,
+ "loss": 5.6772,
+ "step": 133
+ },
+ {
+ "epoch": 2.577617328519856,
+ "grad_norm": 13.040122985839844,
+ "learning_rate": 6.862745098039216e-05,
+ "loss": 6.0072,
+ "step": 134
+ },
+ {
+ "epoch": 2.5968712394705173,
+ "grad_norm": 13.392056465148926,
+ "learning_rate": 6.764705882352942e-05,
+ "loss": 6.0408,
+ "step": 135
+ },
+ {
+ "epoch": 2.6161251504211793,
+ "grad_norm": 9.345407485961914,
+ "learning_rate": 6.666666666666667e-05,
+ "loss": 6.345,
+ "step": 136
+ },
+ {
+ "epoch": 2.6353790613718413,
+ "grad_norm": 9.068965911865234,
+ "learning_rate": 6.568627450980392e-05,
+ "loss": 6.0518,
+ "step": 137
+ },
+ {
+ "epoch": 2.654632972322503,
+ "grad_norm": 9.924796104431152,
+ "learning_rate": 6.470588235294118e-05,
+ "loss": 6.404,
+ "step": 138
+ },
+ {
+ "epoch": 2.673886883273165,
+ "grad_norm": 11.512860298156738,
+ "learning_rate": 6.372549019607843e-05,
+ "loss": 5.849,
+ "step": 139
+ },
+ {
+ "epoch": 2.693140794223827,
+ "grad_norm": 9.558600425720215,
+ "learning_rate": 6.274509803921569e-05,
+ "loss": 6.0751,
+ "step": 140
+ },
+ {
+ "epoch": 2.7123947051744883,
+ "grad_norm": 14.465291976928711,
+ "learning_rate": 6.176470588235295e-05,
+ "loss": 5.5432,
+ "step": 141
+ },
+ {
+ "epoch": 2.7316486161251503,
+ "grad_norm": 14.843960762023926,
+ "learning_rate": 6.078431372549019e-05,
+ "loss": 5.8858,
+ "step": 142
+ },
+ {
+ "epoch": 2.7509025270758123,
+ "grad_norm": 8.04920768737793,
+ "learning_rate": 5.980392156862745e-05,
+ "loss": 5.8131,
+ "step": 143
+ },
+ {
+ "epoch": 2.7701564380264743,
+ "grad_norm": 9.71105670928955,
+ "learning_rate": 5.882352941176471e-05,
+ "loss": 5.9374,
+ "step": 144
+ },
+ {
+ "epoch": 2.7894103489771362,
+ "grad_norm": 5.949017524719238,
+ "learning_rate": 5.784313725490197e-05,
+ "loss": 6.4545,
+ "step": 145
+ },
+ {
+ "epoch": 2.808664259927798,
+ "grad_norm": 7.233414649963379,
+ "learning_rate": 5.6862745098039215e-05,
+ "loss": 6.1215,
+ "step": 146
+ },
+ {
+ "epoch": 2.8279181708784598,
+ "grad_norm": 9.445034980773926,
+ "learning_rate": 5.588235294117647e-05,
+ "loss": 5.7711,
+ "step": 147
+ },
+ {
+ "epoch": 2.8471720818291217,
+ "grad_norm": 6.351881980895996,
+ "learning_rate": 5.490196078431373e-05,
+ "loss": 6.3073,
+ "step": 148
+ },
+ {
+ "epoch": 2.8664259927797833,
+ "grad_norm": 5.955877304077148,
+ "learning_rate": 5.392156862745098e-05,
+ "loss": 6.2675,
+ "step": 149
+ },
+ {
+ "epoch": 2.8856799037304453,
+ "grad_norm": 7.2687764167785645,
+ "learning_rate": 5.294117647058824e-05,
+ "loss": 6.2382,
+ "step": 150
+ },
+ {
+ "epoch": 2.8856799037304453,
+ "eval_clap": 0.07656023651361465,
+ "eval_loss": 6.118464469909668,
+ "eval_runtime": 165.7635,
+ "eval_samples_per_second": 0.097,
+ "eval_steps_per_second": 0.097,
+ "step": 150
+ },
+ {
+ "epoch": 2.9049338146811072,
+ "grad_norm": 7.581653594970703,
1118
+ "learning_rate": 5.1960784313725495e-05,
1119
+ "loss": 6.1951,
1120
+ "step": 151
1121
+ },
1122
+ {
1123
+ "epoch": 2.9241877256317688,
1124
+ "grad_norm": 5.309889793395996,
1125
+ "learning_rate": 5.0980392156862745e-05,
1126
+ "loss": 6.1416,
1127
+ "step": 152
1128
+ },
1129
+ {
1130
+ "epoch": 2.9434416365824307,
1131
+ "grad_norm": 10.804561614990234,
1132
+ "learning_rate": 5e-05,
1133
+ "loss": 6.4203,
1134
+ "step": 153
1135
+ },
1136
+ {
1137
+ "epoch": 2.9626955475330927,
1138
+ "grad_norm": 7.452890872955322,
1139
+ "learning_rate": 4.901960784313725e-05,
1140
+ "loss": 6.3695,
1141
+ "step": 154
1142
+ },
1143
+ {
1144
+ "epoch": 2.9819494584837543,
1145
+ "grad_norm": 7.373142719268799,
1146
+ "learning_rate": 4.803921568627452e-05,
1147
+ "loss": 6.0469,
1148
+ "step": 155
1149
+ },
1150
+ {
1151
+ "epoch": 3.0,
1152
+ "grad_norm": 6.503188610076904,
1153
+ "learning_rate": 4.705882352941177e-05,
1154
+ "loss": 5.5774,
1155
+ "step": 156
1156
+ },
1157
+ {
1158
+ "epoch": 3.019253910950662,
1159
+ "grad_norm": 6.571235656738281,
1160
+ "learning_rate": 4.607843137254902e-05,
1161
+ "loss": 6.3784,
1162
+ "step": 157
1163
+ },
1164
+ {
1165
+ "epoch": 3.0385078219013235,
1166
+ "grad_norm": 6.059790134429932,
1167
+ "learning_rate": 4.5098039215686275e-05,
1168
+ "loss": 6.2638,
1169
+ "step": 158
1170
+ },
1171
+ {
1172
+ "epoch": 3.0577617328519855,
1173
+ "grad_norm": 7.978560447692871,
1174
+ "learning_rate": 4.411764705882353e-05,
1175
+ "loss": 6.2388,
1176
+ "step": 159
1177
+ },
1178
+ {
1179
+ "epoch": 3.0770156438026475,
1180
+ "grad_norm": 4.5174479484558105,
1181
+ "learning_rate": 4.313725490196079e-05,
1182
+ "loss": 6.1811,
1183
+ "step": 160
1184
+ },
1185
+ {
1186
+ "epoch": 3.0962695547533094,
1187
+ "grad_norm": 16.497093200683594,
1188
+ "learning_rate": 4.215686274509804e-05,
1189
+ "loss": 5.8567,
1190
+ "step": 161
1191
+ },
1192
+ {
1193
+ "epoch": 3.115523465703971,
1194
+ "grad_norm": 10.036762237548828,
1195
+ "learning_rate": 4.11764705882353e-05,
1196
+ "loss": 5.7851,
1197
+ "step": 162
1198
+ },
1199
+ {
1200
+ "epoch": 3.134777376654633,
1201
+ "grad_norm": 8.312905311584473,
1202
+ "learning_rate": 4.0196078431372555e-05,
1203
+ "loss": 6.3701,
1204
+ "step": 163
1205
+ },
1206
+ {
1207
+ "epoch": 3.154031287605295,
1208
+ "grad_norm": 6.305182456970215,
1209
+ "learning_rate": 3.9215686274509805e-05,
1210
+ "loss": 6.2461,
1211
+ "step": 164
1212
+ },
1213
+ {
1214
+ "epoch": 3.1732851985559565,
1215
+ "grad_norm": 6.297240257263184,
1216
+ "learning_rate": 3.8235294117647055e-05,
1217
+ "loss": 6.1583,
1218
+ "step": 165
1219
+ },
1220
+ {
1221
+ "epoch": 3.1925391095066185,
1222
+ "grad_norm": 6.377700328826904,
1223
+ "learning_rate": 3.725490196078432e-05,
1224
+ "loss": 5.8368,
1225
+ "step": 166
1226
+ },
1227
+ {
1228
+ "epoch": 3.2117930204572804,
1229
+ "grad_norm": 6.20255708694458,
1230
+ "learning_rate": 3.627450980392157e-05,
1231
+ "loss": 6.1394,
1232
+ "step": 167
1233
+ },
1234
+ {
1235
+ "epoch": 3.2310469314079424,
1236
+ "grad_norm": 10.172269821166992,
1237
+ "learning_rate": 3.529411764705883e-05,
1238
+ "loss": 5.99,
1239
+ "step": 168
1240
+ },
1241
+ {
1242
+ "epoch": 3.250300842358604,
1243
+ "grad_norm": 12.56449031829834,
1244
+ "learning_rate": 3.431372549019608e-05,
1245
+ "loss": 6.2823,
1246
+ "step": 169
1247
+ },
1248
+ {
1249
+ "epoch": 3.269554753309266,
1250
+ "grad_norm": 6.517347812652588,
1251
+ "learning_rate": 3.3333333333333335e-05,
1252
+ "loss": 6.4417,
1253
+ "step": 170
1254
+ },
1255
+ {
1256
+ "epoch": 3.288808664259928,
1257
+ "grad_norm": 7.165337085723877,
1258
+ "learning_rate": 3.235294117647059e-05,
1259
+ "loss": 6.1048,
1260
+ "step": 171
1261
+ },
1262
+ {
1263
+ "epoch": 3.30806257521059,
1264
+ "grad_norm": 14.79480266571045,
1265
+ "learning_rate": 3.137254901960784e-05,
1266
+ "loss": 5.9012,
1267
+ "step": 172
1268
+ },
1269
+ {
1270
+ "epoch": 3.3273164861612514,
1271
+ "grad_norm": 10.55307388305664,
1272
+ "learning_rate": 3.0392156862745097e-05,
1273
+ "loss": 6.0419,
1274
+ "step": 173
1275
+ },
1276
+ {
1277
+ "epoch": 3.3465703971119134,
1278
+ "grad_norm": 7.354953289031982,
1279
+ "learning_rate": 2.9411764705882354e-05,
1280
+ "loss": 5.9871,
1281
+ "step": 174
1282
+ },
1283
+ {
1284
+ "epoch": 3.3658243080625754,
1285
+ "grad_norm": 7.013256549835205,
1286
+ "learning_rate": 2.8431372549019608e-05,
1287
+ "loss": 6.3169,
1288
+ "step": 175
1289
+ },
1290
+ {
1291
+ "epoch": 3.3658243080625754,
1292
+ "eval_clap": 0.09689466655254364,
1293
+ "eval_loss": 6.116217613220215,
1294
+ "eval_runtime": 165.7689,
1295
+ "eval_samples_per_second": 0.097,
1296
+ "eval_steps_per_second": 0.097,
1297
+ "step": 175
1298
+ },
1299
+ {
1300
+ "epoch": 3.385078219013237,
1301
+ "grad_norm": 8.007953643798828,
1302
+ "learning_rate": 2.7450980392156865e-05,
1303
+ "loss": 6.0573,
1304
+ "step": 176
1305
+ },
1306
+ {
1307
+ "epoch": 3.404332129963899,
1308
+ "grad_norm": 7.166982173919678,
1309
+ "learning_rate": 2.647058823529412e-05,
1310
+ "loss": 6.3097,
1311
+ "step": 177
1312
+ },
1313
+ {
1314
+ "epoch": 3.423586040914561,
1315
+ "grad_norm": 5.868830680847168,
1316
+ "learning_rate": 2.5490196078431373e-05,
1317
+ "loss": 6.1856,
1318
+ "step": 178
1319
+ },
1320
+ {
1321
+ "epoch": 3.4428399518652224,
1322
+ "grad_norm": 7.172518253326416,
1323
+ "learning_rate": 2.4509803921568626e-05,
1324
+ "loss": 6.284,
1325
+ "step": 179
1326
+ },
1327
+ {
1328
+ "epoch": 3.4620938628158844,
1329
+ "grad_norm": 5.972955226898193,
1330
+ "learning_rate": 2.3529411764705884e-05,
1331
+ "loss": 6.1067,
1332
+ "step": 180
1333
+ },
1334
+ {
1335
+ "epoch": 3.4813477737665464,
1336
+ "grad_norm": 5.716938495635986,
1337
+ "learning_rate": 2.2549019607843138e-05,
1338
+ "loss": 6.2792,
1339
+ "step": 181
1340
+ },
1341
+ {
1342
+ "epoch": 3.5006016847172083,
1343
+ "grad_norm": 5.647866249084473,
1344
+ "learning_rate": 2.1568627450980395e-05,
1345
+ "loss": 6.336,
1346
+ "step": 182
1347
+ },
1348
+ {
1349
+ "epoch": 3.51985559566787,
1350
+ "grad_norm": 7.596288204193115,
1351
+ "learning_rate": 2.058823529411765e-05,
1352
+ "loss": 6.1188,
1353
+ "step": 183
1354
+ },
1355
+ {
1356
+ "epoch": 3.539109506618532,
1357
+ "grad_norm": 9.767680168151855,
1358
+ "learning_rate": 1.9607843137254903e-05,
1359
+ "loss": 6.3607,
1360
+ "step": 184
1361
+ },
1362
+ {
1363
+ "epoch": 3.558363417569194,
1364
+ "grad_norm": 5.301209926605225,
1365
+ "learning_rate": 1.862745098039216e-05,
1366
+ "loss": 6.0671,
1367
+ "step": 185
1368
+ },
1369
+ {
1370
+ "epoch": 3.577617328519856,
1371
+ "grad_norm": 6.347781658172607,
1372
+ "learning_rate": 1.7647058823529414e-05,
1373
+ "loss": 6.1538,
1374
+ "step": 186
1375
+ },
1376
+ {
1377
+ "epoch": 3.5968712394705173,
1378
+ "grad_norm": 6.653684139251709,
1379
+ "learning_rate": 1.6666666666666667e-05,
1380
+ "loss": 6.1422,
1381
+ "step": 187
1382
+ },
1383
+ {
1384
+ "epoch": 3.6161251504211793,
1385
+ "grad_norm": 9.340754508972168,
1386
+ "learning_rate": 1.568627450980392e-05,
1387
+ "loss": 5.6681,
1388
+ "step": 188
1389
+ },
1390
+ {
1391
+ "epoch": 3.6353790613718413,
1392
+ "grad_norm": 6.159310340881348,
1393
+ "learning_rate": 1.4705882352941177e-05,
1394
+ "loss": 5.8408,
1395
+ "step": 189
1396
+ },
1397
+ {
1398
+ "epoch": 3.654632972322503,
1399
+ "grad_norm": 7.5495195388793945,
1400
+ "learning_rate": 1.3725490196078432e-05,
1401
+ "loss": 6.1853,
1402
+ "step": 190
1403
+ },
1404
+ {
1405
+ "epoch": 3.673886883273165,
1406
+ "grad_norm": 6.215287208557129,
1407
+ "learning_rate": 1.2745098039215686e-05,
1408
+ "loss": 6.082,
1409
+ "step": 191
1410
+ },
1411
+ {
1412
+ "epoch": 3.693140794223827,
1413
+ "grad_norm": 5.863905906677246,
1414
+ "learning_rate": 1.1764705882352942e-05,
1415
+ "loss": 6.0772,
1416
+ "step": 192
1417
+ },
1418
+ {
1419
+ "epoch": 3.7123947051744883,
1420
+ "grad_norm": 5.785052299499512,
1421
+ "learning_rate": 1.0784313725490197e-05,
1422
+ "loss": 6.2809,
1423
+ "step": 193
1424
+ },
1425
+ {
1426
+ "epoch": 3.7316486161251503,
1427
+ "grad_norm": 8.62579345703125,
1428
+ "learning_rate": 9.803921568627451e-06,
1429
+ "loss": 5.9173,
1430
+ "step": 194
1431
+ },
1432
+ {
1433
+ "epoch": 3.7509025270758123,
1434
+ "grad_norm": 8.095368385314941,
1435
+ "learning_rate": 8.823529411764707e-06,
1436
+ "loss": 6.2614,
1437
+ "step": 195
1438
+ },
1439
+ {
1440
+ "epoch": 3.7701564380264743,
1441
+ "grad_norm": 6.416041851043701,
1442
+ "learning_rate": 7.84313725490196e-06,
1443
+ "loss": 5.7276,
1444
+ "step": 196
1445
+ },
1446
+ {
1447
+ "epoch": 3.7894103489771362,
1448
+ "grad_norm": 6.0362868309021,
1449
+ "learning_rate": 6.862745098039216e-06,
1450
+ "loss": 6.1875,
1451
+ "step": 197
1452
+ },
1453
+ {
1454
+ "epoch": 3.808664259927798,
1455
+ "grad_norm": 6.641626834869385,
1456
+ "learning_rate": 5.882352941176471e-06,
1457
+ "loss": 6.0641,
1458
+ "step": 198
1459
+ },
1460
+ {
1461
+ "epoch": 3.8279181708784598,
1462
+ "grad_norm": 6.249925136566162,
1463
+ "learning_rate": 4.901960784313726e-06,
1464
+ "loss": 6.4255,
1465
+ "step": 199
1466
+ },
1467
+ {
1468
+ "epoch": 3.8471720818291217,
1469
+ "grad_norm": 7.856912136077881,
1470
+ "learning_rate": 3.92156862745098e-06,
1471
+ "loss": 5.7667,
1472
+ "step": 200
1473
+ },
1474
+ {
1475
+ "epoch": 3.8471720818291217,
1476
+ "eval_clap": 0.11432015895843506,
1477
+ "eval_loss": 6.130455017089844,
1478
+ "eval_runtime": 165.7823,
1479
+ "eval_samples_per_second": 0.097,
1480
+ "eval_steps_per_second": 0.097,
1481
+ "step": 200
1482
+ },
1483
+ {
1484
+ "epoch": 3.8664259927797833,
1485
+ "grad_norm": 8.209946632385254,
1486
+ "learning_rate": 2.9411764705882355e-06,
1487
+ "loss": 6.1598,
1488
+ "step": 201
1489
+ },
1490
+ {
1491
+ "epoch": 3.8856799037304453,
1492
+ "grad_norm": 7.541530609130859,
1493
+ "learning_rate": 1.96078431372549e-06,
1494
+ "loss": 5.7201,
1495
+ "step": 202
1496
+ },
1497
+ {
1498
+ "epoch": 3.9049338146811072,
1499
+ "grad_norm": 36.531105041503906,
1500
+ "learning_rate": 9.80392156862745e-07,
1501
+ "loss": 6.0873,
1502
+ "step": 203
1503
+ },
1504
+ {
1505
+ "epoch": 3.9241877256317688,
1506
+ "grad_norm": 6.220560073852539,
1507
+ "learning_rate": 0.0,
1508
+ "loss": 6.0892,
1509
+ "step": 204
1510
+ },
1511
+ {
1512
+ "epoch": 3.9241877256317688,
1513
+ "step": 204,
1514
+ "total_flos": 784195045500888.0,
1515
+ "train_loss": 6.39456293629665,
1516
+ "train_runtime": 14405.0011,
1517
+ "train_samples_per_second": 0.231,
1518
+ "train_steps_per_second": 0.014
1519
+ }
1520
+ ],
1521
+ "logging_steps": 1.0,
1522
+ "max_steps": 204,
1523
+ "num_input_tokens_seen": 0,
1524
+ "num_train_epochs": 4,
1525
+ "save_steps": 500,
1526
+ "stateful_callbacks": {
1527
+ "TrainerControl": {
1528
+ "args": {
1529
+ "should_epoch_stop": false,
1530
+ "should_evaluate": false,
1531
+ "should_log": false,
1532
+ "should_save": true,
1533
+ "should_training_stop": true
1534
+ },
1535
+ "attributes": {}
1536
+ }
1537
+ },
1538
+ "total_flos": 784195045500888.0,
1539
+ "train_batch_size": 1,
1540
+ "trial_name": null,
1541
+ "trial_params": null
1542
+ }