Commit f74a994 (verified) by chansung · 1 parent: 0f4576e

Model save
README.md CHANGED
@@ -1,11 +1,10 @@
  ---
  base_model: meta-llama/Meta-Llama-3-8B
  datasets:
- - llama-duo/synth_classification_dataset_dedup
+ - generator
  library_name: peft
  license: llama3
  tags:
- - alignment-handbook
  - trl
  - sft
  - generated_from_trainer
@@ -19,9 +18,9 @@ should probably proofread and complete it, then remove this comment. -->
 
  # llama3-8b-classification-gpt4o-100k2
 
- This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on the llama-duo/synth_classification_dataset_dedup dataset.
+ This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on the generator dataset.
  It achieves the following results on the evaluation set:
- - Loss: 1.9855
+ - Loss: 1.7732
 
  ## Model description
 
@@ -42,27 +41,23 @@ More information needed
  The following hyperparameters were used during training:
  - learning_rate: 0.001
  - train_batch_size: 4
- - eval_batch_size: 2
+ - eval_batch_size: 4
  - seed: 42
  - distributed_type: multi-GPU
  - num_devices: 4
  - gradient_accumulation_steps: 2
  - total_train_batch_size: 32
- - total_eval_batch_size: 8
+ - total_eval_batch_size: 16
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
+ - lr_scheduler_type: linear
  - lr_scheduler_warmup_ratio: 0.1
- - num_epochs: 5
+ - num_epochs: 1
 
  ### Training results
 
  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:------:|:----:|:---------------:|
- | 1.3936 | 0.9978 | 225 | 1.7919 |
- | 1.31 | 2.0 | 451 | 1.7869 |
- | 1.2416 | 2.9978 | 676 | 1.8242 |
- | 1.1447 | 4.0 | 902 | 1.9029 |
- | 1.098 | 4.9889 | 1125 | 1.9855 |
+ | 1.3767 | 0.9978 | 225 | 1.7732 |
 
 
  ### Framework versions
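The batch-size fields in the card above are related by a simple product: the reported total_train_batch_size is the per-device batch size times the gradient accumulation steps times the number of GPUs. A quick sanity check in plain Python (variable names are ours, not from the training script):

```python
# Values from the updated model card above.
train_batch_size = 4             # per-device train batch size
gradient_accumulation_steps = 2
num_devices = 4
eval_batch_size = 4              # per-device eval batch size (new run)

# Effective train batch: 4 x 2 x 4 = 32, matching total_train_batch_size.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # 32

# Eval side has no gradient accumulation: 4 x 4 = 16, matching total_eval_batch_size.
total_eval_batch_size = eval_batch_size * num_devices
print(total_eval_batch_size)  # 16
```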
all_results.json CHANGED
@@ -1,14 +1,9 @@
  {
- "epoch": 4.988913525498892,
- "eval_loss": 1.9855170249938965,
- "eval_runtime": 0.3687,
- "eval_samples": 16,
- "eval_samples_per_second": 2.712,
- "eval_steps_per_second": 2.712,
- "total_flos": 1.6629843858229821e+18,
- "train_loss": 1.2764637470245361,
- "train_runtime": 3608.211,
+ "epoch": 0.9977827050997783,
+ "total_flos": 3.3259687694984806e+17,
+ "train_loss": 1.4963340536753336,
+ "train_runtime": 725.2803,
  "train_samples": 92634,
- "train_samples_per_second": 9.984,
- "train_steps_per_second": 0.312
+ "train_samples_per_second": 9.934,
+ "train_steps_per_second": 0.31
  }
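The new throughput numbers are internally consistent: with 225 optimizer steps (the final step in the new training-results table) over the reported runtime, steps per second times the effective batch size of 32 lands within about 0.1% of the reported samples per second. A small check under that assumption:

```python
# Figures from the updated all_results.json plus the model card's batch size.
train_runtime_s = 725.2803
steps = 225                    # final global step of the 1-epoch run
total_train_batch_size = 32

steps_per_second = steps / train_runtime_s
samples_per_second = steps_per_second * total_train_batch_size

# Reported values are 0.31 and 9.934; the derived ones agree to well under 1%.
assert abs(steps_per_second - 0.31) / 0.31 < 0.01
assert abs(samples_per_second - 9.934) / 9.934 < 0.01
```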
runs/Aug08_10-30-50_main-soft-horse-1-0-0/events.out.tfevents.1723127707.main-soft-horse-1-0-0.543.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1806f203b23653e41128f0fac913b442a4cd2eeab34f57bb6280ab04b011f232
- size 16295
+ oid sha256:e57dc67a583eec32755f46848d62b983159a805687c6c6c565915662456a2c0e
+ size 16920
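The event-log file above is stored via Git LFS, so the diff shows only the three-line pointer file (spec version, content hash, byte size), not the binary itself. A minimal, illustrative parser for that pointer format (not the official git-lfs implementation):

```python
# The new-side pointer from the diff above.
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:e57dc67a583eec32755f46848d62b983159a805687c6c6c565915662456a2c0e
size 16920
"""

def parse_lfs_pointer(text: str) -> dict:
    # Each non-empty line is "<key> <value>"; split on the first space only.
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }

p = parse_lfs_pointer(pointer_text)
print(p["algo"], p["size"])  # sha256 16920
```

Changing the tracked file rewrites only the `oid` and `size` lines, which is exactly what this hunk shows.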
train_results.json CHANGED
@@ -1,9 +1,9 @@
  {
- "epoch": 4.988913525498892,
- "total_flos": 1.6629843858229821e+18,
- "train_loss": 1.2764637470245361,
- "train_runtime": 3608.211,
+ "epoch": 0.9977827050997783,
+ "total_flos": 3.3259687694984806e+17,
+ "train_loss": 1.4963340536753336,
+ "train_runtime": 725.2803,
  "train_samples": 92634,
- "train_samples_per_second": 9.984,
- "train_steps_per_second": 0.312
+ "train_samples_per_second": 9.934,
+ "train_steps_per_second": 0.31
  }
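The fractional epoch 0.9977827… is also explainable from the step accounting: in the removed trainer_state.json log below, epoch 2.0 fell on global step 451, i.e. each optimizer step advances the epoch counter by 2/451, so stopping at step 225 reads as 450/451 of an epoch. A quick arithmetic check (assuming that same steps-per-epoch ratio):

```python
# Epoch 2.0 corresponded to global step 451 in the original run's log,
# so one optimizer step advances the epoch counter by 2/451.
steps = 225
epoch = steps * 2 / 451
print(epoch)  # 0.9977827...
assert abs(epoch - 0.9977827050997783) < 1e-9
```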
trainer_state.json CHANGED
@@ -1,1649 +1,357 @@
  {
  "best_metric": null,
  "best_model_checkpoint": null,
- "epoch": 4.988913525498892,
  "eval_steps": 500,
- "global_step": 1125,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
  {
  "epoch": 0.004434589800443459,
- "grad_norm": 1.911939263343811,
- "learning_rate": 8.849557522123894e-06,
  "loss": 2.8127,
  "step": 1
  },
  {
  "epoch": 0.022172949002217297,
- "grad_norm": 2.071887731552124,
- "learning_rate": 4.424778761061947e-05,
- "loss": 2.8128,
  "step": 5
  },
  {
  "epoch": 0.04434589800443459,
- "grad_norm": 1.1280204057693481,
- "learning_rate": 8.849557522123894e-05,
- "loss": 2.5671,
  "step": 10
  },
  {
  "epoch": 0.06651884700665188,
- "grad_norm": 1.151498794555664,
- "learning_rate": 0.00013274336283185842,
- "loss": 2.3153,
  "step": 15
  },
  {
  "epoch": 0.08869179600886919,
- "grad_norm": 0.7351519465446472,
- "learning_rate": 0.00017699115044247788,
- "loss": 2.087,
  "step": 20
  },
  {
  "epoch": 0.11086474501108648,
- "grad_norm": 0.8526760339736938,
- "learning_rate": 0.00022123893805309737,
- "loss": 1.9278,
  "step": 25
  },
  {
  "epoch": 0.13303769401330376,
- "grad_norm": 0.4035344421863556,
- "learning_rate": 0.00026548672566371683,
- "loss": 1.8159,
  "step": 30
  },
  {
  "epoch": 0.15521064301552107,
- "grad_norm": 0.2873729169368744,
- "learning_rate": 0.00030973451327433627,
- "loss": 1.7511,
  "step": 35
  },
  {
  "epoch": 0.17738359201773837,
- "grad_norm": 0.2770038843154907,
- "learning_rate": 0.00035398230088495576,
- "loss": 1.7015,
  "step": 40
  },
  {
  "epoch": 0.19955654101995565,
- "grad_norm": 0.3265347480773926,
- "learning_rate": 0.00039823008849557525,
- "loss": 1.6655,
  "step": 45
  },
  {
  "epoch": 0.22172949002217296,
- "grad_norm": 0.5982062220573425,
- "learning_rate": 0.00044247787610619474,
- "loss": 1.6314,
  "step": 50
  },
  {
  "epoch": 0.24390243902439024,
- "grad_norm": 0.4608590602874756,
- "learning_rate": 0.0004867256637168142,
- "loss": 1.571,
  "step": 55
  },
  {
  "epoch": 0.2660753880266075,
- "grad_norm": 0.4388026297092438,
- "learning_rate": 0.0005309734513274337,
- "loss": 1.5656,
  "step": 60
  },
  {
  "epoch": 0.28824833702882485,
- "grad_norm": 0.3848969638347626,
- "learning_rate": 0.0005752212389380532,
- "loss": 1.5407,
  "step": 65
  },
  {
  "epoch": 0.31042128603104213,
- "grad_norm": 0.2577425241470337,
- "learning_rate": 0.0006194690265486725,
- "loss": 1.5097,
  "step": 70
  },
  {
  "epoch": 0.3325942350332594,
- "grad_norm": 0.38717567920684814,
- "learning_rate": 0.0006637168141592921,
- "loss": 1.5087,
  "step": 75
  },
  {
  "epoch": 0.35476718403547675,
- "grad_norm": 0.2715625464916229,
- "learning_rate": 0.0007079646017699115,
- "loss": 1.5023,
  "step": 80
  },
  {
  "epoch": 0.376940133037694,
- "grad_norm": 0.29234352707862854,
- "learning_rate": 0.0007522123893805309,
- "loss": 1.4785,
  "step": 85
  },
  {
  "epoch": 0.3991130820399113,
- "grad_norm": 0.28365185856819153,
- "learning_rate": 0.0007964601769911505,
- "loss": 1.4721,
  "step": 90
  },
  {
  "epoch": 0.4212860310421286,
- "grad_norm": 0.32057851552963257,
- "learning_rate": 0.0008407079646017699,
- "loss": 1.4595,
  "step": 95
  },
  {
  "epoch": 0.4434589800443459,
- "grad_norm": 0.30948102474212646,
- "learning_rate": 0.0008849557522123895,
- "loss": 1.4668,
  "step": 100
  },
  {
  "epoch": 0.4656319290465632,
- "grad_norm": 0.31636759638786316,
- "learning_rate": 0.0009292035398230089,
- "loss": 1.4544,
  "step": 105
  },
  {
  "epoch": 0.4878048780487805,
- "grad_norm": 0.24775166809558868,
- "learning_rate": 0.0009734513274336283,
- "loss": 1.4405,
  "step": 110
  },
  {
  "epoch": 0.5099778270509978,
- "grad_norm": 0.24200735986232758,
- "learning_rate": 0.0009999903631006022,
- "loss": 1.4508,
  "step": 115
  },
  {
  "epoch": 0.532150776053215,
- "grad_norm": 0.306389719247818,
- "learning_rate": 0.0009998819522485391,
- "loss": 1.4232,
  "step": 120
  },
  {
  "epoch": 0.5543237250554324,
- "grad_norm": 0.2549804449081421,
- "learning_rate": 0.0009996531106254026,
- "loss": 1.4249,
  "step": 125
  },
  {
  "epoch": 0.5764966740576497,
- "grad_norm": 0.24480336904525757,
- "learning_rate": 0.0009993038933633555,
- "loss": 1.4142,
  "step": 130
  },
  {
  "epoch": 0.5986696230598669,
- "grad_norm": 0.2402483969926834,
- "learning_rate": 0.0009988343845952696,
- "loss": 1.414,
  "step": 135
  },
  {
  "epoch": 0.6208425720620843,
- "grad_norm": 0.22881096601486206,
- "learning_rate": 0.000998244697434456,
- "loss": 1.4312,
  "step": 140
  },
  {
  "epoch": 0.6430155210643016,
- "grad_norm": 0.2914198637008667,
- "learning_rate": 0.0009975349739474153,
- "loss": 1.417,
  "step": 145
  },
  {
  "epoch": 0.6651884700665188,
- "grad_norm": 0.2460290491580963,
- "learning_rate": 0.0009967053851196099,
- "loss": 1.4257,
  "step": 150
  },
  {
  "epoch": 0.6873614190687362,
- "grad_norm": 0.22039610147476196,
- "learning_rate": 0.0009957561308142709,
- "loss": 1.4142,
  "step": 155
  },
  {
  "epoch": 0.7095343680709535,
- "grad_norm": 0.24831444025039673,
- "learning_rate": 0.0009946874397242474,
- "loss": 1.403,
  "step": 160
  },
  {
  "epoch": 0.7317073170731707,
- "grad_norm": 0.2765662372112274,
- "learning_rate": 0.0009934995693169104,
- "loss": 1.3855,
  "step": 165
  },
  {
  "epoch": 0.753880266075388,
- "grad_norm": 0.26668626070022583,
- "learning_rate": 0.0009921928057721242,
- "loss": 1.3983,
  "step": 170
  },
  {
  "epoch": 0.7760532150776053,
- "grad_norm": 0.305702805519104,
- "learning_rate": 0.0009907674639132995,
- "loss": 1.4083,
  "step": 175
  },
  {
  "epoch": 0.7982261640798226,
- "grad_norm": 0.3682064414024353,
- "learning_rate": 0.0009892238871315475,
- "loss": 1.4011,
  "step": 180
  },
  {
  "epoch": 0.8203991130820399,
- "grad_norm": 0.2574247419834137,
- "learning_rate": 0.0009875624473029507,
- "loss": 1.3962,
  "step": 185
  },
  {
  "epoch": 0.8425720620842572,
- "grad_norm": 0.42513400316238403,
- "learning_rate": 0.0009857835446989707,
- "loss": 1.3924,
  "step": 190
  },
  {
  "epoch": 0.8647450110864745,
- "grad_norm": 0.3164115250110626,
- "learning_rate": 0.0009838876078900156,
- "loss": 1.3804,
  "step": 195
  },
  {
  "epoch": 0.8869179600886918,
- "grad_norm": 0.2795542776584625,
- "learning_rate": 0.0009818750936421894,
- "loss": 1.3943,
  "step": 200
  },
  {
  "epoch": 0.9090909090909091,
- "grad_norm": 0.2589350938796997,
- "learning_rate": 0.0009797464868072487,
- "loss": 1.384,
  "step": 205
  },
  {
  "epoch": 0.9312638580931264,
- "grad_norm": 0.9124072790145874,
- "learning_rate": 0.000977502300205793,
- "loss": 1.3689,
  "step": 210
  },
  {
  "epoch": 0.9534368070953437,
- "grad_norm": 0.3424608111381531,
- "learning_rate": 0.0009751430745037169,
- "loss": 1.3664,
  "step": 215
  },
  {
  "epoch": 0.975609756097561,
- "grad_norm": 0.25026756525039673,
- "learning_rate": 0.0009726693780819534,
- "loss": 1.3913,
  "step": 220
  },
  {
  "epoch": 0.9977827050997783,
- "grad_norm": 0.4633461833000183,
- "learning_rate": 0.0009700818068995407,
- "loss": 1.3936,
  "step": 225
  },
  {
  "epoch": 0.9977827050997783,
- "eval_loss": 1.791922926902771,
- "eval_runtime": 0.3732,
- "eval_samples_per_second": 2.679,
- "eval_steps_per_second": 2.679,
  "step": 225
  },
  {
- "epoch": 1.0199556541019956,
- "grad_norm": 0.2957637310028076,
- "learning_rate": 0.0009673809843500446,
- "loss": 1.3285,
- "step": 230
- },
- {
- "epoch": 1.042128603104213,
- "grad_norm": 0.29558026790618896,
- "learning_rate": 0.0009645675611113716,
- "loss": 1.333,
- "step": 235
- },
- {
- "epoch": 1.06430155210643,
- "grad_norm": 0.30888795852661133,
- "learning_rate": 0.0009616422149890085,
- "loss": 1.3411,
- "step": 240
- },
- {
- "epoch": 1.0864745011086474,
- "grad_norm": 2.1240732669830322,
- "learning_rate": 0.0009586056507527265,
- "loss": 1.3353,
- "step": 245
- },
- {
- "epoch": 1.1086474501108647,
- "grad_norm": 0.4370739758014679,
- "learning_rate": 0.0009554585999667896,
- "loss": 1.3688,
- "step": 250
- },
- {
- "epoch": 1.130820399113082,
- "grad_norm": 0.5616880059242249,
- "learning_rate": 0.0009522018208137067,
- "loss": 1.3541,
- "step": 255
- },
- {
- "epoch": 1.1529933481152994,
- "grad_norm": 0.31425368785858154,
- "learning_rate": 0.0009488360979115719,
- "loss": 1.3366,
- "step": 260
- },
- {
- "epoch": 1.1751662971175167,
- "grad_norm": 0.2877110242843628,
- "learning_rate": 0.0009453622421250352,
- "loss": 1.3446,
- "step": 265
- },
- {
- "epoch": 1.1973392461197339,
- "grad_norm": 0.32708439230918884,
- "learning_rate": 0.0009417810903699507,
- "loss": 1.3453,
- "step": 270
- },
- {
- "epoch": 1.2195121951219512,
- "grad_norm": 0.24861423671245575,
- "learning_rate": 0.000938093505411748,
- "loss": 1.3226,
- "step": 275
- },
- {
- "epoch": 1.2416851441241685,
- "grad_norm": 0.2731972634792328,
- "learning_rate": 0.0009343003756575757,
- "loss": 1.3322,
- "step": 280
- },
- {
- "epoch": 1.2638580931263859,
- "grad_norm": 0.3049946129322052,
- "learning_rate": 0.000930402614942268,
- "loss": 1.3487,
- "step": 285
- },
- {
- "epoch": 1.2860310421286032,
- "grad_norm": 0.27978989481925964,
- "learning_rate": 0.0009264011623081859,
- "loss": 1.3274,
- "step": 290
- },
- {
- "epoch": 1.3082039911308203,
- "grad_norm": 0.9313792586326599,
- "learning_rate": 0.0009222969817789828,
- "loss": 1.3204,
- "step": 295
- },
- {
- "epoch": 1.3303769401330376,
- "grad_norm": 0.2632613778114319,
- "learning_rate": 0.0009180910621273555,
- "loss": 1.3382,
- "step": 300
- },
- {
- "epoch": 1.352549889135255,
- "grad_norm": 0.35932329297065735,
- "learning_rate": 0.0009137844166368287,
- "loss": 1.3397,
- "step": 305
- },
- {
- "epoch": 1.3747228381374723,
- "grad_norm": 0.31215277314186096,
- "learning_rate": 0.0009093780828576379,
- "loss": 1.3241,
- "step": 310
- },
- {
- "epoch": 1.3968957871396896,
- "grad_norm": 1.306994080543518,
- "learning_rate": 0.0009048731223567636,
- "loss": 1.3411,
- "step": 315
- },
- {
- "epoch": 1.4190687361419068,
- "grad_norm": 0.7964722514152527,
- "learning_rate": 0.0009002706204621802,
- "loss": 1.3433,
- "step": 320
- },
- {
- "epoch": 1.441241685144124,
- "grad_norm": 0.32706671953201294,
- "learning_rate": 0.0008955716860013812,
- "loss": 1.3231,
- "step": 325
- },
- {
- "epoch": 1.4634146341463414,
- "grad_norm": 0.2626619338989258,
- "learning_rate": 0.0008907774510342412,
- "loss": 1.333,
- "step": 330
- },
- {
- "epoch": 1.4855875831485588,
- "grad_norm": 0.2566235363483429,
- "learning_rate": 0.0008858890705802829,
- "loss": 1.3289,
- "step": 335
- },
- {
- "epoch": 1.507760532150776,
- "grad_norm": 0.22669324278831482,
- "learning_rate": 0.0008809077223404109,
- "loss": 1.3345,
- "step": 340
- },
- {
- "epoch": 1.5299334811529932,
- "grad_norm": 0.24277335405349731,
- "learning_rate": 0.0008758346064131824,
- "loss": 1.3258,
- "step": 345
- },
- {
- "epoch": 1.5521064301552108,
- "grad_norm": 0.23311229050159454,
- "learning_rate": 0.0008706709450056802,
- "loss": 1.3208,
- "step": 350
- },
- {
- "epoch": 1.5742793791574279,
- "grad_norm": 0.3494793176651001,
- "learning_rate": 0.0008654179821390621,
- "loss": 1.3253,
- "step": 355
- },
- {
- "epoch": 1.5964523281596452,
- "grad_norm": 0.22604632377624512,
- "learning_rate": 0.0008600769833488522,
- "loss": 1.3244,
- "step": 360
- },
- {
- "epoch": 1.6186252771618626,
- "grad_norm": 0.26579421758651733,
- "learning_rate": 0.0008546492353800504,
- "loss": 1.3256,
- "step": 365
- },
- {
- "epoch": 1.6407982261640797,
- "grad_norm": 0.2301110476255417,
- "learning_rate": 0.000849136045877132,
- "loss": 1.3383,
- "step": 370
- },
- {
- "epoch": 1.6629711751662972,
- "grad_norm": 0.22868593037128448,
- "learning_rate": 0.0008435387430690114,
- "loss": 1.3194,
- "step": 375
- },
- {
- "epoch": 1.6851441241685143,
- "grad_norm": 0.28906944394111633,
- "learning_rate": 0.0008378586754490483,
- "loss": 1.3196,
- "step": 380
- },
- {
- "epoch": 1.7073170731707317,
- "grad_norm": 0.22416484355926514,
- "learning_rate": 0.0008320972114501697,
- "loss": 1.3281,
- "step": 385
- },
- {
- "epoch": 1.729490022172949,
- "grad_norm": 0.2212500125169754,
- "learning_rate": 0.0008262557391151904,
- "loss": 1.3166,
- "step": 390
- },
- {
- "epoch": 1.7516629711751663,
- "grad_norm": 0.24597449600696564,
- "learning_rate": 0.0008203356657624068,
- "loss": 1.3147,
- "step": 395
- },
- {
- "epoch": 1.7738359201773837,
- "grad_norm": 0.20681482553482056,
- "learning_rate": 0.0008143384176465486,
- "loss": 1.3207,
- "step": 400
- },
- {
- "epoch": 1.7960088691796008,
- "grad_norm": 0.22514577209949493,
- "learning_rate": 0.0008082654396151675,
- "loss": 1.3256,
- "step": 405
- },
- {
- "epoch": 1.8181818181818183,
- "grad_norm": 0.2058994621038437,
- "learning_rate": 0.0008021181947605473,
- "loss": 1.3051,
- "step": 410
- },
- {
- "epoch": 1.8403547671840355,
- "grad_norm": 0.24868044257164001,
- "learning_rate": 0.0007958981640672172,
- "loss": 1.3133,
- "step": 415
- },
- {
- "epoch": 1.8625277161862528,
- "grad_norm": 0.219014972448349,
- "learning_rate": 0.0007896068460551562,
- "loss": 1.3016,
- "step": 420
- },
- {
- "epoch": 1.8847006651884701,
- "grad_norm": 0.2263939380645752,
- "learning_rate": 0.0007832457564187715,
- "loss": 1.3269,
- "step": 425
- },
- {
- "epoch": 1.9068736141906872,
- "grad_norm": 0.22888123989105225,
- "learning_rate": 0.0007768164276617396,
- "loss": 1.3297,
- "step": 430
- },
- {
- "epoch": 1.9290465631929048,
- "grad_norm": 0.21448783576488495,
- "learning_rate": 0.0007703204087277988,
- "loss": 1.3042,
- "step": 435
- },
- {
- "epoch": 1.951219512195122,
- "grad_norm": 0.217611163854599,
- "learning_rate": 0.0007637592646275793,
- "loss": 1.3171,
- "step": 440
- },
- {
- "epoch": 1.9733924611973392,
- "grad_norm": 0.2066190540790558,
- "learning_rate": 0.0007571345760615634,
- "loss": 1.3131,
- "step": 445
- },
- {
- "epoch": 1.9955654101995566,
- "grad_norm": 0.2094324827194214,
- "learning_rate": 0.0007504479390392661,
- "loss": 1.31,
- "step": 450
- },
- {
- "epoch": 2.0,
- "eval_loss": 1.7869027853012085,
- "eval_runtime": 0.3369,
- "eval_samples_per_second": 2.969,
- "eval_steps_per_second": 2.969,
- "step": 451
- },
- {
- "epoch": 2.0177383592017737,
- "grad_norm": 0.2380608320236206,
- "learning_rate": 0.0007437009644947268,
- "loss": 1.2636,
- "step": 455
- },
- {
- "epoch": 2.0399113082039912,
- "grad_norm": 0.226850226521492,
- "learning_rate": 0.0007368952778984051,
- "loss": 1.2298,
- "step": 460
- },
- {
- "epoch": 2.0620842572062084,
- "grad_norm": 0.23007981479167938,
- "learning_rate": 0.0007300325188655761,
- "loss": 1.2459,
- "step": 465
- },
- {
- "epoch": 2.084257206208426,
- "grad_norm": 0.21383607387542725,
- "learning_rate": 0.0007231143407613156,
- "loss": 1.2328,
- "step": 470
- },
- {
- "epoch": 2.106430155210643,
- "grad_norm": 0.22145338356494904,
- "learning_rate": 0.0007161424103021752,
- "loss": 1.2326,
- "step": 475
- },
- {
- "epoch": 2.12860310421286,
- "grad_norm": 0.26510247588157654,
- "learning_rate": 0.0007091184071546384,
- "loss": 1.2377,
- "step": 480
- },
- {
- "epoch": 2.1507760532150777,
- "grad_norm": 0.23144353926181793,
- "learning_rate": 0.0007020440235304592,
- "loss": 1.2195,
- "step": 485
- },
- {
- "epoch": 2.172949002217295,
- "grad_norm": 0.22692181169986725,
- "learning_rate": 0.000694920963778976,
- "loss": 1.2181,
- "step": 490
- },
- {
- "epoch": 2.1951219512195124,
- "grad_norm": 0.2270960956811905,
- "learning_rate": 0.0006877509439765037,
- "loss": 1.2444,
- "step": 495
- },
- {
- "epoch": 2.2172949002217295,
- "grad_norm": 0.22735467553138733,
- "learning_rate": 0.0006805356915128977,
- "loss": 1.2385,
- "step": 500
- },
- {
- "epoch": 2.2394678492239466,
- "grad_norm": 0.21610836684703827,
- "learning_rate": 0.0006732769446753953,
- "loss": 1.2476,
- "step": 505
- },
- {
- "epoch": 2.261640798226164,
- "grad_norm": 0.4567781686782837,
- "learning_rate": 0.0006659764522298296,
- "loss": 1.2411,
- "step": 510
- },
- {
- "epoch": 2.2838137472283813,
- "grad_norm": 0.23705683648586273,
- "learning_rate": 0.0006586359729993199,
- "loss": 1.2334,
- "step": 515
- },
- {
- "epoch": 2.305986696230599,
- "grad_norm": 0.22446921467781067,
- "learning_rate": 0.0006512572754405379,
- "loss": 1.2342,
- "step": 520
- },
- {
- "epoch": 2.328159645232816,
- "grad_norm": 0.2318691909313202,
- "learning_rate": 0.0006438421372176556,
- "loss": 1.2379,
- "step": 525
- },
- {
- "epoch": 2.3503325942350335,
- "grad_norm": 0.2584494650363922,
- "learning_rate": 0.0006363923447740718,
- "loss": 1.2371,
- "step": 530
- },
- {
- "epoch": 2.3725055432372506,
- "grad_norm": 0.2136611044406891,
- "learning_rate": 0.0006289096929020253,
- "loss": 1.2497,
- "step": 535
- },
- {
- "epoch": 2.3946784922394677,
- "grad_norm": 0.22401590645313263,
- "learning_rate": 0.000621395984310197,
- "loss": 1.2508,
- "step": 540
- },
- {
- "epoch": 2.4168514412416853,
- "grad_norm": 0.21865229308605194,
- "learning_rate": 0.0006138530291894032,
- "loss": 1.2494,
- "step": 545
- },
- {
- "epoch": 2.4390243902439024,
- "grad_norm": 0.23078739643096924,
- "learning_rate": 0.0006062826447764884,
- "loss": 1.2474,
- "step": 550
- },
- {
- "epoch": 2.4611973392461195,
- "grad_norm": 0.24809886515140533,
- "learning_rate": 0.0005986866549165184,
- "loss": 1.2472,
- "step": 555
- },
- {
- "epoch": 2.483370288248337,
- "grad_norm": 0.23055674135684967,
- "learning_rate": 0.000591066889623383,
- "loss": 1.23,
- "step": 560
- },
- {
- "epoch": 2.505543237250554,
- "grad_norm": 0.22716328501701355,
- "learning_rate": 0.000583425184638912,
- "loss": 1.2495,
- "step": 565
- },
- {
- "epoch": 2.5277161862527717,
- "grad_norm": 0.21761415898799896,
- "learning_rate": 0.0005757633809906107,
- "loss": 1.2448,
- "step": 570
- },
- {
- "epoch": 2.549889135254989,
- "grad_norm": 0.2366757094860077,
- "learning_rate": 0.0005680833245481234,
- "loss": 1.2481,
- "step": 575
- },
- {
- "epoch": 2.5720620842572064,
- "grad_norm": 0.23113702237606049,
- "learning_rate": 0.0005603868655785279,
- "loss": 1.2422,
- "step": 580
- },
- {
- "epoch": 2.5942350332594235,
- "grad_norm": 0.2284475862979889,
- "learning_rate": 0.0005526758583005735,
- "loss": 1.2354,
- "step": 585
- },
- {
- "epoch": 2.6164079822616406,
- "grad_norm": 0.23871304094791412,
- "learning_rate": 0.0005449521604379652,
- "loss": 1.2573,
- "step": 590
- },
- {
- "epoch": 2.638580931263858,
- "grad_norm": 0.2440497726202011,
- "learning_rate": 0.0005372176327718029,
- "loss": 1.2634,
- "step": 595
- },
- {
- "epoch": 2.6607538802660753,
- "grad_norm": 0.23553554713726044,
- "learning_rate": 0.0005294741386922863,
- "loss": 1.2494,
- "step": 600
- },
- {
- "epoch": 2.682926829268293,
- "grad_norm": 0.22893203794956207,
- "learning_rate": 0.000521723543749789,
- "loss": 1.2317,
- "step": 605
- },
- {
- "epoch": 2.70509977827051,
- "grad_norm": 0.21567869186401367,
- "learning_rate": 0.0005139677152054135,
- "loss": 1.2267,
- "step": 610
- },
- {
- "epoch": 2.7272727272727275,
- "grad_norm": 0.21539655327796936,
- "learning_rate": 0.000506208521581133,
- "loss": 1.2314,
- "step": 615
- },
- {
- "epoch": 2.7494456762749446,
- "grad_norm": 0.263122022151947,
- "learning_rate": 0.0004984478322096308,
- "loss": 1.2504,
- "step": 620
- },
- {
- "epoch": 2.7716186252771617,
- "grad_norm": 0.21988996863365173,
- "learning_rate": 0.0004906875167839433,
- "loss": 1.2424,
- "step": 625
- },
- {
- "epoch": 2.7937915742793793,
- "grad_norm": 0.22908321022987366,
- "learning_rate": 0.00048292944490701606,
- "loss": 1.2374,
- "step": 630
- },
- {
- "epoch": 2.8159645232815964,
- "grad_norm": 0.21693512797355652,
- "learning_rate": 0.00047517548564128293,
- "loss": 1.2368,
- "step": 635
- },
- {
- "epoch": 2.8381374722838135,
- "grad_norm": 0.22369663417339325,
- "learning_rate": 0.00046742750705837356,
- "loss": 1.2227,
- "step": 640
- },
- {
- "epoch": 2.860310421286031,
- "grad_norm": 0.21551957726478577,
- "learning_rate": 0.0004596873757890612,
- "loss": 1.2353,
- "step": 645
- },
- {
- "epoch": 2.882483370288248,
- "grad_norm": 0.21659965813159943,
- "learning_rate": 0.00045195695657355636,
- "loss": 1.242,
- "step": 650
- },
- {
- "epoch": 2.9046563192904657,
- "grad_norm": 0.2131386399269104,
- "learning_rate": 0.00044423811181225727,
- "loss": 1.2372,
- "step": 655
- },
- {
- "epoch": 2.926829268292683,
- "grad_norm": 0.21243086457252502,
- "learning_rate": 0.0004365327011170628,
- "loss": 1.2492,
- "step": 660
- },
- {
- "epoch": 2.9490022172949004,
- "grad_norm": 0.2212466597557068,
- "learning_rate": 0.0004288425808633575,
- "loss": 1.2548,
- "step": 665
- },
- {
- "epoch": 2.9711751662971175,
- "grad_norm": 0.21596314013004303,
- "learning_rate": 0.0004211696037427772,
- "loss": 1.231,
- "step": 670
- },
- {
- "epoch": 2.9933481152993346,
- "grad_norm": 0.20963598787784576,
- "learning_rate": 0.0004135156183168613,
- "loss": 1.2416,
- "step": 675
- },
- {
- "epoch": 2.9977827050997785,
- "eval_loss": 1.824194312095642,
- "eval_runtime": 0.3515,
- "eval_samples_per_second": 2.845,
- "eval_steps_per_second": 2.845,
- "step": 676
- },
- {
- "epoch": 3.015521064301552,
- "grad_norm": 0.23751798272132874,
- "learning_rate": 0.0004058824685716997,
- "loss": 1.1799,
- "step": 680
- },
- {
- "epoch": 3.0376940133037693,
- "grad_norm": 0.22644291818141937,
- "learning_rate": 0.0003982719934736832,
- "loss": 1.1569,
- "step": 685
- },
- {
- "epoch": 3.059866962305987,
- "grad_norm": 0.24408124387264252,
- "learning_rate": 0.0003906860265264622,
- "loss": 1.1543,
- "step": 690
- },
- {
- "epoch": 3.082039911308204,
- "grad_norm": 0.23074336349964142,
- "learning_rate": 0.00038312639532922245,
- "loss": 1.1437,
- "step": 695
- },
- {
- "epoch": 3.104212860310421,
- "grad_norm": 0.2394222617149353,
- "learning_rate": 0.00037559492113638205,
- "loss": 1.1394,
- "step": 700
- },
- {
- "epoch": 3.1263858093126387,
- "grad_norm": 0.23385775089263916,
- "learning_rate": 0.00036809341841881815,
- "loss": 1.1507,
- "step": 705
- },
- {
- "epoch": 3.1485587583148558,
- "grad_norm": 0.23921357095241547,
- "learning_rate": 0.00036062369442672724,
- "loss": 1.1554,
- "step": 710
- },
- {
- "epoch": 3.1707317073170733,
- "grad_norm": 0.2213825136423111,
- "learning_rate": 0.00035318754875422585,
- "loss": 1.1523,
- "step": 715
- },
- {
- "epoch": 3.1929046563192904,
- "grad_norm": 0.24610434472560883,
- "learning_rate": 0.0003457867729057942,
- "loss": 1.1484,
- "step": 720
- },
- {
- "epoch": 3.2150776053215075,
- "grad_norm": 0.23285223543643951,
- "learning_rate": 0.0003384231498646706,
- "loss": 1.1599,
- "step": 725
- },
- {
- "epoch": 3.237250554323725,
- "grad_norm": 0.23643791675567627,
- "learning_rate": 0.0003310984536632975,
- "loss": 1.1655,
- "step": 730
- },
- {
- "epoch": 3.259423503325942,
- "grad_norm": 0.23265090584754944,
- "learning_rate": 0.0003238144489559248,
- "loss": 1.1772,
- "step": 735
- },
- {
- "epoch": 3.2815964523281598,
- "grad_norm": 0.23318685591220856,
- "learning_rate": 0.00031657289059347184,
- "loss": 1.1607,
- "step": 740
- },
- {
- "epoch": 3.303769401330377,
- "grad_norm": 0.23179373145103455,
- "learning_rate": 0.00030937552320075114,
- "loss": 1.1541,
- "step": 745
- },
- {
- "epoch": 3.3259423503325944,
- "grad_norm": 0.22854293882846832,
- "learning_rate": 0.0003022240807561569,
- "loss": 1.1573,
- "step": 750
- },
- {
- "epoch": 3.3481152993348116,
- "grad_norm": 0.24048267304897308,
- "learning_rate": 0.0002951202861739173,
- "loss": 1.134,
- "step": 755
- },
- {
- "epoch": 3.3702882483370287,
- "grad_norm": 0.24298301339149475,
- "learning_rate": 0.0002880658508890125,
- "loss": 1.1615,
- "step": 760
- },
- {
- "epoch": 3.3924611973392462,
- "grad_norm": 0.24603115022182465,
- "learning_rate": 0.0002810624744448588,
- "loss": 1.1458,
- "step": 765
- },
- {
- "epoch": 3.4146341463414633,
- "grad_norm": 0.2374211847782135,
- "learning_rate": 0.000274111844083857,
- "loss": 1.1604,
- "step": 770
- },
- {
- "epoch": 3.436807095343681,
- "grad_norm": 0.23238298296928406,
- "learning_rate": 0.0002672156343409053,
- "loss": 1.1525,
- "step": 775
- },
- {
- "epoch": 3.458980044345898,
- "grad_norm": 0.24065588414669037,
- "learning_rate": 0.00026037550663997176,
- "loss": 1.1626,
- "step": 780
- },
- {
- "epoch": 3.481152993348115,
- "grad_norm": 0.23232793807983398,
- "learning_rate": 0.00025359310889382737,
- "loss": 1.1567,
- "step": 785
- },
- {
- "epoch": 3.5033259423503327,
- "grad_norm": 0.31952565908432007,
- "learning_rate": 0.0002468700751070346,
- "loss": 1.1441,
- "step": 790
- },
- {
- "epoch": 3.52549889135255,
- "grad_norm": 0.25355201959609985,
- "learning_rate": 0.00024020802498228333,
- "loss": 1.1578,
- "step": 795
1154
- },
1155
- {
1156
- "epoch": 3.5476718403547673,
1157
- "grad_norm": 0.23360273241996765,
1158
- "learning_rate": 0.00023360856353017617,
1159
- "loss": 1.1624,
1160
- "step": 800
1161
- },
1162
- {
1163
- "epoch": 3.5698447893569845,
1164
- "grad_norm": 0.24496670067310333,
1165
- "learning_rate": 0.00022707328068255166,
1166
- "loss": 1.1608,
1167
- "step": 805
1168
- },
1169
- {
1170
- "epoch": 3.5920177383592016,
1171
- "grad_norm": 0.23215603828430176,
1172
- "learning_rate": 0.00022060375090944025,
1173
- "loss": 1.1561,
1174
- "step": 810
1175
- },
1176
- {
1177
- "epoch": 3.614190687361419,
1178
- "grad_norm": 0.24374577403068542,
1179
- "learning_rate": 0.00021420153283974535,
1180
- "loss": 1.1582,
1181
- "step": 815
1182
- },
1183
- {
1184
- "epoch": 3.6363636363636362,
1185
- "grad_norm": 0.23796550929546356,
1186
- "learning_rate": 0.00020786816888574095,
1187
- "loss": 1.1714,
1188
- "step": 820
1189
- },
1190
- {
1191
- "epoch": 3.658536585365854,
1192
- "grad_norm": 0.33616843819618225,
1193
- "learning_rate": 0.00020160518487147579,
1194
- "loss": 1.155,
1195
- "step": 825
1196
- },
1197
- {
1198
- "epoch": 3.680709534368071,
1199
- "grad_norm": 0.23069876432418823,
1200
- "learning_rate": 0.00019541408966517566,
1201
- "loss": 1.1552,
1202
- "step": 830
1203
- },
1204
- {
1205
- "epoch": 3.7028824833702885,
1206
- "grad_norm": 0.23669025301933289,
1207
- "learning_rate": 0.00018929637481572713,
1208
- "loss": 1.1598,
1209
- "step": 835
1210
- },
1211
- {
1212
- "epoch": 3.7250554323725056,
1213
- "grad_norm": 0.23340646922588348,
1214
- "learning_rate": 0.0001832535141933373,
1215
- "loss": 1.1478,
1216
- "step": 840
1217
- },
1218
- {
1219
- "epoch": 3.7472283813747227,
1220
- "grad_norm": 0.24212823808193207,
1221
- "learning_rate": 0.00017728696363445117,
1222
- "loss": 1.1521,
1223
- "step": 845
1224
- },
1225
- {
1226
- "epoch": 3.7694013303769403,
1227
- "grad_norm": 0.243617445230484,
1228
- "learning_rate": 0.0001713981605910137,
1229
- "loss": 1.1534,
1230
- "step": 850
1231
- },
1232
- {
1233
- "epoch": 3.7915742793791574,
1234
- "grad_norm": 0.23238743841648102,
1235
- "learning_rate": 0.0001655885237841611,
1236
- "loss": 1.1595,
1237
- "step": 855
1238
- },
1239
- {
1240
- "epoch": 3.8137472283813745,
1241
- "grad_norm": 0.2364315241575241,
1242
- "learning_rate": 0.00015985945286242452,
1243
- "loss": 1.1499,
1244
- "step": 860
1245
- },
1246
- {
1247
- "epoch": 3.835920177383592,
1248
- "grad_norm": 0.23698101937770844,
1249
- "learning_rate": 0.00015421232806452916,
1250
- "loss": 1.1564,
1251
- "step": 865
1252
- },
1253
- {
1254
- "epoch": 3.858093126385809,
1255
- "grad_norm": 0.2341204732656479,
1256
- "learning_rate": 0.00014864850988687017,
1257
- "loss": 1.1575,
1258
- "step": 870
1259
- },
1260
- {
1261
- "epoch": 3.8802660753880267,
1262
- "grad_norm": 0.2593097984790802,
1263
- "learning_rate": 0.0001431693387557424,
1264
- "loss": 1.1644,
1265
- "step": 875
1266
- },
1267
- {
1268
- "epoch": 3.902439024390244,
1269
- "grad_norm": 0.24039191007614136,
1270
- "learning_rate": 0.0001377761347044079,
1271
- "loss": 1.1549,
1272
- "step": 880
1273
- },
1274
- {
1275
- "epoch": 3.9246119733924614,
1276
- "grad_norm": 0.23400068283081055,
1277
- "learning_rate": 0.00013247019705507596,
1278
- "loss": 1.1712,
1279
- "step": 885
1280
- },
1281
- {
1282
- "epoch": 3.9467849223946785,
1283
- "grad_norm": 0.23080703616142273,
1284
- "learning_rate": 0.00012725280410587166,
1285
- "loss": 1.1687,
1286
- "step": 890
1287
- },
1288
- {
1289
- "epoch": 3.9689578713968956,
1290
- "grad_norm": 0.23793019354343414,
1291
- "learning_rate": 0.00012212521282287093,
1292
- "loss": 1.1562,
1293
- "step": 895
1294
- },
1295
- {
1296
- "epoch": 3.991130820399113,
1297
- "grad_norm": 0.254978746175766,
1298
- "learning_rate": 0.00011708865853727369,
1299
- "loss": 1.1447,
1300
- "step": 900
1301
- },
1302
- {
1303
- "epoch": 4.0,
1304
- "eval_loss": 1.9029209613800049,
1305
- "eval_runtime": 0.3379,
1306
- "eval_samples_per_second": 2.959,
1307
- "eval_steps_per_second": 2.959,
1308
- "step": 902
1309
- },
1310
- {
1311
- "epoch": 4.013303769401331,
1312
- "grad_norm": 0.22599312663078308,
1313
- "learning_rate": 0.00011214435464779005,
1314
- "loss": 1.1088,
1315
- "step": 905
1316
- },
1317
- {
1318
- "epoch": 4.035476718403547,
1319
- "grad_norm": 0.26651766896247864,
1320
- "learning_rate": 0.00010729349232831092,
1321
- "loss": 1.0774,
1322
- "step": 910
1323
- },
1324
- {
1325
- "epoch": 4.057649667405765,
1326
- "grad_norm": 0.2449249029159546,
1327
- "learning_rate": 0.00010253724024093103,
1328
- "loss": 1.0788,
1329
- "step": 915
1330
- },
1331
- {
1332
- "epoch": 4.0798226164079825,
1333
- "grad_norm": 0.24662241339683533,
1334
- "learning_rate": 9.787674425439719e-05,
1335
- "loss": 1.0857,
1336
- "step": 920
1337
- },
1338
- {
1339
- "epoch": 4.101995565410199,
1340
- "grad_norm": 0.24599789083003998,
1341
- "learning_rate": 9.331312716804791e-05,
1342
- "loss": 1.0937,
1343
- "step": 925
1344
- },
1345
- {
1346
- "epoch": 4.124168514412417,
1347
- "grad_norm": 0.25388607382774353,
1348
- "learning_rate": 8.884748844130986e-05,
1349
- "loss": 1.0834,
1350
- "step": 930
1351
- },
1352
- {
1353
- "epoch": 4.146341463414634,
1354
- "grad_norm": 0.24923333525657654,
1355
- "learning_rate": 8.448090392881796e-05,
1356
- "loss": 1.0861,
1357
- "step": 935
1358
- },
1359
- {
1360
- "epoch": 4.168514412416852,
1361
- "grad_norm": 0.24948322772979736,
1362
- "learning_rate": 8.021442562122194e-05,
1363
- "loss": 1.0852,
1364
- "step": 940
1365
- },
1366
- {
1367
- "epoch": 4.1906873614190685,
1368
- "grad_norm": 0.25804150104522705,
1369
- "learning_rate": 7.604908139174255e-05,
1370
- "loss": 1.0998,
1371
- "step": 945
1372
- },
1373
- {
1374
- "epoch": 4.212860310421286,
1375
- "grad_norm": 0.2608419954776764,
1376
- "learning_rate": 7.198587474853863e-05,
1377
- "loss": 1.081,
1378
- "step": 950
1379
- },
1380
- {
1381
- "epoch": 4.235033259423504,
1382
- "grad_norm": 0.2649877965450287,
1383
- "learning_rate": 6.802578459294235e-05,
1384
- "loss": 1.0946,
1385
- "step": 955
1386
- },
1387
- {
1388
- "epoch": 4.25720620842572,
1389
- "grad_norm": 0.25245440006256104,
1390
- "learning_rate": 6.416976498362431e-05,
1391
- "loss": 1.0893,
1392
- "step": 960
1393
- },
1394
- {
1395
- "epoch": 4.279379157427938,
1396
- "grad_norm": 0.2566092908382416,
1397
- "learning_rate": 6.041874490674415e-05,
1398
- "loss": 1.0925,
1399
- "step": 965
1400
- },
1401
- {
1402
- "epoch": 4.301552106430155,
1403
- "grad_norm": 0.25820818543434143,
1404
- "learning_rate": 5.6773628052139036e-05,
1405
- "loss": 1.0999,
1406
- "step": 970
1407
- },
1408
- {
1409
- "epoch": 4.323725055432373,
1410
- "grad_norm": 0.24905657768249512,
1411
- "learning_rate": 5.3235292595609106e-05,
1412
- "loss": 1.0727,
1413
- "step": 975
1414
- },
1415
- {
1416
- "epoch": 4.34589800443459,
1417
- "grad_norm": 0.2517815828323364,
1418
- "learning_rate": 4.9804590987348854e-05,
1419
- "loss": 1.0929,
1420
- "step": 980
1421
- },
1422
- {
1423
- "epoch": 4.368070953436807,
1424
- "grad_norm": 0.25075092911720276,
1425
- "learning_rate": 4.648234974657578e-05,
1426
- "loss": 1.1038,
1427
- "step": 985
1428
- },
1429
- {
1430
- "epoch": 4.390243902439025,
1431
- "grad_norm": 0.25245392322540283,
1432
- "learning_rate": 4.326936926240682e-05,
1433
- "loss": 1.0721,
1434
- "step": 990
1435
- },
1436
- {
1437
- "epoch": 4.412416851441241,
1438
- "grad_norm": 0.2557605504989624,
1439
- "learning_rate": 4.0166423601029736e-05,
1440
- "loss": 1.0926,
1441
- "step": 995
1442
- },
1443
- {
1444
- "epoch": 4.434589800443459,
1445
- "grad_norm": 0.24609370529651642,
1446
- "learning_rate": 3.717426031921639e-05,
1447
- "loss": 1.0866,
1448
- "step": 1000
1449
- },
1450
- {
1451
- "epoch": 4.4567627494456765,
1452
- "grad_norm": 0.2574557363986969,
1453
- "learning_rate": 3.429360028422307e-05,
1454
- "loss": 1.0874,
1455
- "step": 1005
1456
- },
1457
- {
1458
- "epoch": 4.478935698447893,
1459
- "grad_norm": 0.256829172372818,
1460
- "learning_rate": 3.152513750011921e-05,
1461
- "loss": 1.092,
1462
- "step": 1010
1463
- },
1464
- {
1465
- "epoch": 4.501108647450111,
1466
- "grad_norm": 0.25330841541290283,
1467
- "learning_rate": 2.8869538940589802e-05,
1468
- "loss": 1.0949,
1469
- "step": 1015
1470
- },
1471
- {
1472
- "epoch": 4.523281596452328,
1473
- "grad_norm": 0.25692063570022583,
1474
- "learning_rate": 2.6327444388249076e-05,
1475
- "loss": 1.0946,
1476
- "step": 1020
1477
- },
1478
- {
1479
- "epoch": 4.545454545454545,
1480
- "grad_norm": 0.25190240144729614,
1481
- "learning_rate": 2.3899466280504933e-05,
1482
- "loss": 1.0901,
1483
- "step": 1025
1484
- },
1485
- {
1486
- "epoch": 4.5676274944567625,
1487
- "grad_norm": 0.25231993198394775,
1488
- "learning_rate": 2.158618956201158e-05,
1489
- "loss": 1.095,
1490
- "step": 1030
1491
- },
1492
- {
1493
- "epoch": 4.58980044345898,
1494
- "grad_norm": 0.25305166840553284,
1495
- "learning_rate": 1.9388171543745393e-05,
1496
- "loss": 1.0747,
1497
- "step": 1035
1498
- },
1499
- {
1500
- "epoch": 4.611973392461198,
1501
- "grad_norm": 0.2522623538970947,
1502
- "learning_rate": 1.730594176873851e-05,
1503
- "loss": 1.0927,
1504
- "step": 1040
1505
- },
1506
- {
1507
- "epoch": 4.634146341463414,
1508
- "grad_norm": 0.2547568678855896,
1509
- "learning_rate": 1.5340001884502576e-05,
1510
- "loss": 1.1009,
1511
- "step": 1045
1512
- },
1513
- {
1514
- "epoch": 4.656319290465632,
1515
- "grad_norm": 0.25178787112236023,
1516
- "learning_rate": 1.3490825522172012e-05,
1517
- "loss": 1.0879,
1518
- "step": 1050
1519
- },
1520
- {
1521
- "epoch": 4.678492239467849,
1522
- "grad_norm": 0.2568508982658386,
1523
- "learning_rate": 1.1758858182397692e-05,
1524
- "loss": 1.0887,
1525
- "step": 1055
1526
- },
1527
- {
1528
- "epoch": 4.700665188470067,
1529
- "grad_norm": 0.2606378197669983,
1530
- "learning_rate": 1.014451712801806e-05,
1531
- "loss": 1.0881,
1532
- "step": 1060
1533
- },
1534
- {
1535
- "epoch": 4.722838137472284,
1536
- "grad_norm": 0.24862989783287048,
1537
- "learning_rate": 8.648191283532336e-06,
1538
- "loss": 1.0887,
1539
- "step": 1065
1540
- },
1541
- {
1542
- "epoch": 4.745011086474501,
1543
- "grad_norm": 0.24630777537822723,
1544
- "learning_rate": 7.270241141401568e-06,
1545
- "loss": 1.0745,
1546
- "step": 1070
1547
- },
1548
- {
1549
- "epoch": 4.767184035476719,
1550
- "grad_norm": 0.26346486806869507,
1551
- "learning_rate": 6.010998675199553e-06,
1552
- "loss": 1.1047,
1553
- "step": 1075
1554
- },
1555
- {
1556
- "epoch": 4.789356984478935,
1557
- "grad_norm": 0.25308722257614136,
1558
- "learning_rate": 4.870767259633868e-06,
1559
- "loss": 1.0981,
1560
- "step": 1080
1561
- },
1562
- {
1563
- "epoch": 4.811529933481153,
1564
- "grad_norm": 0.2649281322956085,
1565
- "learning_rate": 3.849821597457892e-06,
1566
- "loss": 1.0827,
1567
- "step": 1085
1568
- },
1569
- {
1570
- "epoch": 4.8337028824833705,
1571
- "grad_norm": 0.2485639750957489,
1572
- "learning_rate": 2.948407653289409e-06,
1573
- "loss": 1.0849,
1574
- "step": 1090
1575
- },
1576
- {
1577
- "epoch": 4.855875831485587,
1578
- "grad_norm": 0.25217127799987793,
1579
- "learning_rate": 2.166742594353288e-06,
1580
- "loss": 1.0861,
1581
- "step": 1095
1582
- },
1583
- {
1584
- "epoch": 4.878048780487805,
1585
- "grad_norm": 0.2509874403476715,
1586
- "learning_rate": 1.5050147381619473e-06,
1587
- "loss": 1.1002,
1588
- "step": 1100
1589
- },
1590
- {
1591
- "epoch": 4.900221729490022,
1592
- "grad_norm": 0.2537370026111603,
1593
- "learning_rate": 9.633835071463092e-07,
1594
- "loss": 1.0815,
1595
- "step": 1105
1596
- },
1597
- {
1598
- "epoch": 4.922394678492239,
1599
- "grad_norm": 0.25824105739593506,
1600
- "learning_rate": 5.419793902477488e-07,
1601
- "loss": 1.085,
1602
- "step": 1110
1603
- },
1604
- {
1605
- "epoch": 4.9445676274944566,
1606
- "grad_norm": 0.2534651756286621,
1607
- "learning_rate": 2.4090391148112736e-07,
1608
- "loss": 1.0952,
1609
- "step": 1115
1610
- },
1611
- {
1612
- "epoch": 4.966740576496674,
1613
- "grad_norm": 0.25321847200393677,
1614
- "learning_rate": 6.022960547563683e-08,
1615
- "loss": 1.0899,
1616
- "step": 1120
1617
- },
1618
- {
1619
- "epoch": 4.988913525498892,
1620
- "grad_norm": 0.2509351968765259,
1621
- "learning_rate": 0.0,
1622
- "loss": 1.098,
1623
- "step": 1125
1624
- },
1625
- {
1626
- "epoch": 4.988913525498892,
1627
- "eval_loss": 1.9855170249938965,
1628
- "eval_runtime": 0.3384,
1629
- "eval_samples_per_second": 2.955,
1630
- "eval_steps_per_second": 2.955,
1631
- "step": 1125
1632
- },
1633
- {
1634
- "epoch": 4.988913525498892,
1635
- "step": 1125,
1636
- "total_flos": 1.6629843858229821e+18,
1637
- "train_loss": 1.2764637470245361,
1638
- "train_runtime": 3608.211,
1639
- "train_samples_per_second": 9.984,
1640
- "train_steps_per_second": 0.312
1641
  }
1642
  ],
1643
  "logging_steps": 5,
1644
- "max_steps": 1125,
1645
  "num_input_tokens_seen": 0,
1646
- "num_train_epochs": 5,
1647
  "save_steps": 100,
1648
  "stateful_callbacks": {
1649
  "TrainerControl": {
@@ -1657,7 +365,7 @@
1657
  "attributes": {}
1658
  }
1659
  },
1660
- "total_flos": 1.6629843858229821e+18,
1661
  "train_batch_size": 4,
1662
  "trial_name": null,
1663
  "trial_params": null
 
  {
  "best_metric": null,
  "best_model_checkpoint": null,
+ "epoch": 0.9977827050997783,
  "eval_steps": 500,
+ "global_step": 225,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 0.004434589800443459,
+     "grad_norm": 1.91265869140625,
+     "learning_rate": 4.347826086956522e-05,
      "loss": 2.8127,
      "step": 1
    },
    {
      "epoch": 0.022172949002217297,
+     "grad_norm": 1.5314122438430786,
+     "learning_rate": 0.0002173913043478261,
+     "loss": 2.7241,
      "step": 5
    },
    {
      "epoch": 0.04434589800443459,
+     "grad_norm": 0.6431057453155518,
+     "learning_rate": 0.0004347826086956522,
+     "loss": 2.2423,
      "step": 10
    },
    {
      "epoch": 0.06651884700665188,
+     "grad_norm": 0.5257381200790405,
+     "learning_rate": 0.0006521739130434783,
+     "loss": 1.9505,
      "step": 15
    },
    {
      "epoch": 0.08869179600886919,
+     "grad_norm": 0.37703999876976013,
+     "learning_rate": 0.0008695652173913044,
+     "loss": 1.7881,
      "step": 20
    },
    {
      "epoch": 0.11086474501108648,
+     "grad_norm": 0.30256885290145874,
+     "learning_rate": 0.0009900990099009901,
+     "loss": 1.7031,
      "step": 25
    },
    {
      "epoch": 0.13303769401330376,
+     "grad_norm": 0.3443244993686676,
+     "learning_rate": 0.0009653465346534653,
+     "loss": 1.6352,
      "step": 30
    },
    {
      "epoch": 0.15521064301552107,
+     "grad_norm": 0.369827538728714,
+     "learning_rate": 0.0009405940594059406,
+     "loss": 1.5746,
      "step": 35
    },
    {
      "epoch": 0.17738359201773837,
+     "grad_norm": 0.231527641415596,
+     "learning_rate": 0.0009158415841584159,
+     "loss": 1.5409,
      "step": 40
    },
    {
      "epoch": 0.19955654101995565,
+     "grad_norm": 0.22827404737472534,
+     "learning_rate": 0.0008910891089108911,
+     "loss": 1.5187,
      "step": 45
    },
    {
      "epoch": 0.22172949002217296,
+     "grad_norm": 0.2396710067987442,
+     "learning_rate": 0.0008663366336633663,
+     "loss": 1.5128,
      "step": 50
    },
    {
      "epoch": 0.24390243902439024,
+     "grad_norm": 0.20095600187778473,
+     "learning_rate": 0.0008415841584158416,
+     "loss": 1.4848,
      "step": 55
    },
    {
      "epoch": 0.2660753880266075,
+     "grad_norm": 0.28900983929634094,
+     "learning_rate": 0.0008168316831683168,
+     "loss": 1.4962,
      "step": 60
    },
    {
      "epoch": 0.28824833702882485,
+     "grad_norm": 0.25716254115104675,
+     "learning_rate": 0.0007920792079207921,
+     "loss": 1.4789,
      "step": 65
    },
    {
      "epoch": 0.31042128603104213,
+     "grad_norm": 0.252340167760849,
+     "learning_rate": 0.0007673267326732674,
+     "loss": 1.458,
      "step": 70
    },
    {
      "epoch": 0.3325942350332594,
+     "grad_norm": 0.20464155077934265,
+     "learning_rate": 0.0007425742574257426,
+     "loss": 1.4558,
      "step": 75
    },
    {
      "epoch": 0.35476718403547675,
+     "grad_norm": 0.23394732177257538,
+     "learning_rate": 0.0007178217821782178,
+     "loss": 1.4562,
      "step": 80
    },
    {
      "epoch": 0.376940133037694,
+     "grad_norm": 0.2164139449596405,
+     "learning_rate": 0.000693069306930693,
+     "loss": 1.4338,
      "step": 85
    },
    {
      "epoch": 0.3991130820399113,
+     "grad_norm": 0.215862438082695,
+     "learning_rate": 0.0006683168316831684,
+     "loss": 1.4287,
      "step": 90
    },
    {
      "epoch": 0.4212860310421286,
+     "grad_norm": 0.20270515978336334,
+     "learning_rate": 0.0006435643564356436,
+     "loss": 1.4226,
      "step": 95
    },
    {
      "epoch": 0.4434589800443459,
+     "grad_norm": 0.20255711674690247,
+     "learning_rate": 0.0006188118811881188,
+     "loss": 1.4314,
      "step": 100
    },
    {
      "epoch": 0.4656319290465632,
+     "grad_norm": 0.20747065544128418,
+     "learning_rate": 0.000594059405940594,
+     "loss": 1.4194,
      "step": 105
    },
    {
      "epoch": 0.4878048780487805,
+     "grad_norm": 0.2104884535074234,
+     "learning_rate": 0.0005693069306930693,
+     "loss": 1.4106,
      "step": 110
    },
    {
      "epoch": 0.5099778270509978,
+     "grad_norm": 0.21514882147312164,
+     "learning_rate": 0.0005445544554455446,
+     "loss": 1.42,
      "step": 115
    },
    {
      "epoch": 0.532150776053215,
+     "grad_norm": 0.20466424524784088,
+     "learning_rate": 0.0005198019801980198,
+     "loss": 1.3937,
      "step": 120
    },
    {
      "epoch": 0.5543237250554324,
+     "grad_norm": 0.2181282341480255,
+     "learning_rate": 0.0004950495049504951,
+     "loss": 1.3972,
      "step": 125
    },
    {
      "epoch": 0.5764966740576497,
+     "grad_norm": 0.22615699470043182,
+     "learning_rate": 0.0004702970297029703,
+     "loss": 1.3882,
      "step": 130
    },
    {
      "epoch": 0.5986696230598669,
+     "grad_norm": 0.1967965066432953,
+     "learning_rate": 0.00044554455445544556,
+     "loss": 1.388,
      "step": 135
    },
    {
      "epoch": 0.6208425720620843,
+     "grad_norm": 0.2030034065246582,
+     "learning_rate": 0.0004207920792079208,
+     "loss": 1.4048,
      "step": 140
    },
    {
      "epoch": 0.6430155210643016,
+     "grad_norm": 0.2136310189962387,
+     "learning_rate": 0.00039603960396039607,
+     "loss": 1.3918,
      "step": 145
    },
    {
      "epoch": 0.6651884700665188,
+     "grad_norm": 0.22149060666561127,
+     "learning_rate": 0.0003712871287128713,
+     "loss": 1.4023,
      "step": 150
    },
    {
      "epoch": 0.6873614190687362,
+     "grad_norm": 0.2130667269229889,
+     "learning_rate": 0.0003465346534653465,
+     "loss": 1.3933,
      "step": 155
    },
    {
      "epoch": 0.7095343680709535,
+     "grad_norm": 0.19920696318149567,
+     "learning_rate": 0.0003217821782178218,
+     "loss": 1.3815,
      "step": 160
    },
    {
      "epoch": 0.7317073170731707,
+     "grad_norm": 0.20453611016273499,
+     "learning_rate": 0.000297029702970297,
+     "loss": 1.3648,
      "step": 165
    },
    {
      "epoch": 0.753880266075388,
+     "grad_norm": 0.21325863897800446,
+     "learning_rate": 0.0002722772277227723,
+     "loss": 1.3773,
      "step": 170
    },
    {
      "epoch": 0.7760532150776053,
+     "grad_norm": 0.2014823704957962,
+     "learning_rate": 0.00024752475247524753,
+     "loss": 1.3881,
      "step": 175
    },
    {
      "epoch": 0.7982261640798226,
+     "grad_norm": 0.20359407365322113,
+     "learning_rate": 0.00022277227722772278,
+     "loss": 1.3826,
      "step": 180
    },
    {
      "epoch": 0.8203991130820399,
+     "grad_norm": 0.21738748252391815,
+     "learning_rate": 0.00019801980198019803,
+     "loss": 1.3705,
      "step": 185
    },
    {
      "epoch": 0.8425720620842572,
+     "grad_norm": 0.1990172564983368,
+     "learning_rate": 0.00017326732673267326,
+     "loss": 1.3693,
      "step": 190
    },
    {
      "epoch": 0.8647450110864745,
+     "grad_norm": 0.2007543295621872,
+     "learning_rate": 0.0001485148514851485,
+     "loss": 1.3575,
      "step": 195
    },
    {
      "epoch": 0.8869179600886918,
+     "grad_norm": 0.5149243474006653,
+     "learning_rate": 0.00012376237623762376,
+     "loss": 1.374,
      "step": 200
    },
    {
      "epoch": 0.9090909090909091,
+     "grad_norm": 0.2131042778491974,
+     "learning_rate": 9.900990099009902e-05,
+     "loss": 1.3636,
      "step": 205
    },
    {
      "epoch": 0.9312638580931264,
+     "grad_norm": 0.19097404181957245,
+     "learning_rate": 7.425742574257426e-05,
+     "loss": 1.3489,
      "step": 210
    },
    {
      "epoch": 0.9534368070953437,
+     "grad_norm": 0.19905418157577515,
+     "learning_rate": 4.950495049504951e-05,
+     "loss": 1.3442,
      "step": 215
    },
    {
      "epoch": 0.975609756097561,
+     "grad_norm": 0.19617854058742523,
+     "learning_rate": 2.4752475247524754e-05,
+     "loss": 1.3721,
      "step": 220
    },
    {
      "epoch": 0.9977827050997783,
+     "grad_norm": 0.20064575970172882,
+     "learning_rate": 0.0,
+     "loss": 1.3767,
      "step": 225
    },
    {
      "epoch": 0.9977827050997783,
+     "eval_loss": 1.7732421159744263,
+     "eval_runtime": 0.5415,
+     "eval_samples_per_second": 1.847,
+     "eval_steps_per_second": 1.847,
      "step": 225
    },
    {
+     "epoch": 0.9977827050997783,
+     "step": 225,
+     "total_flos": 3.3259687694984806e+17,
+     "train_loss": 1.4963340536753336,
+     "train_runtime": 725.2803,
+     "train_samples_per_second": 9.934,
+     "train_steps_per_second": 0.31
    }
  ],
  "logging_steps": 5,
+ "max_steps": 225,
  "num_input_tokens_seen": 0,
+ "num_train_epochs": 1,
  "save_steps": 100,
  "stateful_callbacks": {
  "TrainerControl": {

  "attributes": {}
  }
  },
+ "total_flos": 3.3259687694984806e+17,
  "train_batch_size": 4,
  "trial_name": null,
  "trial_params": null