kykim0 committed
Commit d972836 · verified · 1 Parent(s): 2b5fe5a

Model save
README.md ADDED
@@ -0,0 +1,67 @@
+ ---
+ license: apache-2.0
+ base_model: allenai/OLMo-1B-hf
+ tags:
+ - trl
+ - sft
+ - generated_from_trainer
+ datasets:
+ - generator
+ model-index:
+ - name: sft-olmo-1b
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # sft-olmo-1b
+
+ This model is a fine-tuned version of [allenai/OLMo-1B-hf](https://huggingface.co/allenai/OLMo-1B-hf) on the generator dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.8224
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 2e-05
+ - train_batch_size: 8
+ - eval_batch_size: 2
+ - seed: 42
+ - distributed_type: multi-GPU
+ - gradient_accumulation_steps: 16
+ - total_train_batch_size: 128
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.03
+ - num_epochs: 3
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:------:|:----:|:---------------:|
+ | 1.1754 | 0.9992 | 1236 | 1.0556 |
+ | 0.9628 | 1.9993 | 2473 | 0.8751 |
+ | 0.801 | 2.9977 | 3708 | 0.8224 |
+
+
+ ### Framework versions
+
+ - Transformers 4.40.0
+ - Pytorch 2.1.2
+ - Datasets 2.14.6
+ - Tokenizers 0.19.1
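The hyperparameters above describe a cosine learning-rate schedule with a 0.03 warmup ratio over the run's 3708 optimizer steps. A minimal sketch of that schedule (linear warmup to the peak rate, then cosine decay to zero) is below; the function name is hypothetical, and the exact Trainer implementation may differ slightly in how it rounds warmup steps.

```python
import math

def lr_at_step(step, total_steps=3708, peak_lr=2e-05, warmup_ratio=0.03):
    """Sketch of linear warmup followed by cosine decay to zero.

    Mirrors the schedule named in the card (lr_scheduler_type: cosine,
    lr_scheduler_warmup_ratio: 0.03); not the Trainer's exact code.
    """
    warmup_steps = int(total_steps * warmup_ratio)  # ~111 steps here
    if step < warmup_steps:
        # Linear ramp from 0 to peak_lr over the warmup window.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from peak_lr at end of warmup to 0 at total_steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

This matches the logged trajectory qualitatively: the rate peaks near 2e-05 around step 110 and decays toward zero by step 3708.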
all_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+     "epoch": 2.997726239199636,
+     "total_flos": 0.0,
+     "train_loss": 1.0273753281164324,
+     "train_runtime": 58675.1239,
+     "train_samples": 326149,
+     "train_samples_per_second": 8.095,
+     "train_steps_per_second": 0.063
+ }
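The throughput figures above are mutually consistent: steps per second times runtime should roughly recover the run's 3708 optimizer steps (the reported rates are rounded, so the product is only approximate). A quick sanity-check sketch, with the numbers copied from the file:

```python
import json

# Figures copied from all_results.json above.
results = json.loads("""{
  "train_runtime": 58675.1239,
  "train_samples_per_second": 8.095,
  "train_steps_per_second": 0.063
}""")

# 0.063 steps/s over ~58675 s gives roughly 3.7k optimizer steps,
# consistent with global_step 3708 in trainer_state.json.
approx_steps = results["train_runtime"] * results["train_steps_per_second"]
```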
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+     "_from_model_config": true,
+     "eos_token_id": 0,
+     "pad_token_id": 1,
+     "transformers_version": "4.40.0"
+ }
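This generation config pins the stopping and padding token ids (eos 0, pad 1) that `generate` will use by default. A small illustrative sketch of what the eos id means in practice, truncating a sampled sequence at the first end-of-sequence token; the helper name is hypothetical:

```python
import json

# Ids taken from the generation_config.json shown above.
gen_cfg = json.loads(
    '{"_from_model_config": true, "eos_token_id": 0, '
    '"pad_token_id": 1, "transformers_version": "4.40.0"}'
)

def truncate_at_eos(token_ids, eos_id):
    """Keep generated ids up to (and excluding) the first EOS."""
    out = []
    for t in token_ids:
        if t == eos_id:
            break
        out.append(t)
    return out

print(truncate_at_eos([5, 9, 0, 7], gen_cfg["eos_token_id"]))  # → [5, 9]
```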
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+     "epoch": 2.997726239199636,
+     "total_flos": 0.0,
+     "train_loss": 1.0273753281164324,
+     "train_runtime": 58675.1239,
+     "train_samples": 326149,
+     "train_samples_per_second": 8.095,
+     "train_steps_per_second": 0.063
+ }
trainer_state.json ADDED
@@ -0,0 +1,2651 @@
+ {
+   "best_metric": null,
+   "best_model_checkpoint": null,
+   "epoch": 2.997726239199636,
+   "eval_steps": 500,
+   "global_step": 3708,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.0008084482845737962,
+       "grad_norm": 10.217898134986712,
+       "learning_rate": 1.7857142857142858e-07,
+       "loss": 1.7929,
+       "step": 1
+     },
+     {
+       "epoch": 0.008084482845737961,
+       "grad_norm": 8.447331849071613,
+       "learning_rate": 1.7857142857142859e-06,
+       "loss": 1.8062,
+       "step": 10
+     },
+     {
+       "epoch": 0.016168965691475922,
+       "grad_norm": 3.3131831113560053,
+       "learning_rate": 3.5714285714285718e-06,
+       "loss": 1.6026,
+       "step": 20
+     },
+     {
+       "epoch": 0.024253448537213885,
+       "grad_norm": 1.9123288015392683,
+       "learning_rate": 5.357142857142857e-06,
+       "loss": 1.5044,
+       "step": 30
+     },
+     {
+       "epoch": 0.032337931382951844,
+       "grad_norm": 1.5061470360359468,
+       "learning_rate": 7.1428571428571436e-06,
+       "loss": 1.46,
+       "step": 40
+     },
+     {
+       "epoch": 0.04042241422868981,
+       "grad_norm": 1.3730571252951111,
+       "learning_rate": 8.92857142857143e-06,
+       "loss": 1.4242,
+       "step": 50
+     },
+     {
+       "epoch": 0.04850689707442777,
+       "grad_norm": 1.3696793978053872,
+       "learning_rate": 1.0714285714285714e-05,
+       "loss": 1.3962,
+       "step": 60
+     },
+     {
+       "epoch": 0.05659137992016573,
+       "grad_norm": 1.3254100387294059,
+       "learning_rate": 1.25e-05,
+       "loss": 1.4019,
+       "step": 70
+     },
+     {
+       "epoch": 0.06467586276590369,
+       "grad_norm": 1.300617283886077,
+       "learning_rate": 1.4285714285714287e-05,
+       "loss": 1.38,
+       "step": 80
+     },
+     {
+       "epoch": 0.07276034561164166,
+       "grad_norm": 1.2890020177179862,
+       "learning_rate": 1.6071428571428572e-05,
+       "loss": 1.3692,
+       "step": 90
+     },
+     {
+       "epoch": 0.08084482845737961,
+       "grad_norm": 1.3617464980600442,
+       "learning_rate": 1.785714285714286e-05,
+       "loss": 1.3717,
+       "step": 100
+     },
+     {
+       "epoch": 0.08892931130311758,
+       "grad_norm": 1.3364658238171299,
+       "learning_rate": 1.9642857142857145e-05,
+       "loss": 1.3445,
+       "step": 110
+     },
+     {
+       "epoch": 0.09701379414885554,
+       "grad_norm": 1.406654933905743,
+       "learning_rate": 1.999975576461237e-05,
+       "loss": 1.3724,
+       "step": 120
+     },
+     {
+       "epoch": 0.1050982769945935,
+       "grad_norm": 1.4611284363694028,
+       "learning_rate": 1.999876357879684e-05,
+       "loss": 1.3446,
+       "step": 130
+     },
+     {
+       "epoch": 0.11318275984033146,
+       "grad_norm": 1.3711888479566774,
+       "learning_rate": 1.9997008253510416e-05,
+       "loss": 1.3515,
+       "step": 140
+     },
+     {
+       "epoch": 0.12126724268606942,
+       "grad_norm": 1.409811295994504,
+       "learning_rate": 1.9994489922725454e-05,
+       "loss": 1.342,
+       "step": 150
+     },
+     {
+       "epoch": 0.12935172553180738,
+       "grad_norm": 1.3918093353079757,
+       "learning_rate": 1.9991208778649485e-05,
+       "loss": 1.3493,
+       "step": 160
+     },
+     {
+       "epoch": 0.13743620837754536,
+       "grad_norm": 1.3597096387911873,
+       "learning_rate": 1.998716507171053e-05,
+       "loss": 1.3186,
+       "step": 170
+     },
+     {
+       "epoch": 0.14552069122328332,
+       "grad_norm": 1.3532852407233782,
+       "learning_rate": 1.998235911053798e-05,
+       "loss": 1.3426,
+       "step": 180
+     },
+     {
+       "epoch": 0.15360517406902127,
+       "grad_norm": 1.395929355438092,
+       "learning_rate": 1.9976791261939064e-05,
+       "loss": 1.338,
+       "step": 190
+     },
+     {
+       "epoch": 0.16168965691475923,
+       "grad_norm": 1.3346897807354206,
+       "learning_rate": 1.997046195087082e-05,
+       "loss": 1.3209,
+       "step": 200
+     },
+     {
+       "epoch": 0.16977413976049718,
+       "grad_norm": 1.3501622187451188,
+       "learning_rate": 1.996337166040769e-05,
+       "loss": 1.3279,
+       "step": 210
+     },
+     {
+       "epoch": 0.17785862260623517,
+       "grad_norm": 1.262079121683393,
+       "learning_rate": 1.995552093170463e-05,
+       "loss": 1.3135,
+       "step": 220
+     },
+     {
+       "epoch": 0.18594310545197312,
+       "grad_norm": 1.324883144834464,
+       "learning_rate": 1.994691036395583e-05,
+       "loss": 1.306,
+       "step": 230
+     },
+     {
+       "epoch": 0.19402758829771108,
+       "grad_norm": 1.373867783740766,
+       "learning_rate": 1.9937540614348944e-05,
+       "loss": 1.3018,
+       "step": 240
+     },
+     {
+       "epoch": 0.20211207114344903,
+       "grad_norm": 1.4020861161362783,
+       "learning_rate": 1.992741239801498e-05,
+       "loss": 1.3203,
+       "step": 250
+     },
+     {
+       "epoch": 0.210196553989187,
+       "grad_norm": 1.3484650757297245,
+       "learning_rate": 1.9916526487973678e-05,
+       "loss": 1.2939,
+       "step": 260
+     },
+     {
+       "epoch": 0.21828103683492497,
+       "grad_norm": 1.3330965331306333,
+       "learning_rate": 1.9904883715074525e-05,
+       "loss": 1.2795,
+       "step": 270
+     },
+     {
+       "epoch": 0.22636551968066293,
+       "grad_norm": 1.3917589397233552,
+       "learning_rate": 1.989248496793335e-05,
+       "loss": 1.269,
+       "step": 280
+     },
+     {
+       "epoch": 0.23445000252640089,
+       "grad_norm": 1.3905412148367542,
+       "learning_rate": 1.9879331192864492e-05,
+       "loss": 1.286,
+       "step": 290
+     },
+     {
+       "epoch": 0.24253448537213884,
+       "grad_norm": 1.4569325197708967,
+       "learning_rate": 1.9865423393808573e-05,
+       "loss": 1.2944,
+       "step": 300
+     },
+     {
+       "epoch": 0.2506189682178768,
+       "grad_norm": 1.3399495909594208,
+       "learning_rate": 1.985076263225588e-05,
+       "loss": 1.3106,
+       "step": 310
+     },
+     {
+       "epoch": 0.25870345106361475,
+       "grad_norm": 1.478802579336813,
+       "learning_rate": 1.9835350027165342e-05,
+       "loss": 1.2994,
+       "step": 320
+     },
+     {
+       "epoch": 0.26678793390935274,
+       "grad_norm": 1.3105244819439577,
+       "learning_rate": 1.9819186754879137e-05,
+       "loss": 1.2871,
+       "step": 330
+     },
+     {
+       "epoch": 0.2748724167550907,
+       "grad_norm": 1.3667119896120177,
+       "learning_rate": 1.9802274049032898e-05,
+       "loss": 1.2893,
+       "step": 340
+     },
+     {
+       "epoch": 0.28295689960082865,
+       "grad_norm": 1.5054526064910085,
+       "learning_rate": 1.9784613200461568e-05,
+       "loss": 1.2912,
+       "step": 350
+     },
+     {
+       "epoch": 0.29104138244656663,
+       "grad_norm": 1.3163243486049039,
+       "learning_rate": 1.976620555710087e-05,
+       "loss": 1.2761,
+       "step": 360
+     },
+     {
+       "epoch": 0.29912586529230456,
+       "grad_norm": 1.322920539242633,
+       "learning_rate": 1.9747052523884435e-05,
+       "loss": 1.2572,
+       "step": 370
+     },
+     {
+       "epoch": 0.30721034813804254,
+       "grad_norm": 1.3954468357326724,
+       "learning_rate": 1.972715556263657e-05,
+       "loss": 1.2745,
+       "step": 380
+     },
+     {
+       "epoch": 0.3152948309837805,
+       "grad_norm": 1.3451929159695755,
+       "learning_rate": 1.9706516191960687e-05,
+       "loss": 1.2472,
+       "step": 390
+     },
+     {
+       "epoch": 0.32337931382951846,
+       "grad_norm": 1.2765565775996142,
+       "learning_rate": 1.9685135987123396e-05,
+       "loss": 1.255,
+       "step": 400
+     },
+     {
+       "epoch": 0.33146379667525644,
+       "grad_norm": 1.4632541877317655,
+       "learning_rate": 1.966301657993428e-05,
+       "loss": 1.2565,
+       "step": 410
+     },
+     {
+       "epoch": 0.33954827952099437,
+       "grad_norm": 1.3554436136314076,
+       "learning_rate": 1.9640159658621344e-05,
+       "loss": 1.2593,
+       "step": 420
+     },
+     {
+       "epoch": 0.34763276236673235,
+       "grad_norm": 1.3154961346767526,
+       "learning_rate": 1.9616566967702164e-05,
+       "loss": 1.2604,
+       "step": 430
+     },
+     {
+       "epoch": 0.35571724521247033,
+       "grad_norm": 1.3833700211512812,
+       "learning_rate": 1.9592240307850748e-05,
+       "loss": 1.2625,
+       "step": 440
+     },
+     {
+       "epoch": 0.36380172805820826,
+       "grad_norm": 1.2812641775550833,
+       "learning_rate": 1.95671815357601e-05,
+       "loss": 1.2661,
+       "step": 450
+     },
+     {
+       "epoch": 0.37188621090394625,
+       "grad_norm": 1.3509908047727408,
+       "learning_rate": 1.954139256400049e-05,
+       "loss": 1.2448,
+       "step": 460
+     },
+     {
+       "epoch": 0.3799706937496842,
+       "grad_norm": 1.356891388574271,
+       "learning_rate": 1.951487536087352e-05,
+       "loss": 1.2551,
+       "step": 470
+     },
+     {
+       "epoch": 0.38805517659542216,
+       "grad_norm": 1.2921423460134738,
+       "learning_rate": 1.948763195026186e-05,
+       "loss": 1.2503,
+       "step": 480
+     },
+     {
+       "epoch": 0.39613965944116014,
+       "grad_norm": 1.3494188641362486,
+       "learning_rate": 1.9459664411474793e-05,
+       "loss": 1.2509,
+       "step": 490
+     },
+     {
+       "epoch": 0.40422414228689807,
+       "grad_norm": 1.336605272931222,
+       "learning_rate": 1.9430974879089522e-05,
+       "loss": 1.251,
+       "step": 500
+     },
+     {
+       "epoch": 0.41230862513263605,
+       "grad_norm": 1.3167568815144604,
+       "learning_rate": 1.9401565542788238e-05,
+       "loss": 1.2341,
+       "step": 510
+     },
+     {
+       "epoch": 0.420393107978374,
+       "grad_norm": 1.3704112316871029,
+       "learning_rate": 1.9371438647191007e-05,
+       "loss": 1.2483,
+       "step": 520
+     },
+     {
+       "epoch": 0.42847759082411196,
+       "grad_norm": 1.2971253486447214,
+       "learning_rate": 1.9340596491684443e-05,
+       "loss": 1.2483,
+       "step": 530
+     },
+     {
+       "epoch": 0.43656207366984995,
+       "grad_norm": 1.278671915851734,
+       "learning_rate": 1.9309041430246228e-05,
+       "loss": 1.247,
+       "step": 540
+     },
+     {
+       "epoch": 0.4446465565155879,
+       "grad_norm": 1.7143062654632688,
+       "learning_rate": 1.927677587126542e-05,
+       "loss": 1.2582,
+       "step": 550
+     },
+     {
+       "epoch": 0.45273103936132586,
+       "grad_norm": 3.071400396021207,
+       "learning_rate": 1.924380227735867e-05,
+       "loss": 1.2369,
+       "step": 560
+     },
+     {
+       "epoch": 0.46081552220706384,
+       "grad_norm": 1.3043426791795303,
+       "learning_rate": 1.921012316518224e-05,
+       "loss": 1.2564,
+       "step": 570
+     },
+     {
+       "epoch": 0.46890000505280177,
+       "grad_norm": 1.4606599266501914,
+       "learning_rate": 1.917574110523994e-05,
+       "loss": 1.2455,
+       "step": 580
+     },
+     {
+       "epoch": 0.47698448789853976,
+       "grad_norm": 1.4084512281248918,
+       "learning_rate": 1.914065872168692e-05,
+       "loss": 1.237,
+       "step": 590
+     },
+     {
+       "epoch": 0.4850689707442777,
+       "grad_norm": 1.7442614101619536,
+       "learning_rate": 1.910487869212942e-05,
+       "loss": 1.2428,
+       "step": 600
+     },
+     {
+       "epoch": 0.49315345359001567,
+       "grad_norm": 3.217405569169141,
+       "learning_rate": 1.9068403747420365e-05,
+       "loss": 1.2406,
+       "step": 610
+     },
+     {
+       "epoch": 0.5012379364357537,
+       "grad_norm": 2.028319411000034,
+       "learning_rate": 1.9031236671450963e-05,
+       "loss": 1.2295,
+       "step": 620
+     },
+     {
+       "epoch": 0.5093224192814916,
+       "grad_norm": 1.3534016099705948,
+       "learning_rate": 1.899338030093822e-05,
+       "loss": 1.2287,
+       "step": 630
+     },
+     {
+       "epoch": 0.5174069021272295,
+       "grad_norm": 16.560363141349324,
+       "learning_rate": 1.8954837525208432e-05,
+       "loss": 1.2239,
+       "step": 640
+     },
+     {
+       "epoch": 0.5254913849729675,
+       "grad_norm": 1.6507428977016882,
+       "learning_rate": 1.8915611285976672e-05,
+       "loss": 1.2122,
+       "step": 650
+     },
+     {
+       "epoch": 0.5335758678187055,
+       "grad_norm": 1.4599892786481696,
+       "learning_rate": 1.887570457712225e-05,
+       "loss": 1.2448,
+       "step": 660
+     },
+     {
+       "epoch": 0.5416603506644434,
+       "grad_norm": 1.3576328654723246,
+       "learning_rate": 1.883512044446023e-05,
+       "loss": 1.235,
+       "step": 670
+     },
+     {
+       "epoch": 0.5497448335101814,
+       "grad_norm": 2.7275818350144383,
+       "learning_rate": 1.879386198550895e-05,
+       "loss": 1.2302,
+       "step": 680
+     },
+     {
+       "epoch": 0.5578293163559194,
+       "grad_norm": 1.4783688191374078,
+       "learning_rate": 1.8751932349253595e-05,
+       "loss": 1.2183,
+       "step": 690
+     },
+     {
+       "epoch": 0.5659137992016573,
+       "grad_norm": 1.3696848099680126,
+       "learning_rate": 1.8709334735905908e-05,
+       "loss": 1.2202,
+       "step": 700
+     },
+     {
+       "epoch": 0.5739982820473952,
+       "grad_norm": 1.3843064222445587,
+       "learning_rate": 1.866607239665988e-05,
+       "loss": 1.2292,
+       "step": 710
+     },
+     {
+       "epoch": 0.5820827648931333,
+       "grad_norm": 1.3013446345815274,
+       "learning_rate": 1.8622148633443626e-05,
+       "loss": 1.2404,
+       "step": 720
+     },
+     {
+       "epoch": 0.5901672477388712,
+       "grad_norm": 1.3389494076775972,
+       "learning_rate": 1.8577566798667397e-05,
+       "loss": 1.2,
+       "step": 730
+     },
+     {
+       "epoch": 0.5982517305846091,
+       "grad_norm": 1.2803553653933784,
+       "learning_rate": 1.8532330294967678e-05,
+       "loss": 1.2019,
+       "step": 740
+     },
+     {
+       "epoch": 0.6063362134303472,
+       "grad_norm": 1.3940783442430897,
+       "learning_rate": 1.848644257494751e-05,
+       "loss": 1.2111,
+       "step": 750
+     },
+     {
+       "epoch": 0.6144206962760851,
+       "grad_norm": 1.2967372912925752,
+       "learning_rate": 1.8439907140912962e-05,
+       "loss": 1.2044,
+       "step": 760
+     },
+     {
+       "epoch": 0.622505179121823,
+       "grad_norm": 1.307050777866234,
+       "learning_rate": 1.839272754460583e-05,
+       "loss": 1.211,
+       "step": 770
+     },
+     {
+       "epoch": 0.630589661967561,
+       "grad_norm": 1.7851865803650349,
+       "learning_rate": 1.8344907386932552e-05,
+       "loss": 1.2038,
+       "step": 780
+     },
+     {
+       "epoch": 0.638674144813299,
+       "grad_norm": 1.8614266164299924,
+       "learning_rate": 1.8296450317689377e-05,
+       "loss": 1.2054,
+       "step": 790
+     },
+     {
+       "epoch": 0.6467586276590369,
+       "grad_norm": 1.3262638540650757,
+       "learning_rate": 1.824736003528381e-05,
+       "loss": 1.209,
+       "step": 800
+     },
+     {
+       "epoch": 0.654843110504775,
+       "grad_norm": 1.290793353111858,
+       "learning_rate": 1.8197640286452312e-05,
+       "loss": 1.213,
+       "step": 810
+     },
+     {
+       "epoch": 0.6629275933505129,
+       "grad_norm": 1.2558226934999566,
+       "learning_rate": 1.814729486597436e-05,
+       "loss": 1.2266,
+       "step": 820
+     },
+     {
+       "epoch": 0.6710120761962508,
+       "grad_norm": 1.277465841944589,
+       "learning_rate": 1.8096327616382815e-05,
+       "loss": 1.2167,
+       "step": 830
+     },
+     {
+       "epoch": 0.6790965590419887,
+       "grad_norm": 1.298887855615747,
+       "learning_rate": 1.8044742427670627e-05,
+       "loss": 1.2226,
+       "step": 840
+     },
+     {
+       "epoch": 0.6871810418877268,
+       "grad_norm": 5.857168222574854,
+       "learning_rate": 1.7992543236993952e-05,
+       "loss": 1.2027,
+       "step": 850
+     },
+     {
+       "epoch": 0.6952655247334647,
+       "grad_norm": 1.3361306728189393,
+       "learning_rate": 1.7939734028371663e-05,
+       "loss": 1.207,
+       "step": 860
+     },
+     {
+       "epoch": 0.7033500075792026,
+       "grad_norm": 1.3969769044659528,
+       "learning_rate": 1.7886318832381264e-05,
+       "loss": 1.1799,
+       "step": 870
+     },
+     {
+       "epoch": 0.7114344904249407,
+       "grad_norm": 1.4266930108547686,
+       "learning_rate": 1.783230172585126e-05,
+       "loss": 1.2111,
+       "step": 880
+     },
+     {
+       "epoch": 0.7195189732706786,
+       "grad_norm": 1.3440902999919684,
+       "learning_rate": 1.7777686831550008e-05,
+       "loss": 1.1854,
+       "step": 890
+     },
+     {
+       "epoch": 0.7276034561164165,
+       "grad_norm": 1.251718689797153,
+       "learning_rate": 1.7722478317871053e-05,
+       "loss": 1.1803,
+       "step": 900
+     },
+     {
+       "epoch": 0.7356879389621546,
+       "grad_norm": 1.2756808323680056,
+       "learning_rate": 1.7666680398514978e-05,
+       "loss": 1.2148,
+       "step": 910
+     },
+     {
+       "epoch": 0.7437724218078925,
+       "grad_norm": 1.3774590120848857,
+       "learning_rate": 1.76102973321678e-05,
+       "loss": 1.189,
+       "step": 920
+     },
+     {
+       "epoch": 0.7518569046536304,
+       "grad_norm": 1.5207360711907143,
+       "learning_rate": 1.7553333422175933e-05,
+       "loss": 1.1819,
+       "step": 930
+     },
+     {
+       "epoch": 0.7599413874993683,
+       "grad_norm": 1.302009300658742,
+       "learning_rate": 1.7495793016217754e-05,
+       "loss": 1.191,
+       "step": 940
+     },
+     {
+       "epoch": 0.7680258703451064,
+       "grad_norm": 1.3859954985668783,
+       "learning_rate": 1.743768050597175e-05,
+       "loss": 1.1835,
+       "step": 950
+     },
+     {
+       "epoch": 0.7761103531908443,
+       "grad_norm": 1.3435502591474426,
+       "learning_rate": 1.7379000326781348e-05,
+       "loss": 1.2035,
+       "step": 960
+     },
+     {
+       "epoch": 0.7841948360365822,
+       "grad_norm": 1.38981939520544,
+       "learning_rate": 1.7319756957316392e-05,
+       "loss": 1.1887,
+       "step": 970
+     },
+     {
+       "epoch": 0.7922793188823203,
+       "grad_norm": 1.4015519572670776,
+       "learning_rate": 1.725995491923131e-05,
+       "loss": 1.1843,
+       "step": 980
+     },
+     {
+       "epoch": 0.8003638017280582,
+       "grad_norm": 1.4763071143801054,
+       "learning_rate": 1.7199598776820013e-05,
+       "loss": 1.1753,
+       "step": 990
+     },
+     {
+       "epoch": 0.8084482845737961,
+       "grad_norm": 1.3577477544239007,
+       "learning_rate": 1.713869313666753e-05,
+       "loss": 1.1966,
+       "step": 1000
+     },
+     {
+       "epoch": 0.8165327674195342,
+       "grad_norm": 1.3963231420568032,
+       "learning_rate": 1.7077242647298405e-05,
+       "loss": 1.1985,
+       "step": 1010
+     },
+     {
+       "epoch": 0.8246172502652721,
+       "grad_norm": 1.5498623314696613,
+       "learning_rate": 1.7015251998821938e-05,
+       "loss": 1.1785,
+       "step": 1020
+     },
+     {
+       "epoch": 0.83270173311101,
+       "grad_norm": 1.3586468512222978,
+       "learning_rate": 1.6952725922574188e-05,
+       "loss": 1.1648,
+       "step": 1030
+     },
+     {
+       "epoch": 0.840786215956748,
+       "grad_norm": 1.4300342736321576,
+       "learning_rate": 1.688966919075687e-05,
+       "loss": 1.1666,
+       "step": 1040
+     },
+     {
+       "epoch": 0.848870698802486,
+       "grad_norm": 1.5788283624417567,
+       "learning_rate": 1.682608661607313e-05,
+       "loss": 1.1821,
+       "step": 1050
+     },
+     {
+       "epoch": 0.8569551816482239,
+       "grad_norm": 1.359570582214726,
+       "learning_rate": 1.6761983051360232e-05,
+       "loss": 1.1958,
+       "step": 1060
+     },
+     {
+       "epoch": 0.8650396644939619,
+       "grad_norm": 1.3046392847858388,
+       "learning_rate": 1.6697363389219147e-05,
+       "loss": 1.1557,
+       "step": 1070
+     },
+     {
+       "epoch": 0.8731241473396999,
+       "grad_norm": 1.4677129965264875,
+       "learning_rate": 1.6632232561641158e-05,
+       "loss": 1.1593,
+       "step": 1080
+     },
+     {
+       "epoch": 0.8812086301854378,
+       "grad_norm": 1.4859252531152671,
+       "learning_rate": 1.6566595539631417e-05,
+       "loss": 1.1753,
+       "step": 1090
+     },
+     {
+       "epoch": 0.8892931130311758,
+       "grad_norm": 1.3209365154297203,
+       "learning_rate": 1.6500457332829553e-05,
+       "loss": 1.161,
+       "step": 1100
+     },
+     {
+       "epoch": 0.8973775958769138,
+       "grad_norm": 1.3862159117294945,
+       "learning_rate": 1.6433822989127314e-05,
+       "loss": 1.1592,
+       "step": 1110
+     },
+     {
+       "epoch": 0.9054620787226517,
+       "grad_norm": 1.4456179949854164,
+       "learning_rate": 1.636669759428329e-05,
+       "loss": 1.1484,
+       "step": 1120
+     },
+     {
+       "epoch": 0.9135465615683896,
+       "grad_norm": 1.288756152636894,
+       "learning_rate": 1.6299086271534764e-05,
+       "loss": 1.181,
+       "step": 1130
+     },
+     {
+       "epoch": 0.9216310444141277,
+       "grad_norm": 1.2599229391965052,
+       "learning_rate": 1.6230994181206674e-05,
+       "loss": 1.1718,
+       "step": 1140
+     },
+     {
+       "epoch": 0.9297155272598656,
+       "grad_norm": 1.4973902946133841,
+       "learning_rate": 1.6162426520317765e-05,
+       "loss": 1.1773,
+       "step": 1150
+     },
+     {
+       "epoch": 0.9378000101056035,
+       "grad_norm": 1.3698767908727083,
+       "learning_rate": 1.6093388522183948e-05,
+       "loss": 1.1666,
+       "step": 1160
+     },
+     {
+       "epoch": 0.9458844929513415,
+       "grad_norm": 1.386433062647111,
+       "learning_rate": 1.6023885456018852e-05,
+       "loss": 1.1859,
+       "step": 1170
+     },
+     {
+       "epoch": 0.9539689757970795,
+       "grad_norm": 1.284904254015402,
+       "learning_rate": 1.595392262653168e-05,
+       "loss": 1.1906,
+       "step": 1180
+     },
+     {
+       "epoch": 0.9620534586428174,
+       "grad_norm": 1.4402131637475677,
+       "learning_rate": 1.5883505373522317e-05,
+       "loss": 1.1593,
+       "step": 1190
+     },
+     {
+       "epoch": 0.9701379414885554,
+       "grad_norm": 1.6049356540049453,
+       "learning_rate": 1.5812639071473804e-05,
+       "loss": 1.1636,
+       "step": 1200
+     },
+     {
+       "epoch": 0.9782224243342934,
+       "grad_norm": 1.505036374645861,
+       "learning_rate": 1.574132912914211e-05,
+       "loss": 1.14,
+       "step": 1210
+     },
+     {
+       "epoch": 0.9863069071800313,
+       "grad_norm": 1.6280895974825729,
+       "learning_rate": 1.566958098914334e-05,
+       "loss": 1.1358,
+       "step": 1220
+     },
+     {
+       "epoch": 0.9943913900257693,
+       "grad_norm": 1.2574161457807662,
+       "learning_rate": 1.5597400127538324e-05,
+       "loss": 1.1754,
+       "step": 1230
+     },
+     {
+       "epoch": 0.9992420797332121,
+       "eval_loss": 1.0555766820907593,
+       "eval_runtime": 476.758,
+       "eval_samples_per_second": 25.514,
+       "eval_steps_per_second": 12.757,
+       "step": 1236
+     },
+     {
+       "epoch": 1.0024758728715073,
+       "grad_norm": 2.9356360899500897,
+       "learning_rate": 1.5524792053414676e-05,
+       "loss": 1.1182,
+       "step": 1240
+     },
+     {
+       "epoch": 1.0105603557172451,
+       "grad_norm": 1.4115997260524025,
+       "learning_rate": 1.5451762308466302e-05,
+       "loss": 1.0448,
+       "step": 1250
+     },
+     {
+       "epoch": 1.0186448385629832,
+       "grad_norm": 1.4408354404654395,
+       "learning_rate": 1.5378316466570466e-05,
+       "loss": 1.027,
+       "step": 1260
+     },
+     {
+       "epoch": 1.0267293214087212,
+       "grad_norm": 1.40209737150782,
+       "learning_rate": 1.530446013336235e-05,
+       "loss": 1.0253,
+       "step": 1270
+     },
+     {
+       "epoch": 1.034813804254459,
+       "grad_norm": 1.4050923085204698,
+       "learning_rate": 1.5230198945807226e-05,
+       "loss": 1.0596,
+       "step": 1280
+     },
+     {
+       "epoch": 1.042898287100197,
+       "grad_norm": 1.3850604464116953,
+       "learning_rate": 1.515553857177022e-05,
+       "loss": 1.0354,
+       "step": 1290
+     },
+     {
+       "epoch": 1.050982769945935,
+       "grad_norm": 1.6192982769908866,
+       "learning_rate": 1.5080484709583715e-05,
+       "loss": 1.0338,
+       "step": 1300
+     },
+     {
+       "epoch": 1.059067252791673,
+       "grad_norm": 1.5443333411983042,
+       "learning_rate": 1.5005043087612452e-05,
+       "loss": 1.0224,
+       "step": 1310
+     },
+     {
+       "epoch": 1.067151735637411,
+       "grad_norm": 1.4795375887873081,
+       "learning_rate": 1.4929219463816302e-05,
+       "loss": 1.0273,
+       "step": 1320
+     },
+     {
+       "epoch": 1.075236218483149,
+       "grad_norm": 1.3952469643942318,
+       "learning_rate": 1.4853019625310813e-05,
+       "loss": 1.0165,
+       "step": 1330
+     },
+     {
+       "epoch": 1.0833207013288868,
+       "grad_norm": 1.4102438583126526,
+       "learning_rate": 1.4776449387925507e-05,
+       "loss": 1.0323,
+       "step": 1340
+     },
+     {
+       "epoch": 1.0914051841746248,
+       "grad_norm": 1.4166513317270177,
+       "learning_rate": 1.4699514595760006e-05,
+       "loss": 1.0343,
+       "step": 1350
+     },
+     {
+       "epoch": 1.0994896670203629,
+       "grad_norm": 1.4572773218335806,
+       "learning_rate": 1.4622221120737985e-05,
+       "loss": 1.0449,
+       "step": 1360
+     },
+     {
+       "epoch": 1.1075741498661007,
+       "grad_norm": 1.4277575864922984,
+       "learning_rate": 1.4544574862159013e-05,
+       "loss": 1.0157,
+       "step": 1370
+     },
+     {
+       "epoch": 1.1156586327118387,
+       "grad_norm": 1.8246683293221693,
+       "learning_rate": 1.446658174624829e-05,
+       "loss": 1.037,
+       "step": 1380
+     },
+     {
+       "epoch": 1.1237431155575768,
+       "grad_norm": 1.4515508954548648,
+       "learning_rate": 1.4388247725704338e-05,
+       "loss": 1.0163,
+       "step": 1390
+     },
+     {
+       "epoch": 1.1318275984033146,
+       "grad_norm": 1.4472625641065484,
+       "learning_rate": 1.4309578779244678e-05,
+       "loss": 1.0339,
+       "step": 1400
+     },
+     {
+       "epoch": 1.1399120812490526,
+       "grad_norm": 1.441284439472294,
+       "learning_rate": 1.423058091114951e-05,
+       "loss": 1.0153,
+       "step": 1410
+     },
+     {
+       "epoch": 1.1479965640947905,
+       "grad_norm": 1.4505444065925723,
+       "learning_rate": 1.4151260150803445e-05,
+       "loss": 1.0413,
+       "step": 1420
1019
+ },
1020
+ {
1021
+ "epoch": 1.1560810469405285,
1022
+ "grad_norm": 1.5566575848024742,
1023
+ "learning_rate": 1.4071622552235327e-05,
1024
+ "loss": 1.014,
1025
+ "step": 1430
1026
+ },
1027
+ {
1028
+ "epoch": 1.1641655297862665,
1029
+ "grad_norm": 1.476527456836737,
1030
+ "learning_rate": 1.399167419365616e-05,
1031
+ "loss": 1.0374,
1032
+ "step": 1440
1033
+ },
1034
+ {
1035
+ "epoch": 1.1722500126320043,
1036
+ "grad_norm": 1.7587555981022083,
1037
+ "learning_rate": 1.3911421176995206e-05,
1038
+ "loss": 1.0145,
1039
+ "step": 1450
1040
+ },
1041
+ {
1042
+ "epoch": 1.1803344954777424,
1043
+ "grad_norm": 1.5447530212974045,
1044
+ "learning_rate": 1.3830869627434267e-05,
1045
+ "loss": 1.0104,
1046
+ "step": 1460
1047
+ },
1048
+ {
1049
+ "epoch": 1.1884189783234804,
1050
+ "grad_norm": 1.368002967716879,
1051
+ "learning_rate": 1.3750025692940174e-05,
1052
+ "loss": 1.0102,
1053
+ "step": 1470
1054
+ },
1055
+ {
1056
+ "epoch": 1.1965034611692182,
1057
+ "grad_norm": 1.5132346329088506,
1058
+ "learning_rate": 1.3668895543795581e-05,
1059
+ "loss": 1.0241,
1060
+ "step": 1480
1061
+ },
1062
+ {
1063
+ "epoch": 1.2045879440149563,
1064
+ "grad_norm": 1.4535090384504317,
1065
+ "learning_rate": 1.3587485372128e-05,
1066
+ "loss": 1.01,
1067
+ "step": 1490
1068
+ },
1069
+ {
1070
+ "epoch": 1.2126724268606943,
1071
+ "grad_norm": 1.6349536867702466,
1072
+ "learning_rate": 1.3505801391437215e-05,
1073
+ "loss": 1.0538,
1074
+ "step": 1500
1075
+ },
1076
+ {
1077
+ "epoch": 1.2207569097064321,
1078
+ "grad_norm": 1.608679365926187,
1079
+ "learning_rate": 1.3423849836121043e-05,
1080
+ "loss": 1.0256,
1081
+ "step": 1510
1082
+ },
1083
+ {
1084
+ "epoch": 1.2288413925521702,
1085
+ "grad_norm": 1.4875509565909706,
1086
+ "learning_rate": 1.33416369609995e-05,
1087
+ "loss": 1.0365,
1088
+ "step": 1520
1089
+ },
1090
+ {
1091
+ "epoch": 1.2369258753979082,
1092
+ "grad_norm": 1.4161399144655036,
1093
+ "learning_rate": 1.325916904083741e-05,
1094
+ "loss": 1.0285,
1095
+ "step": 1530
1096
+ },
1097
+ {
1098
+ "epoch": 1.245010358243646,
1099
+ "grad_norm": 1.516547180031239,
1100
+ "learning_rate": 1.3176452369865504e-05,
1101
+ "loss": 0.9972,
1102
+ "step": 1540
1103
+ },
1104
+ {
1105
+ "epoch": 1.253094841089384,
1106
+ "grad_norm": 1.4500310981963098,
1107
+ "learning_rate": 1.3093493261300012e-05,
1108
+ "loss": 1.0122,
1109
+ "step": 1550
1110
+ },
1111
+ {
1112
+ "epoch": 1.261179323935122,
1113
+ "grad_norm": 1.3787551364346502,
1114
+ "learning_rate": 1.3010298046860821e-05,
1115
+ "loss": 1.0221,
1116
+ "step": 1560
1117
+ },
1118
+ {
1119
+ "epoch": 1.26926380678086,
1120
+ "grad_norm": 1.3579456863416077,
1121
+ "learning_rate": 1.2926873076288222e-05,
1122
+ "loss": 1.0213,
1123
+ "step": 1570
1124
+ },
1125
+ {
1126
+ "epoch": 1.277348289626598,
1127
+ "grad_norm": 1.4774509503134268,
1128
+ "learning_rate": 1.2843224716858271e-05,
1129
+ "loss": 1.012,
1130
+ "step": 1580
1131
+ },
1132
+ {
1133
+ "epoch": 1.285432772472336,
1134
+ "grad_norm": 1.4805342986177266,
1135
+ "learning_rate": 1.2759359352896809e-05,
1136
+ "loss": 1.0193,
1137
+ "step": 1590
1138
+ },
1139
+ {
1140
+ "epoch": 1.2935172553180738,
1141
+ "grad_norm": 1.4527468028008124,
1142
+ "learning_rate": 1.2675283385292212e-05,
1143
+ "loss": 1.0431,
1144
+ "step": 1600
1145
+ },
1146
+ {
1147
+ "epoch": 1.3016017381638119,
1148
+ "grad_norm": 1.5688075844044822,
1149
+ "learning_rate": 1.259100323100682e-05,
1150
+ "loss": 1.0226,
1151
+ "step": 1610
1152
+ },
1153
+ {
1154
+ "epoch": 1.30968622100955,
1155
+ "grad_norm": 1.493324687221304,
1156
+ "learning_rate": 1.2506525322587207e-05,
1157
+ "loss": 0.9966,
1158
+ "step": 1620
1159
+ },
1160
+ {
1161
+ "epoch": 1.3177707038552877,
1162
+ "grad_norm": 1.563824009098089,
1163
+ "learning_rate": 1.2421856107673205e-05,
1164
+ "loss": 1.0317,
1165
+ "step": 1630
1166
+ },
1167
+ {
1168
+ "epoch": 1.3258551867010258,
1169
+ "grad_norm": 1.4698666764020467,
1170
+ "learning_rate": 1.233700204850581e-05,
1171
+ "loss": 1.0013,
1172
+ "step": 1640
1173
+ },
1174
+ {
1175
+ "epoch": 1.3339396695467638,
1176
+ "grad_norm": 1.625463847709757,
1177
+ "learning_rate": 1.2251969621433947e-05,
1178
+ "loss": 1.0233,
1179
+ "step": 1650
1180
+ },
1181
+ {
1182
+ "epoch": 1.3420241523925016,
1183
+ "grad_norm": 1.560576858468798,
1184
+ "learning_rate": 1.2166765316420195e-05,
1185
+ "loss": 1.0137,
1186
+ "step": 1660
1187
+ },
1188
+ {
1189
+ "epoch": 1.3501086352382397,
1190
+ "grad_norm": 1.6305115869655395,
1191
+ "learning_rate": 1.2081395636545432e-05,
1192
+ "loss": 1.0074,
1193
+ "step": 1670
1194
+ },
1195
+ {
1196
+ "epoch": 1.3581931180839777,
1197
+ "grad_norm": 1.683367869903662,
1198
+ "learning_rate": 1.1995867097512504e-05,
1199
+ "loss": 1.0202,
1200
+ "step": 1680
1201
+ },
1202
+ {
1203
+ "epoch": 1.3662776009297155,
1204
+ "grad_norm": 1.342629975477622,
1205
+ "learning_rate": 1.191018622714893e-05,
1206
+ "loss": 1.0039,
1207
+ "step": 1690
1208
+ },
1209
+ {
1210
+ "epoch": 1.3743620837754535,
1211
+ "grad_norm": 1.4162506108365653,
1212
+ "learning_rate": 1.1824359564908667e-05,
1213
+ "loss": 1.0303,
1214
+ "step": 1700
1215
+ },
1216
+ {
1217
+ "epoch": 1.3824465666211916,
1218
+ "grad_norm": 1.4322509952288762,
1219
+ "learning_rate": 1.1738393661373004e-05,
1220
+ "loss": 1.0223,
1221
+ "step": 1710
1222
+ },
1223
+ {
1224
+ "epoch": 1.3905310494669294,
1225
+ "grad_norm": 1.4429525488762647,
1226
+ "learning_rate": 1.1652295077750599e-05,
1227
+ "loss": 1.0079,
1228
+ "step": 1720
1229
+ },
1230
+ {
1231
+ "epoch": 1.3986155323126674,
1232
+ "grad_norm": 1.5044521870868257,
1233
+ "learning_rate": 1.1566070385376705e-05,
1234
+ "loss": 0.9903,
1235
+ "step": 1730
1236
+ },
1237
+ {
1238
+ "epoch": 1.4067000151584053,
1239
+ "grad_norm": 1.4591518605463256,
1240
+ "learning_rate": 1.1479726165211609e-05,
1241
+ "loss": 1.0133,
1242
+ "step": 1740
1243
+ },
1244
+ {
1245
+ "epoch": 1.4147844980041433,
1246
+ "grad_norm": 1.38699009818023,
1247
+ "learning_rate": 1.1393269007338375e-05,
1248
+ "loss": 1.0191,
1249
+ "step": 1750
1250
+ },
1251
+ {
1252
+ "epoch": 1.4228689808498813,
1253
+ "grad_norm": 1.4248174199771946,
1254
+ "learning_rate": 1.1306705510459852e-05,
1255
+ "loss": 1.0048,
1256
+ "step": 1760
1257
+ },
1258
+ {
1259
+ "epoch": 1.4309534636956192,
1260
+ "grad_norm": 1.5368128288739022,
1261
+ "learning_rate": 1.1220042281395042e-05,
1262
+ "loss": 1.0169,
1263
+ "step": 1770
1264
+ },
1265
+ {
1266
+ "epoch": 1.4390379465413572,
1267
+ "grad_norm": 1.620365193180215,
1268
+ "learning_rate": 1.1133285934574849e-05,
1269
+ "loss": 0.9982,
1270
+ "step": 1780
1271
+ },
1272
+ {
1273
+ "epoch": 1.447122429387095,
1274
+ "grad_norm": 1.4821421519804139,
1275
+ "learning_rate": 1.1046443091537232e-05,
1276
+ "loss": 1.0241,
1277
+ "step": 1790
1278
+ },
1279
+ {
1280
+ "epoch": 1.455206912232833,
1281
+ "grad_norm": 1.5012997646705204,
1282
+ "learning_rate": 1.0959520380421831e-05,
1283
+ "loss": 1.0116,
1284
+ "step": 1800
1285
+ },
1286
+ {
1287
+ "epoch": 1.463291395078571,
1288
+ "grad_norm": 1.4878335919543981,
1289
+ "learning_rate": 1.0872524435464104e-05,
1290
+ "loss": 0.9993,
1291
+ "step": 1810
1292
+ },
1293
+ {
1294
+ "epoch": 1.471375877924309,
1295
+ "grad_norm": 1.3918759318142178,
1296
+ "learning_rate": 1.0785461896488947e-05,
1297
+ "loss": 1.0103,
1298
+ "step": 1820
1299
+ },
1300
+ {
1301
+ "epoch": 1.479460360770047,
1302
+ "grad_norm": 1.7724767013914755,
1303
+ "learning_rate": 1.0698339408403944e-05,
1304
+ "loss": 0.9862,
1305
+ "step": 1830
1306
+ },
1307
+ {
1308
+ "epoch": 1.487544843615785,
1309
+ "grad_norm": 2.0093844914876717,
1310
+ "learning_rate": 1.06111636206922e-05,
1311
+ "loss": 1.0039,
1312
+ "step": 1840
1313
+ },
1314
+ {
1315
+ "epoch": 1.4956293264615228,
1316
+ "grad_norm": 1.4440349729006745,
1317
+ "learning_rate": 1.0523941186904823e-05,
1318
+ "loss": 1.0091,
1319
+ "step": 1850
1320
+ },
1321
+ {
1322
+ "epoch": 1.5037138093072608,
1323
+ "grad_norm": 1.5530469064140777,
1324
+ "learning_rate": 1.043667876415311e-05,
1325
+ "loss": 0.9959,
1326
+ "step": 1860
1327
+ },
1328
+ {
1329
+ "epoch": 1.5117982921529989,
1330
+ "grad_norm": 1.9710010624543786,
1331
+ "learning_rate": 1.0349383012600448e-05,
1332
+ "loss": 0.9902,
1333
+ "step": 1870
1334
+ },
1335
+ {
1336
+ "epoch": 1.5198827749987367,
1337
+ "grad_norm": 1.4874119470603941,
1338
+ "learning_rate": 1.0262060594954e-05,
1339
+ "loss": 0.9889,
1340
+ "step": 1880
1341
+ },
1342
+ {
1343
+ "epoch": 1.5279672578444747,
1344
+ "grad_norm": 1.5760932908781828,
1345
+ "learning_rate": 1.0174718175956164e-05,
1346
+ "loss": 0.997,
1347
+ "step": 1890
1348
+ },
1349
+ {
1350
+ "epoch": 1.5360517406902128,
1351
+ "grad_norm": 1.5140336706570001,
1352
+ "learning_rate": 1.0087362421875912e-05,
1353
+ "loss": 1.0162,
1354
+ "step": 1900
1355
+ },
1356
+ {
1357
+ "epoch": 1.5441362235359506,
1358
+ "grad_norm": 1.4275012742483075,
1359
+ "learning_rate": 1e-05,
1360
+ "loss": 1.0056,
1361
+ "step": 1910
1362
+ },
1363
+ {
1364
+ "epoch": 1.5522207063816886,
1365
+ "grad_norm": 1.4479646715349155,
1366
+ "learning_rate": 9.912637578124092e-06,
1367
+ "loss": 0.9831,
1368
+ "step": 1920
1369
+ },
1370
+ {
1371
+ "epoch": 1.5603051892274267,
1372
+ "grad_norm": 1.6529106306573094,
1373
+ "learning_rate": 9.825281824043838e-06,
1374
+ "loss": 1.0009,
1375
+ "step": 1930
1376
+ },
1377
+ {
1378
+ "epoch": 1.5683896720731645,
1379
+ "grad_norm": 1.4537655155385498,
1380
+ "learning_rate": 9.737939405046002e-06,
1381
+ "loss": 1.0058,
1382
+ "step": 1940
1383
+ },
1384
+ {
1385
+ "epoch": 1.5764741549189025,
1386
+ "grad_norm": 1.3881828231981752,
1387
+ "learning_rate": 9.650616987399553e-06,
1388
+ "loss": 0.9752,
1389
+ "step": 1950
1390
+ },
1391
+ {
1392
+ "epoch": 1.5845586377646406,
1393
+ "grad_norm": 1.4410127433172688,
1394
+ "learning_rate": 9.563321235846894e-06,
1395
+ "loss": 1.0026,
1396
+ "step": 1960
1397
+ },
1398
+ {
1399
+ "epoch": 1.5926431206103784,
1400
+ "grad_norm": 1.6585729752037028,
1401
+ "learning_rate": 9.476058813095182e-06,
1402
+ "loss": 0.9942,
1403
+ "step": 1970
1404
+ },
1405
+ {
1406
+ "epoch": 1.6007276034561164,
1407
+ "grad_norm": 1.6572316797520206,
1408
+ "learning_rate": 9.388836379307802e-06,
1409
+ "loss": 0.9968,
1410
+ "step": 1980
1411
+ },
1412
+ {
1413
+ "epoch": 1.6088120863018545,
1414
+ "grad_norm": 1.451151024162774,
1415
+ "learning_rate": 9.301660591596059e-06,
1416
+ "loss": 0.9921,
1417
+ "step": 1990
1418
+ },
1419
+ {
1420
+ "epoch": 1.6168965691475923,
1421
+ "grad_norm": 1.5042478185497792,
1422
+ "learning_rate": 9.214538103511053e-06,
1423
+ "loss": 0.9959,
1424
+ "step": 2000
1425
+ },
1426
+ {
1427
+ "epoch": 1.6249810519933303,
1428
+ "grad_norm": 1.4096442655309245,
1429
+ "learning_rate": 9.127475564535898e-06,
1430
+ "loss": 0.9944,
1431
+ "step": 2010
1432
+ },
1433
+ {
1434
+ "epoch": 1.6330655348390684,
1435
+ "grad_norm": 1.3701103693221475,
1436
+ "learning_rate": 9.04047961957817e-06,
1437
+ "loss": 0.9806,
1438
+ "step": 2020
1439
+ },
1440
+ {
1441
+ "epoch": 1.6411500176848062,
1442
+ "grad_norm": 1.6771886101217564,
1443
+ "learning_rate": 8.953556908462773e-06,
1444
+ "loss": 0.9986,
1445
+ "step": 2030
1446
+ },
1447
+ {
1448
+ "epoch": 1.6492345005305442,
1449
+ "grad_norm": 1.4606744478213272,
1450
+ "learning_rate": 8.866714065425154e-06,
1451
+ "loss": 0.9894,
1452
+ "step": 2040
1453
+ },
1454
+ {
1455
+ "epoch": 1.6573189833762823,
1456
+ "grad_norm": 1.5696191298486186,
1457
+ "learning_rate": 8.779957718604956e-06,
1458
+ "loss": 1.0055,
1459
+ "step": 2050
1460
+ },
1461
+ {
1462
+ "epoch": 1.66540346622202,
1463
+ "grad_norm": 1.4621439613400917,
1464
+ "learning_rate": 8.693294489540151e-06,
1465
+ "loss": 1.0055,
1466
+ "step": 2060
1467
+ },
1468
+ {
1469
+ "epoch": 1.673487949067758,
1470
+ "grad_norm": 1.4224764910826249,
1471
+ "learning_rate": 8.60673099266163e-06,
1472
+ "loss": 0.9687,
1473
+ "step": 2070
1474
+ },
1475
+ {
1476
+ "epoch": 1.6815724319134961,
1477
+ "grad_norm": 1.6938323822086323,
1478
+ "learning_rate": 8.520273834788395e-06,
1479
+ "loss": 0.978,
1480
+ "step": 2080
1481
+ },
1482
+ {
1483
+ "epoch": 1.689656914759234,
1484
+ "grad_norm": 1.5856717495753165,
1485
+ "learning_rate": 8.4339296146233e-06,
1486
+ "loss": 0.992,
1487
+ "step": 2090
1488
+ },
1489
+ {
1490
+ "epoch": 1.697741397604972,
1491
+ "grad_norm": 1.4737528022353619,
1492
+ "learning_rate": 8.3477049222494e-06,
1493
+ "loss": 0.9882,
1494
+ "step": 2100
1495
+ },
1496
+ {
1497
+ "epoch": 1.70582588045071,
1498
+ "grad_norm": 1.4413576604331515,
1499
+ "learning_rate": 8.261606338626998e-06,
1500
+ "loss": 0.9717,
1501
+ "step": 2110
1502
+ },
1503
+ {
1504
+ "epoch": 1.7139103632964479,
1505
+ "grad_norm": 1.4533604100239785,
1506
+ "learning_rate": 8.17564043509134e-06,
1507
+ "loss": 0.9878,
1508
+ "step": 2120
1509
+ },
1510
+ {
1511
+ "epoch": 1.7219948461421857,
1512
+ "grad_norm": 1.4996211527080612,
1513
+ "learning_rate": 8.089813772851073e-06,
1514
+ "loss": 0.9932,
1515
+ "step": 2130
1516
+ },
1517
+ {
1518
+ "epoch": 1.730079328987924,
1519
+ "grad_norm": 1.4183735479797297,
1520
+ "learning_rate": 8.004132902487499e-06,
1521
+ "loss": 1.0021,
1522
+ "step": 2140
1523
+ },
1524
+ {
1525
+ "epoch": 1.7381638118336618,
1526
+ "grad_norm": 1.4020103234354604,
1527
+ "learning_rate": 7.91860436345457e-06,
1528
+ "loss": 0.9717,
1529
+ "step": 2150
1530
+ },
1531
+ {
1532
+ "epoch": 1.7462482946793996,
1533
+ "grad_norm": 1.4529101522297827,
1534
+ "learning_rate": 7.833234683579806e-06,
1535
+ "loss": 0.9844,
1536
+ "step": 2160
1537
+ },
1538
+ {
1539
+ "epoch": 1.7543327775251378,
1540
+ "grad_norm": 1.4502465958251158,
1541
+ "learning_rate": 7.748030378566056e-06,
1542
+ "loss": 0.9782,
1543
+ "step": 2170
1544
+ },
1545
+ {
1546
+ "epoch": 1.7624172603708756,
1547
+ "grad_norm": 1.4461707858445054,
1548
+ "learning_rate": 7.662997951494193e-06,
1549
+ "loss": 0.9836,
1550
+ "step": 2180
1551
+ },
1552
+ {
1553
+ "epoch": 1.7705017432166135,
1554
+ "grad_norm": 1.3966480403360386,
1555
+ "learning_rate": 7.578143892326797e-06,
1556
+ "loss": 1.0089,
1557
+ "step": 2190
1558
+ },
1559
+ {
1560
+ "epoch": 1.7785862260623517,
1561
+ "grad_norm": 1.5838575969719086,
1562
+ "learning_rate": 7.493474677412795e-06,
1563
+ "loss": 1.0017,
1564
+ "step": 2200
1565
+ },
1566
+ {
1567
+ "epoch": 1.7866707089080895,
1568
+ "grad_norm": 1.6412461821364432,
1569
+ "learning_rate": 7.408996768993184e-06,
1570
+ "loss": 0.9889,
1571
+ "step": 2210
1572
+ },
1573
+ {
1574
+ "epoch": 1.7947551917538274,
1575
+ "grad_norm": 1.8686882471940454,
1576
+ "learning_rate": 7.324716614707794e-06,
1577
+ "loss": 0.9814,
1578
+ "step": 2220
1579
+ },
1580
+ {
1581
+ "epoch": 1.8028396745995656,
1582
+ "grad_norm": 1.4444454657231485,
1583
+ "learning_rate": 7.240640647103192e-06,
1584
+ "loss": 0.9934,
1585
+ "step": 2230
1586
+ },
1587
+ {
1588
+ "epoch": 1.8109241574453034,
1589
+ "grad_norm": 1.5880994051473134,
1590
+ "learning_rate": 7.156775283141733e-06,
1591
+ "loss": 0.9972,
1592
+ "step": 2240
1593
+ },
1594
+ {
1595
+ "epoch": 1.8190086402910413,
1596
+ "grad_norm": 1.6179768250952558,
1597
+ "learning_rate": 7.0731269237117775e-06,
1598
+ "loss": 0.9805,
1599
+ "step": 2250
1600
+ },
1601
+ {
1602
+ "epoch": 1.8270931231367793,
1603
+ "grad_norm": 1.4161571668846493,
1604
+ "learning_rate": 6.989701953139181e-06,
1605
+ "loss": 0.9695,
1606
+ "step": 2260
1607
+ },
1608
+ {
1609
+ "epoch": 1.8351776059825173,
1610
+ "grad_norm": 1.8752619329260358,
1611
+ "learning_rate": 6.906506738699994e-06,
1612
+ "loss": 0.9899,
1613
+ "step": 2270
1614
+ },
1615
+ {
1616
+ "epoch": 1.8432620888282552,
1617
+ "grad_norm": 1.8476640791436918,
1618
+ "learning_rate": 6.823547630134497e-06,
1619
+ "loss": 0.9799,
1620
+ "step": 2280
1621
+ },
1622
+ {
1623
+ "epoch": 1.8513465716739932,
1624
+ "grad_norm": 1.5003229948984453,
1625
+ "learning_rate": 6.740830959162592e-06,
1626
+ "loss": 0.9948,
1627
+ "step": 2290
1628
+ },
1629
+ {
1630
+ "epoch": 1.8594310545197312,
1631
+ "grad_norm": 1.4363919724793655,
1632
+ "learning_rate": 6.658363039000501e-06,
1633
+ "loss": 0.9625,
1634
+ "step": 2300
1635
+ },
1636
+ {
1637
+ "epoch": 1.867515537365469,
1638
+ "grad_norm": 1.45857815520064,
1639
+ "learning_rate": 6.57615016387896e-06,
1640
+ "loss": 0.976,
1641
+ "step": 2310
1642
+ },
1643
+ {
1644
+ "epoch": 1.875600020211207,
1645
+ "grad_norm": 1.3637017381911254,
1646
+ "learning_rate": 6.4941986085627895e-06,
1647
+ "loss": 0.9608,
1648
+ "step": 2320
1649
+ },
1650
+ {
1651
+ "epoch": 1.8836845030569451,
1652
+ "grad_norm": 1.586134857640991,
1653
+ "learning_rate": 6.412514627872003e-06,
1654
+ "loss": 0.9702,
1655
+ "step": 2330
1656
+ },
1657
+ {
1658
+ "epoch": 1.891768985902683,
1659
+ "grad_norm": 1.6293874205755696,
1660
+ "learning_rate": 6.331104456204423e-06,
1661
+ "loss": 0.9672,
1662
+ "step": 2340
1663
+ },
1664
+ {
1665
+ "epoch": 1.899853468748421,
1666
+ "grad_norm": 1.6185456719315228,
1667
+ "learning_rate": 6.249974307059826e-06,
1668
+ "loss": 0.9683,
1669
+ "step": 2350
1670
+ },
1671
+ {
1672
+ "epoch": 1.907937951594159,
1673
+ "grad_norm": 1.5897776438113254,
1674
+ "learning_rate": 6.169130372565737e-06,
1675
+ "loss": 0.9942,
1676
+ "step": 2360
1677
+ },
1678
+ {
1679
+ "epoch": 1.9160224344398968,
1680
+ "grad_norm": 1.4621464766459995,
1681
+ "learning_rate": 6.088578823004796e-06,
1682
+ "loss": 0.9552,
1683
+ "step": 2370
1684
+ },
1685
+ {
1686
+ "epoch": 1.9241069172856349,
1687
+ "grad_norm": 1.57419066036152,
1688
+ "learning_rate": 6.008325806343842e-06,
1689
+ "loss": 0.9635,
1690
+ "step": 2380
1691
+ },
1692
+ {
1693
+ "epoch": 1.932191400131373,
1694
+ "grad_norm": 1.4154240767952921,
1695
+ "learning_rate": 5.9283774477646775e-06,
1696
+ "loss": 0.9661,
1697
+ "step": 2390
1698
+ },
1699
+ {
1700
+ "epoch": 1.9402758829771107,
1701
+ "grad_norm": 1.4089774352311322,
1702
+ "learning_rate": 5.848739849196556e-06,
1703
+ "loss": 0.9623,
1704
+ "step": 2400
1705
+ },
1706
+ {
1707
+ "epoch": 1.9483603658228488,
1708
+ "grad_norm": 1.4330997113061938,
1709
+ "learning_rate": 5.7694190888504964e-06,
1710
+ "loss": 0.982,
1711
+ "step": 2410
1712
+ },
1713
+ {
1714
+ "epoch": 1.9564448486685868,
1715
+ "grad_norm": 1.762833270995275,
1716
+ "learning_rate": 5.690421220755329e-06,
1717
+ "loss": 0.968,
1718
+ "step": 2420
1719
+ },
1720
+ {
1721
+ "epoch": 1.9645293315143246,
1722
+ "grad_norm": 1.57370551896378,
1723
+ "learning_rate": 5.611752274295665e-06,
1724
+ "loss": 0.9639,
1725
+ "step": 2430
1726
+ },
1727
+ {
1728
+ "epoch": 1.9726138143600627,
1729
+ "grad_norm": 1.4682932578058885,
1730
+ "learning_rate": 5.533418253751714e-06,
1731
+ "loss": 0.9786,
1732
+ "step": 2440
1733
+ },
1734
+ {
1735
+ "epoch": 1.9806982972058007,
1736
+ "grad_norm": 1.7633821953728437,
1737
+ "learning_rate": 5.455425137840987e-06,
1738
+ "loss": 0.9618,
1739
+ "step": 2450
1740
+ },
1741
+ {
1742
+ "epoch": 1.9887827800515385,
1743
+ "grad_norm": 1.5018261369656176,
1744
+ "learning_rate": 5.377778879262017e-06,
1745
+ "loss": 0.9454,
1746
+ "step": 2460
1747
+ },
1748
+ {
1749
+ "epoch": 1.9968672628972766,
1750
+ "grad_norm": 1.5404280086355402,
1751
+ "learning_rate": 5.300485404239999e-06,
1752
+ "loss": 0.9628,
1753
+ "step": 2470
1754
+ },
1755
+ {
1756
+ "epoch": 1.999292607750998,
1757
+ "eval_loss": 0.8751075863838196,
1758
+ "eval_runtime": 481.67,
1759
+ "eval_samples_per_second": 25.254,
1760
+ "eval_steps_per_second": 12.627,
1761
+ "step": 2473
1762
+ },
1763
+ {
1764
+ "epoch": 2.0049517457430146,
1765
+ "grad_norm": 1.8577507088673693,
1766
+ "learning_rate": 5.223550612074497e-06,
1767
+ "loss": 0.8752,
1768
+ "step": 2480
1769
+ },
1770
+ {
1771
+ "epoch": 2.0130362285887524,
1772
+ "grad_norm": 1.5570324756102374,
1773
+ "learning_rate": 5.146980374689192e-06,
1774
+ "loss": 0.8398,
1775
+ "step": 2490
1776
+ },
1777
+ {
1778
+ "epoch": 2.0211207114344902,
1779
+ "grad_norm": 1.645225536576169,
1780
+ "learning_rate": 5.070780536183698e-06,
1781
+ "loss": 0.856,
1782
+ "step": 2500
1783
+ },
1784
+ {
1785
+ "epoch": 2.0292051942802285,
1786
+ "grad_norm": 1.6698633554870226,
1787
+ "learning_rate": 4.99495691238755e-06,
1788
+ "loss": 0.8365,
1789
+ "step": 2510
1790
+ },
1791
+ {
1792
+ "epoch": 2.0372896771259663,
1793
+ "grad_norm": 2.010967933907663,
1794
+ "learning_rate": 4.9195152904162865e-06,
1795
+ "loss": 0.8308,
1796
+ "step": 2520
1797
+ },
1798
+ {
1799
+ "epoch": 2.045374159971704,
1800
+ "grad_norm": 1.4592026658551123,
1801
+ "learning_rate": 4.844461428229782e-06,
1802
+ "loss": 0.8387,
1803
+ "step": 2530
1804
+ },
1805
+ {
1806
+ "epoch": 2.0534586428174424,
1807
+ "grad_norm": 1.9716723547932462,
1808
+ "learning_rate": 4.769801054192776e-06,
1809
+ "loss": 0.8374,
1810
+ "step": 2540
1811
+ },
1812
+ {
1813
+ "epoch": 2.06154312566318,
1814
+ "grad_norm": 1.6334367414667887,
1815
+ "learning_rate": 4.695539866637653e-06,
1816
+ "loss": 0.8587,
1817
+ "step": 2550
1818
+ },
1819
+ {
1820
+ "epoch": 2.069627608508918,
1821
+ "grad_norm": 1.713926689166813,
1822
+ "learning_rate": 4.6216835334295385e-06,
1823
+ "loss": 0.8376,
1824
+ "step": 2560
1825
+ },
1826
+ {
1827
+ "epoch": 2.0777120913546563,
1828
+ "grad_norm": 1.5714175555320091,
1829
+ "learning_rate": 4.548237691533699e-06,
1830
+ "loss": 0.8346,
1831
+ "step": 2570
1832
+ },
1833
+ {
1834
+ "epoch": 2.085796574200394,
1835
+ "grad_norm": 1.4811489223457255,
1836
+ "learning_rate": 4.475207946585328e-06,
1837
+ "loss": 0.8473,
1838
+ "step": 2580
1839
+ },
1840
+ {
1841
+ "epoch": 2.093881057046132,
1842
+ "grad_norm": 1.4400201402098334,
1843
+ "learning_rate": 4.402599872461678e-06,
1844
+ "loss": 0.8309,
1845
+ "step": 2590
1846
+ },
1847
+ {
1848
+ "epoch": 2.10196553989187,
1849
+ "grad_norm": 1.5527150219002093,
1850
+ "learning_rate": 4.330419010856661e-06,
1851
+ "loss": 0.8312,
1852
+ "step": 2600
1853
+ },
1854
+ {
1855
+ "epoch": 2.110050022737608,
1856
+ "grad_norm": 1.4540137626455856,
1857
+ "learning_rate": 4.258670870857894e-06,
1858
+ "loss": 0.8461,
1859
+ "step": 2610
1860
+ },
1861
+ {
1862
+ "epoch": 2.118134505583346,
1863
+ "grad_norm": 1.5200526871374724,
1864
+ "learning_rate": 4.187360928526198e-06,
1865
+ "loss": 0.8353,
1866
+ "step": 2620
1867
+ },
1868
+ {
1869
+ "epoch": 2.126218988429084,
1870
+ "grad_norm": 1.487656190760893,
1871
+ "learning_rate": 4.116494626477684e-06,
1872
+ "loss": 0.842,
1873
+ "step": 2630
1874
+ },
1875
+ {
1876
+ "epoch": 2.134303471274822,
1877
+ "grad_norm": 1.4541876796717628,
1878
+ "learning_rate": 4.046077373468325e-06,
1879
+ "loss": 0.8285,
1880
+ "step": 2640
1881
+ },
1882
+ {
1883
+ "epoch": 2.1423879541205597,
1884
+ "grad_norm": 1.515080712913025,
1885
+ "learning_rate": 3.976114543981148e-06,
1886
+ "loss": 0.8278,
1887
+ "step": 2650
1888
+ },
1889
+ {
1890
+ "epoch": 2.150472436966298,
1891
+ "grad_norm": 1.5925627792233104,
1892
+ "learning_rate": 3.906611477816054e-06,
1893
+ "loss": 0.8382,
1894
+ "step": 2660
1895
+ },
1896
+ {
1897
+ "epoch": 2.158556919812036,
1898
+ "grad_norm": 1.4749306746231339,
1899
+ "learning_rate": 3.837573479682236e-06,
1900
+ "loss": 0.8453,
1901
+ "step": 2670
1902
+ },
1903
+ {
1904
+ "epoch": 2.1666414026577736,
1905
+ "grad_norm": 1.888042329530717,
1906
+ "learning_rate": 3.769005818793329e-06,
1907
+ "loss": 0.854,
1908
+ "step": 2680
1909
+ },
1910
+ {
1911
+ "epoch": 2.174725885503512,
1912
+ "grad_norm": 1.598037794600047,
1913
+ "learning_rate": 3.7009137284652386e-06,
1914
+ "loss": 0.8519,
1915
+ "step": 2690
1916
+ },
1917
+ {
1918
+ "epoch": 2.1828103683492497,
1919
+ "grad_norm": 1.5540837615094885,
1920
+ "learning_rate": 3.633302405716712e-06,
1921
+ "loss": 0.8397,
1922
+ "step": 2700
1923
+ },
1924
+ {
1925
+ "epoch": 2.1908948511949875,
1926
+ "grad_norm": 1.430485289060877,
1927
+ "learning_rate": 3.5661770108726914e-06,
1928
+ "loss": 0.8271,
1929
+ "step": 2710
1930
+ },
1931
+ {
1932
+ "epoch": 2.1989793340407258,
1933
+ "grad_norm": 2.401835949374892,
1934
+ "learning_rate": 3.4995426671704493e-06,
1935
+ "loss": 0.8335,
1936
+ "step": 2720
1937
+ },
1938
+ {
1939
+ "epoch": 2.2070638168864636,
1940
+ "grad_norm": 1.506353292247366,
1941
+ "learning_rate": 3.433404460368587e-06,
1942
+ "loss": 0.828,
1943
+ "step": 2730
1944
+ },
1945
+ {
1946
+ "epoch": 2.2151482997322014,
1947
+ "grad_norm": 1.4406717845115946,
1948
+ "learning_rate": 3.3677674383588476e-06,
1949
+ "loss": 0.8315,
1950
+ "step": 2740
1951
+ },
1952
+ {
1953
+ "epoch": 2.2232327825779397,
1954
+ "grad_norm": 1.5393945850323205,
1955
+ "learning_rate": 3.302636610780855e-06,
1956
+ "loss": 0.8504,
1957
+ "step": 2750
1958
+ },
1959
+ {
1960
+ "epoch": 2.2313172654236775,
1961
+ "grad_norm": 1.7257558230682333,
1962
+ "learning_rate": 3.238016948639772e-06,
1963
+ "loss": 0.8232,
1964
+ "step": 2760
1965
+ },
1966
+ {
1967
+ "epoch": 2.2394017482694153,
1968
+ "grad_norm": 1.8326756661400847,
1969
+ "learning_rate": 3.1739133839268698e-06,
1970
+ "loss": 0.8154,
1971
+ "step": 2770
1972
+ },
1973
+ {
1974
+ "epoch": 2.2474862311151536,
1975
+ "grad_norm": 1.5269518503128512,
1976
+ "learning_rate": 3.110330809243134e-06,
1977
+ "loss": 0.8317,
1978
+ "step": 2780
1979
+ },
1980
+ {
1981
+ "epoch": 2.2555707139608914,
1982
+ "grad_norm": 1.504166909878008,
1983
+ "learning_rate": 3.0472740774258157e-06,
1984
+ "loss": 0.8368,
1985
+ "step": 2790
1986
+ },
1987
+ {
1988
+ "epoch": 2.263655196806629,
1989
+ "grad_norm": 1.480047137104623,
1990
+ "learning_rate": 2.9847480011780607e-06,
1991
+ "loss": 0.8409,
1992
+ "step": 2800
1993
+ },
1994
+ {
1995
+ "epoch": 2.2717396796523674,
1996
+ "grad_norm": 1.492023552078346,
1997
+ "learning_rate": 2.922757352701595e-06,
1998
+ "loss": 0.8243,
1999
+ "step": 2810
2000
+ },
2001
+ {
2002
+ "epoch": 2.2798241624981053,
2003
+ "grad_norm": 1.467055149697424,
2004
+ "learning_rate": 2.861306863332475e-06,
2005
+ "loss": 0.8289,
2006
+ "step": 2820
2007
+ },
2008
+ {
2009
+ "epoch": 2.287908645343843,
2010
+ "grad_norm": 1.504514345406056,
2011
+ "learning_rate": 2.8004012231799905e-06,
2012
+ "loss": 0.8375,
2013
+ "step": 2830
2014
+ },
2015
+ {
2016
+ "epoch": 2.295993128189581,
2017
+ "grad_norm": 1.5091792435489357,
2018
+ "learning_rate": 2.740045080768694e-06,
2019
+ "loss": 0.8233,
2020
+ "step": 2840
2021
+ },
2022
+ {
2023
+ "epoch": 2.304077611035319,
2024
+ "grad_norm": 1.4619080284602382,
2025
+ "learning_rate": 2.6802430426836113e-06,
2026
+ "loss": 0.8356,
2027
+ "step": 2850
2028
+ },
2029
+ {
2030
+ "epoch": 2.312162093881057,
2031
+ "grad_norm": 1.4085751552174153,
2032
+ "learning_rate": 2.620999673218656e-06,
2033
+ "loss": 0.8156,
2034
+ "step": 2860
2035
+ },
2036
+ {
2037
+ "epoch": 2.3202465767267952,
2038
+ "grad_norm": 1.4755258769825808,
2039
+ "learning_rate": 2.5623194940282526e-06,
2040
+ "loss": 0.8353,
2041
+ "step": 2870
2042
+ },
2043
+ {
2044
+ "epoch": 2.328331059572533,
2045
+ "grad_norm": 1.5852343601430656,
2046
+ "learning_rate": 2.504206983782248e-06,
2047
+ "loss": 0.8133,
2048
+ "step": 2880
2049
+ },
2050
+ {
2051
+ "epoch": 2.336415542418271,
2052
+ "grad_norm": 1.4903107631764194,
2053
+ "learning_rate": 2.446666577824068e-06,
2054
+ "loss": 0.8459,
2055
+ "step": 2890
2056
+ },
2057
+ {
2058
+ "epoch": 2.3445000252640087,
2059
+ "grad_norm": 1.523719484539125,
2060
+ "learning_rate": 2.389702667832202e-06,
2061
+ "loss": 0.8285,
2062
+ "step": 2900
2063
+ },
2064
+ {
2065
+ "epoch": 2.352584508109747,
2066
+ "grad_norm": 1.457321496284554,
2067
+ "learning_rate": 2.3333196014850246e-06,
2068
+ "loss": 0.8304,
2069
+ "step": 2910
2070
+ },
2071
+ {
2072
+ "epoch": 2.3606689909554848,
2073
+ "grad_norm": 1.537434676857527,
2074
+ "learning_rate": 2.277521682128947e-06,
2075
+ "loss": 0.829,
2076
+ "step": 2920
2077
+ },
2078
+ {
2079
+ "epoch": 2.3687534738012226,
2080
+ "grad_norm": 1.4707817420987006,
2081
+ "learning_rate": 2.2223131684499932e-06,
+ "loss": 0.8372,
+ "step": 2930
+ },
+ {
+ "epoch": 2.376837956646961,
+ "grad_norm": 1.46749047915079,
+ "learning_rate": 2.1676982741487427e-06,
+ "loss": 0.8222,
+ "step": 2940
+ },
+ {
+ "epoch": 2.3849224394926987,
+ "grad_norm": 1.518122852634397,
+ "learning_rate": 2.113681167618736e-06,
+ "loss": 0.8401,
+ "step": 2950
+ },
+ {
+ "epoch": 2.3930069223384365,
+ "grad_norm": 1.8575848589445734,
+ "learning_rate": 2.060265971628338e-06,
+ "loss": 0.8339,
+ "step": 2960
+ },
+ {
+ "epoch": 2.4010914051841747,
+ "grad_norm": 1.5601145654381285,
+ "learning_rate": 2.0074567630060514e-06,
+ "loss": 0.8154,
+ "step": 2970
+ },
+ {
+ "epoch": 2.4091758880299126,
+ "grad_norm": 1.530898387002521,
+ "learning_rate": 1.955257572329379e-06,
+ "loss": 0.823,
+ "step": 2980
+ },
+ {
+ "epoch": 2.4172603708756504,
+ "grad_norm": 1.6224545445427798,
+ "learning_rate": 1.9036723836171899e-06,
+ "loss": 0.8145,
+ "step": 2990
+ },
+ {
+ "epoch": 2.4253448537213886,
+ "grad_norm": 1.4013679708594033,
+ "learning_rate": 1.8527051340256397e-06,
+ "loss": 0.8215,
+ "step": 3000
+ },
+ {
+ "epoch": 2.4334293365671265,
+ "grad_norm": 1.5692785609667004,
+ "learning_rate": 1.8023597135476923e-06,
+ "loss": 0.8241,
+ "step": 3010
+ },
+ {
+ "epoch": 2.4415138194128643,
+ "grad_norm": 1.5126974695662643,
+ "learning_rate": 1.752639964716193e-06,
+ "loss": 0.8421,
+ "step": 3020
+ },
+ {
+ "epoch": 2.4495983022586025,
+ "grad_norm": 1.6242742569822604,
+ "learning_rate": 1.7035496823106247e-06,
+ "loss": 0.8141,
+ "step": 3030
+ },
+ {
+ "epoch": 2.4576827851043404,
+ "grad_norm": 1.4628790110692993,
+ "learning_rate": 1.6550926130674527e-06,
+ "loss": 0.8184,
+ "step": 3040
+ },
+ {
+ "epoch": 2.465767267950078,
+ "grad_norm": 1.4807837431822446,
+ "learning_rate": 1.607272455394172e-06,
+ "loss": 0.8202,
+ "step": 3050
+ },
+ {
+ "epoch": 2.4738517507958164,
+ "grad_norm": 1.5539937903441552,
+ "learning_rate": 1.5600928590870402e-06,
+ "loss": 0.8391,
+ "step": 3060
+ },
+ {
+ "epoch": 2.4819362336415542,
+ "grad_norm": 1.6677495360703212,
+ "learning_rate": 1.5135574250524898e-06,
+ "loss": 0.8436,
+ "step": 3070
+ },
+ {
+ "epoch": 2.490020716487292,
+ "grad_norm": 1.53769857798961,
+ "learning_rate": 1.467669705032323e-06,
+ "loss": 0.8263,
+ "step": 3080
+ },
+ {
+ "epoch": 2.4981051993330303,
+ "grad_norm": 1.4732928239069325,
+ "learning_rate": 1.422433201332607e-06,
+ "loss": 0.8284,
+ "step": 3090
+ },
+ {
+ "epoch": 2.506189682178768,
+ "grad_norm": 1.5928757648188723,
+ "learning_rate": 1.3778513665563786e-06,
+ "loss": 0.8319,
+ "step": 3100
+ },
+ {
+ "epoch": 2.514274165024506,
+ "grad_norm": 1.4230928346180836,
+ "learning_rate": 1.3339276033401283e-06,
+ "loss": 0.8052,
+ "step": 3110
+ },
+ {
+ "epoch": 2.522358647870244,
+ "grad_norm": 1.4772661299744003,
+ "learning_rate": 1.290665264094093e-06,
+ "loss": 0.8241,
+ "step": 3120
+ },
+ {
+ "epoch": 2.530443130715982,
+ "grad_norm": 1.522091825661006,
+ "learning_rate": 1.2480676507464051e-06,
+ "loss": 0.8106,
+ "step": 3130
+ },
+ {
+ "epoch": 2.53852761356172,
+ "grad_norm": 1.525599170654266,
+ "learning_rate": 1.2061380144910572e-06,
+ "loss": 0.8166,
+ "step": 3140
+ },
+ {
+ "epoch": 2.5466120964074577,
+ "grad_norm": 1.4929327017491605,
+ "learning_rate": 1.1648795555397719e-06,
+ "loss": 0.8251,
+ "step": 3150
+ },
+ {
+ "epoch": 2.554696579253196,
+ "grad_norm": 1.5920001415947864,
+ "learning_rate": 1.1242954228777513e-06,
+ "loss": 0.8268,
+ "step": 3160
+ },
+ {
+ "epoch": 2.5627810620989337,
+ "grad_norm": 1.5252651359986042,
+ "learning_rate": 1.08438871402333e-06,
+ "loss": 0.831,
+ "step": 3170
+ },
+ {
+ "epoch": 2.570865544944672,
+ "grad_norm": 1.6461347768103347,
+ "learning_rate": 1.04516247479157e-06,
+ "loss": 0.8239,
+ "step": 3180
+ },
+ {
+ "epoch": 2.57895002779041,
+ "grad_norm": 1.490863354097273,
+ "learning_rate": 1.006619699061785e-06,
+ "loss": 0.823,
+ "step": 3190
+ },
+ {
+ "epoch": 2.5870345106361476,
+ "grad_norm": 1.5158841203253022,
+ "learning_rate": 9.687633285490395e-07,
+ "loss": 0.8333,
+ "step": 3200
+ },
+ {
+ "epoch": 2.5951189934818855,
+ "grad_norm": 1.4861408651974157,
+ "learning_rate": 9.315962525796374e-07,
+ "loss": 0.8178,
+ "step": 3210
+ },
+ {
+ "epoch": 2.6032034763276237,
+ "grad_norm": 1.4847726389856295,
+ "learning_rate": 8.951213078705811e-07,
+ "loss": 0.8244,
+ "step": 3220
+ },
+ {
+ "epoch": 2.6112879591733615,
+ "grad_norm": 1.4579228976188288,
+ "learning_rate": 8.593412783130805e-07,
+ "loss": 0.8116,
+ "step": 3230
+ },
+ {
+ "epoch": 2.6193724420191,
+ "grad_norm": 1.4309284818257009,
+ "learning_rate": 8.24258894760066e-07,
+ "loss": 0.8233,
+ "step": 3240
+ },
+ {
+ "epoch": 2.6274569248648376,
+ "grad_norm": 1.481662266621092,
+ "learning_rate": 7.898768348177643e-07,
+ "loss": 0.8393,
+ "step": 3250
+ },
+ {
+ "epoch": 2.6355414077105754,
+ "grad_norm": 1.42582017885812,
+ "learning_rate": 7.561977226413341e-07,
+ "loss": 0.8344,
+ "step": 3260
+ },
+ {
+ "epoch": 2.6436258905563133,
+ "grad_norm": 1.4203791210214531,
+ "learning_rate": 7.23224128734582e-07,
+ "loss": 0.821,
+ "step": 3270
+ },
+ {
+ "epoch": 2.6517103734020515,
+ "grad_norm": 1.4780417621137758,
+ "learning_rate": 6.909585697537758e-07,
+ "loss": 0.8353,
+ "step": 3280
+ },
+ {
+ "epoch": 2.6597948562477893,
+ "grad_norm": 1.4466612391449976,
+ "learning_rate": 6.594035083155581e-07,
+ "loss": 0.8268,
+ "step": 3290
+ },
+ {
+ "epoch": 2.6678793390935276,
+ "grad_norm": 1.4584592752103582,
+ "learning_rate": 6.285613528089962e-07,
+ "loss": 0.8164,
+ "step": 3300
+ },
+ {
+ "epoch": 2.6759638219392654,
+ "grad_norm": 1.487514724946772,
+ "learning_rate": 5.98434457211765e-07,
+ "loss": 0.8027,
+ "step": 3310
+ },
+ {
+ "epoch": 2.6840483047850032,
+ "grad_norm": 1.4294666752405771,
+ "learning_rate": 5.690251209104802e-07,
+ "loss": 0.8105,
+ "step": 3320
+ },
+ {
+ "epoch": 2.692132787630741,
+ "grad_norm": 1.4638925402226952,
+ "learning_rate": 5.403355885252104e-07,
+ "loss": 0.8135,
+ "step": 3330
+ },
+ {
+ "epoch": 2.7002172704764793,
+ "grad_norm": 1.4458763488108235,
+ "learning_rate": 5.123680497381444e-07,
+ "loss": 0.8102,
+ "step": 3340
+ },
+ {
+ "epoch": 2.708301753322217,
+ "grad_norm": 1.4903596037049076,
+ "learning_rate": 4.851246391264819e-07,
+ "loss": 0.8152,
+ "step": 3350
+ },
+ {
+ "epoch": 2.7163862361679554,
+ "grad_norm": 1.4429528216246368,
+ "learning_rate": 4.5860743599951186e-07,
+ "loss": 0.8121,
+ "step": 3360
+ },
+ {
+ "epoch": 2.724470719013693,
+ "grad_norm": 1.452035259914063,
+ "learning_rate": 4.328184642399036e-07,
+ "loss": 0.821,
+ "step": 3370
+ },
+ {
+ "epoch": 2.732555201859431,
+ "grad_norm": 1.5303877229228735,
+ "learning_rate": 4.077596921492533e-07,
+ "loss": 0.8145,
+ "step": 3380
+ },
+ {
+ "epoch": 2.740639684705169,
+ "grad_norm": 1.4449405328561624,
+ "learning_rate": 3.834330322978397e-07,
+ "loss": 0.8214,
+ "step": 3390
+ },
+ {
+ "epoch": 2.748724167550907,
+ "grad_norm": 1.4371584227135465,
+ "learning_rate": 3.598403413786611e-07,
+ "loss": 0.8131,
+ "step": 3400
+ },
+ {
+ "epoch": 2.756808650396645,
+ "grad_norm": 1.4632980675092546,
+ "learning_rate": 3.3698342006572294e-07,
+ "loss": 0.8244,
+ "step": 3410
+ },
+ {
+ "epoch": 2.764893133242383,
+ "grad_norm": 1.4500755832832954,
+ "learning_rate": 3.148640128766056e-07,
+ "loss": 0.823,
+ "step": 3420
+ },
+ {
+ "epoch": 2.772977616088121,
+ "grad_norm": 1.4751477866660623,
+ "learning_rate": 2.934838080393154e-07,
+ "loss": 0.8211,
+ "step": 3430
+ },
+ {
+ "epoch": 2.781062098933859,
+ "grad_norm": 1.4653755137740456,
+ "learning_rate": 2.7284443736343203e-07,
+ "loss": 0.8024,
+ "step": 3440
+ },
+ {
+ "epoch": 2.7891465817795966,
+ "grad_norm": 1.4089563044736344,
+ "learning_rate": 2.52947476115567e-07,
+ "loss": 0.8228,
+ "step": 3450
+ },
+ {
+ "epoch": 2.797231064625335,
+ "grad_norm": 1.460696621649454,
+ "learning_rate": 2.3379444289913344e-07,
+ "loss": 0.8184,
+ "step": 3460
+ },
+ {
+ "epoch": 2.8053155474710727,
+ "grad_norm": 1.4693334824298931,
+ "learning_rate": 2.153867995384351e-07,
+ "loss": 0.8224,
+ "step": 3470
+ },
+ {
+ "epoch": 2.8134000303168105,
+ "grad_norm": 1.4469954005038157,
+ "learning_rate": 1.9772595096710477e-07,
+ "loss": 0.8373,
+ "step": 3480
+ },
+ {
+ "epoch": 2.821484513162549,
+ "grad_norm": 1.4331150676229163,
+ "learning_rate": 1.8081324512086663e-07,
+ "loss": 0.8185,
+ "step": 3490
+ },
+ {
+ "epoch": 2.8295689960082866,
+ "grad_norm": 1.5335384382024873,
+ "learning_rate": 1.6464997283466067e-07,
+ "loss": 0.8124,
+ "step": 3500
+ },
+ {
+ "epoch": 2.8376534788540244,
+ "grad_norm": 1.4445147972537609,
+ "learning_rate": 1.492373677441228e-07,
+ "loss": 0.8145,
+ "step": 3510
+ },
+ {
+ "epoch": 2.8457379616997627,
+ "grad_norm": 1.4976188260457166,
+ "learning_rate": 1.3457660619142887e-07,
+ "loss": 0.8163,
+ "step": 3520
+ },
+ {
+ "epoch": 2.8538224445455005,
+ "grad_norm": 1.439452743377751,
+ "learning_rate": 1.2066880713550888e-07,
+ "loss": 0.829,
+ "step": 3530
+ },
+ {
+ "epoch": 2.8619069273912383,
+ "grad_norm": 1.524984754735583,
+ "learning_rate": 1.0751503206665071e-07,
+ "loss": 0.8236,
+ "step": 3540
+ },
+ {
+ "epoch": 2.8699914102369766,
+ "grad_norm": 1.448229914768272,
+ "learning_rate": 9.511628492547609e-08,
+ "loss": 0.8223,
+ "step": 3550
+ },
+ {
+ "epoch": 2.8780758930827144,
+ "grad_norm": 1.4915344957228824,
+ "learning_rate": 8.347351202632525e-08,
+ "loss": 0.843,
+ "step": 3560
+ },
+ {
+ "epoch": 2.886160375928452,
+ "grad_norm": 1.4891660841319714,
+ "learning_rate": 7.258760198502246e-08,
+ "loss": 0.8173,
+ "step": 3570
+ },
+ {
+ "epoch": 2.89424485877419,
+ "grad_norm": 1.4485487573496472,
+ "learning_rate": 6.245938565105803e-08,
+ "loss": 0.8299,
+ "step": 3580
+ },
+ {
+ "epoch": 2.9023293416199283,
+ "grad_norm": 1.452602418516034,
+ "learning_rate": 5.308963604417572e-08,
+ "loss": 0.8216,
+ "step": 3590
+ },
+ {
+ "epoch": 2.910413824465666,
+ "grad_norm": 1.4554407329371093,
+ "learning_rate": 4.447906829537219e-08,
+ "loss": 0.8284,
+ "step": 3600
+ },
+ {
+ "epoch": 2.9184983073114044,
+ "grad_norm": 1.4918607001029844,
+ "learning_rate": 3.6628339592313935e-08,
+ "loss": 0.8012,
+ "step": 3610
+ },
+ {
+ "epoch": 2.926582790157142,
+ "grad_norm": 1.4229324193215207,
+ "learning_rate": 2.95380491291819e-08,
+ "loss": 0.8401,
+ "step": 3620
+ },
+ {
+ "epoch": 2.93466727300288,
+ "grad_norm": 1.4288366788035922,
+ "learning_rate": 2.320873806093804e-08,
+ "loss": 0.8228,
+ "step": 3630
+ },
+ {
+ "epoch": 2.942751755848618,
+ "grad_norm": 1.4724134547959333,
+ "learning_rate": 1.764088946201947e-08,
+ "loss": 0.8064,
+ "step": 3640
+ },
+ {
+ "epoch": 2.950836238694356,
+ "grad_norm": 1.4984479737935563,
+ "learning_rate": 1.2834928289472415e-08,
+ "loss": 0.81,
+ "step": 3650
+ },
+ {
+ "epoch": 2.958920721540094,
+ "grad_norm": 1.4666816312445612,
+ "learning_rate": 8.79122135051591e-09,
+ "loss": 0.822,
+ "step": 3660
+ },
+ {
+ "epoch": 2.967005204385832,
+ "grad_norm": 1.445201429621803,
+ "learning_rate": 5.510077274547554e-09,
+ "loss": 0.8271,
+ "step": 3670
+ },
+ {
+ "epoch": 2.97508968723157,
+ "grad_norm": 1.4460059967392547,
+ "learning_rate": 2.9917464895856673e-09,
+ "loss": 0.8389,
+ "step": 3680
+ },
+ {
+ "epoch": 2.983174170077308,
+ "grad_norm": 1.435390156627942,
+ "learning_rate": 1.2364212031579226e-09,
+ "loss": 0.8294,
+ "step": 3690
+ },
+ {
+ "epoch": 2.9912586529230456,
+ "grad_norm": 1.5066669703721747,
+ "learning_rate": 2.442353876297432e-10,
+ "loss": 0.801,
+ "step": 3700
+ },
+ {
+ "epoch": 2.997726239199636,
+ "eval_loss": 0.8224219083786011,
+ "eval_runtime": 474.463,
+ "eval_samples_per_second": 25.637,
+ "eval_steps_per_second": 12.819,
+ "step": 3708
+ },
+ {
+ "epoch": 2.997726239199636,
+ "step": 3708,
+ "total_flos": 0.0,
+ "train_loss": 1.0273753281164324,
+ "train_runtime": 58675.1239,
+ "train_samples_per_second": 8.095,
+ "train_steps_per_second": 0.063
+ }
+ ],
+ "logging_steps": 10,
+ "max_steps": 3708,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 3,
+ "save_steps": 100,
+ "total_flos": 0.0,
+ "train_batch_size": 8,
+ "trial_name": null,
+ "trial_params": null
+ }