Shawon16 committed on
Commit
8e28465
·
verified ·
1 Parent(s): 88ec1e0

End of training

README.md ADDED
@@ -0,0 +1,114 @@
1
+ ---
2
+ library_name: transformers
3
+ license: cc-by-nc-4.0
4
+ base_model: facebook/timesformer-base-finetuned-k400
5
+ tags:
6
+ - generated_from_trainer
7
+ metrics:
8
+ - accuracy
9
+ - precision
10
+ - recall
11
+ - f1
12
+ model-index:
13
+ - name: Timesformer_WLASL_100_200_epochs_p20_SR_16
14
+ results: []
15
+ ---
16
+
17
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
18
+ should probably proofread and complete it, then remove this comment. -->
19
+
20
+ # Timesformer_WLASL_100_200_epochs_p20_SR_16
21
+
22
+ This model is a fine-tuned version of [facebook/timesformer-base-finetuned-k400](https://huggingface.co/facebook/timesformer-base-finetuned-k400) on an unknown dataset.
23
+ It achieves the following results on the evaluation set:
24
+ - Loss: 2.2599
25
+ - Top 1 Accuracy: 0.5828
26
+ - Top 5 Accuracy: 0.7899
27
+ - Top 10 Accuracy: 0.8698
28
+ - Accuracy: 0.5828
29
+ - Precision: 0.5806
30
+ - Recall: 0.5828
31
+ - F1: 0.5510
32
+
33
+ ## Model description
34
+
35
+ More information needed
36
+
37
+ ## Intended uses & limitations
38
+
39
+ More information needed
40
+
41
+ ## Training and evaluation data
42
+
43
+ More information needed
44
+
45
+ ## Training procedure
46
+
47
+ ### Training hyperparameters
48
+
49
+ The following hyperparameters were used during training:
50
+ - learning_rate: 5e-05
51
+ - train_batch_size: 2
52
+ - eval_batch_size: 2
53
+ - seed: 42
54
+ - gradient_accumulation_steps: 4
55
+ - total_train_batch_size: 8
56
+ - optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
57
+ - lr_scheduler_type: linear
58
+ - lr_scheduler_warmup_ratio: 0.1
59
+ - training_steps: 36000
60
+ - mixed_precision_training: Native AMP
61
+
62
+ ### Training results
63
+
64
+ | Training Loss | Epoch | Step | Validation Loss | Top 1 Accuracy | Top 5 Accuracy | Top 10 Accuracy | Accuracy | Precision | Recall | F1 |
65
+ |:-------------:|:-------:|:----:|:---------------:|:--------------:|:--------------:|:---------------:|:--------:|:---------:|:------:|:------:|
66
+ | 19.1155 | 0.005 | 180 | 4.6927 | 0.0089 | 0.0414 | 0.0888 | 0.0089 | 0.0155 | 0.0089 | 0.0105 |
67
+ | 18.5538 | 1.0050 | 360 | 4.5821 | 0.0266 | 0.0769 | 0.1302 | 0.0266 | 0.0137 | 0.0266 | 0.0116 |
68
+ | 17.5848 | 2.0050 | 540 | 4.3988 | 0.0562 | 0.1450 | 0.2633 | 0.0562 | 0.0486 | 0.0562 | 0.0390 |
69
+ | 15.8283 | 3.0050 | 721 | 4.0516 | 0.1302 | 0.2959 | 0.4645 | 0.1302 | 0.1012 | 0.1302 | 0.0976 |
70
+ | 13.3102 | 4.005 | 901 | 3.6150 | 0.2249 | 0.4704 | 0.6154 | 0.2249 | 0.1781 | 0.2249 | 0.1741 |
71
+ | 11.2113 | 5.0050 | 1081 | 3.2389 | 0.2604 | 0.6065 | 0.7367 | 0.2604 | 0.2422 | 0.2604 | 0.2215 |
72
+ | 8.898 | 6.0050 | 1261 | 2.8714 | 0.3757 | 0.6775 | 0.8166 | 0.3757 | 0.3584 | 0.3757 | 0.3324 |
73
+ | 6.715 | 7.0050 | 1442 | 2.6518 | 0.4231 | 0.7249 | 0.8402 | 0.4231 | 0.3828 | 0.4231 | 0.3730 |
74
+ | 4.8442 | 8.005 | 1622 | 2.3294 | 0.4645 | 0.7929 | 0.8876 | 0.4645 | 0.5077 | 0.4645 | 0.4377 |
75
+ | 3.3825 | 9.0050 | 1802 | 2.1747 | 0.4911 | 0.7899 | 0.8964 | 0.4911 | 0.5436 | 0.4911 | 0.4654 |
76
+ | 2.0471 | 10.0050 | 1982 | 1.9990 | 0.5148 | 0.8107 | 0.9053 | 0.5178 | 0.5871 | 0.5178 | 0.5057 |
77
+ | 1.3242 | 11.0050 | 2163 | 1.8964 | 0.5473 | 0.8166 | 0.8935 | 0.5473 | 0.5822 | 0.5473 | 0.5199 |
78
+ | 0.8746 | 12.005 | 2343 | 1.8222 | 0.5562 | 0.8254 | 0.9083 | 0.5562 | 0.5796 | 0.5562 | 0.5320 |
79
+ | 0.5537 | 13.0050 | 2523 | 1.7525 | 0.5769 | 0.8343 | 0.9142 | 0.5769 | 0.5813 | 0.5769 | 0.5468 |
80
+ | 0.4081 | 14.0050 | 2703 | 1.7351 | 0.5947 | 0.8136 | 0.8964 | 0.5947 | 0.6684 | 0.5947 | 0.5834 |
81
+ | 0.17 | 15.0050 | 2884 | 1.6998 | 0.5592 | 0.8225 | 0.9083 | 0.5592 | 0.5763 | 0.5592 | 0.5342 |
82
+ | 0.2053 | 16.005 | 3064 | 1.7340 | 0.5651 | 0.8343 | 0.9083 | 0.5651 | 0.6215 | 0.5651 | 0.5390 |
83
+ | 0.1434 | 17.0050 | 3244 | 1.7350 | 0.6006 | 0.8432 | 0.9142 | 0.6006 | 0.6347 | 0.6006 | 0.5806 |
84
+ | 0.1957 | 18.0050 | 3424 | 1.8179 | 0.5621 | 0.8373 | 0.9142 | 0.5621 | 0.6060 | 0.5621 | 0.5350 |
85
+ | 0.1636 | 19.0050 | 3605 | 1.7831 | 0.6154 | 0.8225 | 0.8905 | 0.6154 | 0.6401 | 0.6154 | 0.5917 |
86
+ | 0.0908 | 20.005 | 3785 | 1.7552 | 0.6213 | 0.8402 | 0.9053 | 0.6213 | 0.6504 | 0.6213 | 0.6014 |
87
+ | 0.058 | 21.0050 | 3965 | 1.8422 | 0.6243 | 0.8254 | 0.9112 | 0.6213 | 0.6392 | 0.6213 | 0.5962 |
88
+ | 0.0924 | 22.0050 | 4145 | 1.8347 | 0.6006 | 0.8225 | 0.9201 | 0.6006 | 0.6218 | 0.6006 | 0.5735 |
89
+ | 0.0799 | 23.0050 | 4326 | 1.9650 | 0.6036 | 0.8107 | 0.8846 | 0.6036 | 0.6182 | 0.6036 | 0.5724 |
90
+ | 0.176 | 24.005 | 4506 | 1.9326 | 0.5858 | 0.8402 | 0.9142 | 0.5858 | 0.6240 | 0.5858 | 0.5671 |
91
+ | 0.0786 | 25.0050 | 4686 | 1.7753 | 0.6124 | 0.8491 | 0.9142 | 0.6124 | 0.6607 | 0.6124 | 0.5998 |
92
+ | 0.242 | 26.0050 | 4866 | 2.0219 | 0.5769 | 0.7722 | 0.8876 | 0.5769 | 0.6337 | 0.5769 | 0.5552 |
93
+ | 0.1767 | 27.0050 | 5047 | 1.9744 | 0.5828 | 0.8166 | 0.9024 | 0.5828 | 0.6330 | 0.5828 | 0.5721 |
94
+ | 0.14 | 28.005 | 5227 | 2.1996 | 0.5769 | 0.7811 | 0.8609 | 0.5769 | 0.5983 | 0.5769 | 0.5430 |
95
+ | 0.104 | 29.0050 | 5407 | 2.0881 | 0.5769 | 0.8166 | 0.8876 | 0.5769 | 0.6146 | 0.5769 | 0.5641 |
96
+ | 0.1454 | 30.0050 | 5587 | 2.3394 | 0.5621 | 0.7959 | 0.8905 | 0.5621 | 0.6280 | 0.5621 | 0.5448 |
97
+ | 0.2221 | 31.0050 | 5768 | 1.9360 | 0.5947 | 0.8225 | 0.9024 | 0.5947 | 0.6606 | 0.5947 | 0.5881 |
98
+ | 0.1026 | 32.005 | 5948 | 2.0920 | 0.6036 | 0.8107 | 0.8935 | 0.6036 | 0.6376 | 0.6036 | 0.5832 |
99
+ | 0.0968 | 33.0050 | 6128 | 2.2746 | 0.5740 | 0.8047 | 0.8846 | 0.5740 | 0.6308 | 0.5740 | 0.5542 |
100
+ | 0.1864 | 34.0050 | 6308 | 2.2081 | 0.5888 | 0.8047 | 0.8698 | 0.5888 | 0.6394 | 0.5888 | 0.5704 |
101
+ | 0.1353 | 35.0050 | 6489 | 2.1853 | 0.5799 | 0.8254 | 0.8935 | 0.5799 | 0.6133 | 0.5799 | 0.5636 |
102
+ | 0.1618 | 36.005 | 6669 | 2.2661 | 0.5710 | 0.7959 | 0.8698 | 0.5710 | 0.6243 | 0.5710 | 0.5515 |
103
+ | 0.259 | 37.0050 | 6849 | 2.3163 | 0.5740 | 0.7870 | 0.8580 | 0.5740 | 0.6088 | 0.5740 | 0.5459 |
104
+ | 0.3394 | 38.0050 | 7029 | 2.0984 | 0.5769 | 0.7988 | 0.8905 | 0.5769 | 0.6154 | 0.5769 | 0.5614 |
105
+ | 0.0833 | 39.0050 | 7210 | 2.2811 | 0.5533 | 0.8047 | 0.8698 | 0.5533 | 0.6051 | 0.5533 | 0.5328 |
106
+ | 0.1259 | 40.005 | 7390 | 2.2599 | 0.5828 | 0.7899 | 0.8698 | 0.5828 | 0.5806 | 0.5828 | 0.5510 |
107
+
108
+
109
+ ### Framework versions
110
+
111
+ - Transformers 4.46.1
112
+ - Pytorch 2.5.1+cu124
113
+ - Datasets 3.1.0
114
+ - Tokenizers 0.20.1
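
The hyperparameters in the README above imply an effective batch size of 8 (2 samples per device times 4 gradient accumulation steps) and, with `lr_scheduler_warmup_ratio: 0.1` over 36000 training steps, 3600 warmup steps. The following is a minimal sketch of a linear warmup followed by a linear decay, written in plain Python rather than taken from the Trainer source; the function name `linear_schedule_lr` is hypothetical.

```python
def linear_schedule_lr(step, base_lr=5e-05, total_steps=36000, warmup_ratio=0.1):
    """Learning rate after `step` scheduler steps: linear warmup, then linear decay."""
    warmup_steps = round(total_steps * warmup_ratio)  # 3600 for the values above
    if step < warmup_steps:
        # Warmup: ramp from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay: ramp from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


# train_batch_size * gradient_accumulation_steps
effective_batch_size = 2 * 4
```

Evaluated at 95 scheduler steps, the sketch gives about 1.3194e-06, which is the `learning_rate` logged in `trainer_state.json` at global step 100. The small lag between the logged step and the step that reproduces the logged rate may come from optimizer steps skipped under mixed precision, but that is an assumption and not something the logs state.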
all_results.json ADDED
@@ -0,0 +1,9 @@
1
+ {
2
+ "accuracy": 0.5697674418604651,
3
+ "f1": 0.5384705303309955,
4
+ "precision": 0.5654485049833887,
5
+ "recall": 0.5697674418604651,
6
+ "top_10_accuracy": 0.9031007751937985,
7
+ "top_1_accuracy": 0.5697674418604651,
8
+ "top_5_accuracy": 0.8449612403100775
9
+ }
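
The `top_1_accuracy`, `top_5_accuracy`, and `top_10_accuracy` keys report how often the true label is among the k highest-scoring classes. A minimal sketch of that metric in plain Python follows; the function name `top_k_accuracy` and the toy scores are illustrative and are not taken from the training script, which this commit does not include.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores in this row.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy example with 3 samples and 4 classes (made-up numbers).
scores = [
    [0.10, 0.60, 0.20, 0.10],
    [0.50, 0.10, 0.30, 0.10],
    [0.30, 0.15, 0.05, 0.50],
]
labels = [1, 2, 0]

top1 = top_k_accuracy(scores, labels, 1)
top2 = top_k_accuracy(scores, labels, 2)
```

Because the top-k set only grows with k, the metric cannot decrease as k increases, which matches the ordering of the three reported values.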
config.json ADDED
@@ -0,0 +1,230 @@
1
+ {
2
+ "_name_or_path": "facebook/timesformer-base-finetuned-k400",
3
+ "architectures": [
4
+ "TimesformerForVideoClassification"
5
+ ],
6
+ "attention_probs_dropout_prob": 0.0,
7
+ "attention_type": "divided_space_time",
8
+ "drop_path_rate": 0,
9
+ "hidden_act": "gelu",
10
+ "hidden_dropout_prob": 0.0,
11
+ "hidden_size": 768,
12
+ "id2label": {
13
+ "0": "accident",
14
+ "1": "africa",
15
+ "2": "all",
16
+ "3": "apple",
17
+ "4": "basketball",
18
+ "5": "bed",
19
+ "6": "before",
20
+ "7": "bird",
21
+ "8": "birthday",
22
+ "9": "black",
23
+ "10": "blue",
24
+ "11": "book",
25
+ "12": "bowling",
26
+ "13": "brown",
27
+ "14": "but",
28
+ "15": "can",
29
+ "16": "candy",
30
+ "17": "chair",
31
+ "18": "change",
32
+ "19": "cheat",
33
+ "20": "city",
34
+ "21": "clothes",
35
+ "22": "color",
36
+ "23": "computer",
37
+ "24": "cook",
38
+ "25": "cool",
39
+ "26": "corn",
40
+ "27": "cousin",
41
+ "28": "cow",
42
+ "29": "dance",
43
+ "30": "dark",
44
+ "31": "deaf",
45
+ "32": "decide",
46
+ "33": "doctor",
47
+ "34": "dog",
48
+ "35": "drink",
49
+ "36": "eat",
50
+ "37": "enjoy",
51
+ "38": "family",
52
+ "39": "fine",
53
+ "40": "finish",
54
+ "41": "fish",
55
+ "42": "forget",
56
+ "43": "full",
57
+ "44": "give",
58
+ "45": "go",
59
+ "46": "graduate",
60
+ "47": "hat",
61
+ "48": "hearing",
62
+ "49": "help",
63
+ "50": "hot",
64
+ "51": "how",
65
+ "52": "jacket",
66
+ "53": "kiss",
67
+ "54": "language",
68
+ "55": "last",
69
+ "56": "later",
70
+ "57": "letter",
71
+ "58": "like",
72
+ "59": "man",
73
+ "60": "many",
74
+ "61": "medicine",
75
+ "62": "meet",
76
+ "63": "mother",
77
+ "64": "need",
78
+ "65": "no",
79
+ "66": "now",
80
+ "67": "orange",
81
+ "68": "paint",
82
+ "69": "paper",
83
+ "70": "pink",
84
+ "71": "pizza",
85
+ "72": "play",
86
+ "73": "pull",
87
+ "74": "purple",
88
+ "75": "right",
89
+ "76": "same",
90
+ "77": "school",
91
+ "78": "secretary",
92
+ "79": "shirt",
93
+ "80": "short",
94
+ "81": "son",
95
+ "82": "study",
96
+ "83": "table",
97
+ "84": "tall",
98
+ "85": "tell",
99
+ "86": "thanksgiving",
100
+ "87": "thin",
101
+ "88": "thursday",
102
+ "89": "time",
103
+ "90": "walk",
104
+ "91": "want",
105
+ "92": "what",
106
+ "93": "white",
107
+ "94": "who",
108
+ "95": "woman",
109
+ "96": "work",
110
+ "97": "wrong",
111
+ "98": "year",
112
+ "99": "yes"
113
+ },
114
+ "image_size": 224,
115
+ "initializer_range": 0.02,
116
+ "intermediate_size": 3072,
117
+ "label2id": {
118
+ "accident": 0,
119
+ "africa": 1,
120
+ "all": 2,
121
+ "apple": 3,
122
+ "basketball": 4,
123
+ "bed": 5,
124
+ "before": 6,
125
+ "bird": 7,
126
+ "birthday": 8,
127
+ "black": 9,
128
+ "blue": 10,
129
+ "book": 11,
130
+ "bowling": 12,
131
+ "brown": 13,
132
+ "but": 14,
133
+ "can": 15,
134
+ "candy": 16,
135
+ "chair": 17,
136
+ "change": 18,
137
+ "cheat": 19,
138
+ "city": 20,
139
+ "clothes": 21,
140
+ "color": 22,
141
+ "computer": 23,
142
+ "cook": 24,
143
+ "cool": 25,
144
+ "corn": 26,
145
+ "cousin": 27,
146
+ "cow": 28,
147
+ "dance": 29,
148
+ "dark": 30,
149
+ "deaf": 31,
150
+ "decide": 32,
151
+ "doctor": 33,
152
+ "dog": 34,
153
+ "drink": 35,
154
+ "eat": 36,
155
+ "enjoy": 37,
156
+ "family": 38,
157
+ "fine": 39,
158
+ "finish": 40,
159
+ "fish": 41,
160
+ "forget": 42,
161
+ "full": 43,
162
+ "give": 44,
163
+ "go": 45,
164
+ "graduate": 46,
165
+ "hat": 47,
166
+ "hearing": 48,
167
+ "help": 49,
168
+ "hot": 50,
169
+ "how": 51,
170
+ "jacket": 52,
171
+ "kiss": 53,
172
+ "language": 54,
173
+ "last": 55,
174
+ "later": 56,
175
+ "letter": 57,
176
+ "like": 58,
177
+ "man": 59,
178
+ "many": 60,
179
+ "medicine": 61,
180
+ "meet": 62,
181
+ "mother": 63,
182
+ "need": 64,
183
+ "no": 65,
184
+ "now": 66,
185
+ "orange": 67,
186
+ "paint": 68,
187
+ "paper": 69,
188
+ "pink": 70,
189
+ "pizza": 71,
190
+ "play": 72,
191
+ "pull": 73,
192
+ "purple": 74,
193
+ "right": 75,
194
+ "same": 76,
195
+ "school": 77,
196
+ "secretary": 78,
197
+ "shirt": 79,
198
+ "short": 80,
199
+ "son": 81,
200
+ "study": 82,
201
+ "table": 83,
202
+ "tall": 84,
203
+ "tell": 85,
204
+ "thanksgiving": 86,
205
+ "thin": 87,
206
+ "thursday": 88,
207
+ "time": 89,
208
+ "walk": 90,
209
+ "want": 91,
210
+ "what": 92,
211
+ "white": 93,
212
+ "who": 94,
213
+ "woman": 95,
214
+ "work": 96,
215
+ "wrong": 97,
216
+ "year": 98,
217
+ "yes": 99
218
+ },
219
+ "layer_norm_eps": 1e-06,
220
+ "model_type": "timesformer",
221
+ "num_attention_heads": 12,
222
+ "num_channels": 3,
223
+ "num_frames": 8,
224
+ "num_hidden_layers": 12,
225
+ "patch_size": 16,
226
+ "problem_type": "single_label_classification",
227
+ "qkv_bias": true,
228
+ "torch_dtype": "float32",
229
+ "transformers_version": "4.46.1"
230
+ }
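
From `image_size`, `patch_size`, and `num_frames` in the config above, the number of patch tokens per clip can be worked out by hand. The sketch below does that arithmetic and also checks that an excerpt of `id2label` and `label2id` are inverses of each other. It assumes one extra classification token, as in ViT-style models; the variable names and the helper `is_inverse` are illustrative.

```python
image_size, patch_size, num_frames = 224, 16, 8

patches_per_side = image_size // patch_size    # 14
patches_per_frame = patches_per_side ** 2      # 196
patch_tokens = patches_per_frame * num_frames  # 1568
sequence_length = patch_tokens + 1             # assumed: plus one classification token


def is_inverse(id2label, label2id):
    """True if every id maps to a label that maps back to the same id."""
    same_size = len(id2label) == len(label2id)
    return same_size and all(
        label2id.get(label) == int(idx) for idx, label in id2label.items()
    )


# Three entries copied from the config; the full maps have 100 classes.
id2label_excerpt = {"0": "accident", "1": "africa", "99": "yes"}
label2id_excerpt = {"accident": 0, "africa": 1, "yes": 99}
maps_consistent = is_inverse(id2label_excerpt, label2id_excerpt)
```

Note that JSON stores the `id2label` keys as strings, so they need converting to integers before comparison.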
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7ad548c8b5ca8471c2a8d5f3fd2a642106b1a7a930abf51fb3b89dc171e331fb
3
+ size 485373720
preprocessor_config.json ADDED
@@ -0,0 +1,26 @@
1
+ {
2
+ "crop_size": {
3
+ "height": 224,
4
+ "width": 224
5
+ },
6
+ "do_center_crop": true,
7
+ "do_normalize": true,
8
+ "do_rescale": true,
9
+ "do_resize": true,
10
+ "image_mean": [
11
+ 0.45,
12
+ 0.45,
13
+ 0.45
14
+ ],
15
+ "image_processor_type": "VideoMAEImageProcessor",
16
+ "image_std": [
17
+ 0.225,
18
+ 0.225,
19
+ 0.225
20
+ ],
21
+ "resample": 2,
22
+ "rescale_factor": 0.00392156862745098,
23
+ "size": {
24
+ "shortest_edge": 224
25
+ }
26
+ }
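
Read together, the preprocessor settings describe a per-frame pipeline: resize so the shorter edge is 224, center-crop to 224 by 224, rescale pixel values by 1/255 (the `rescale_factor` of 0.00392156862745098), then normalize each channel with mean 0.45 and standard deviation 0.225. The sketch below reproduces only the geometry and the per-pixel arithmetic in plain Python. It is not the `VideoMAEImageProcessor` implementation, the function names are hypothetical, and the library's rounding of resized dimensions may differ.

```python
def resized_shape(height, width, shortest_edge=224):
    """Scale so the shorter side equals `shortest_edge`, keeping the aspect ratio."""
    scale = shortest_edge / min(height, width)
    return round(height * scale), round(width * scale)


def center_crop_box(height, width, crop=224):
    """Return (top, left, bottom, right) of a centered square crop."""
    top = (height - crop) // 2
    left = (width - crop) // 2
    return top, left, top + crop, left + crop


def normalize_pixel(value, rescale_factor=1 / 255, mean=0.45, std=0.225):
    """Map an 8-bit pixel value to the normalized value fed to the model."""
    return (value * rescale_factor - mean) / std


# Example: a 480x640 frame (height x width).
new_h, new_w = resized_shape(480, 640)
box = center_crop_box(new_h, new_w)
```

With these settings a black pixel maps to about -2.0 and a white pixel to about 2.44.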
test_results.json ADDED
@@ -0,0 +1,9 @@
1
+ {
2
+ "accuracy": 0.5697674418604651,
3
+ "f1": 0.5384705303309955,
4
+ "precision": 0.5654485049833887,
5
+ "recall": 0.5697674418604651,
6
+ "top_10_accuracy": 0.9031007751937985,
7
+ "top_1_accuracy": 0.5697674418604651,
8
+ "top_5_accuracy": 0.8449612403100775
9
+ }
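
In these results `recall` is identical to `accuracy`, which is what support-weighted averaging of per-class recall produces. That the metrics were computed with weighted averaging is an inference from the numbers, not something stated in this commit. The sketch below shows the identity on made-up labels; `weighted_recall` is a hypothetical helper.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_recall(y_true, y_pred):
    """Per-class recall averaged with weights proportional to class support."""
    support = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)
    return sum((support[c] / total) * (correct[c] / support[c]) for c in support)


# Made-up labels drawn from the class names in config.json.
y_true = ["book", "book", "book", "dog", "dog", "yes"]
y_pred = ["book", "dog", "book", "dog", "yes", "yes"]

acc = accuracy(y_true, y_pred)
rec = weighted_recall(y_true, y_pred)
```

The support weights cancel the per-class denominators, so the sum reduces to the number of correct predictions divided by the number of samples.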
trainer_state.json ADDED
@@ -0,0 +1,1177 @@
1
+ {
2
+ "best_metric": 0.621301775147929,
3
+ "best_model_checkpoint": "/media/cse/HDD/Shawon/shawon/MY DATA/Timesformer_WLASL_100_200_epochs_p20_SR_16/checkpoint-3785",
4
+ "epoch": 40.005,
5
+ "eval_steps": 500,
6
+ "global_step": 7390,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.002777777777777778,
13
+ "grad_norm": 50.3983268737793,
14
+ "learning_rate": 1.3194444444444444e-06,
15
+ "loss": 19.1155,
16
+ "step": 100
17
+ },
18
+ {
19
+ "epoch": 0.005,
20
+ "eval_accuracy": 0.008875739644970414,
21
+ "eval_f1": 0.010524091293322064,
22
+ "eval_loss": 4.692718029022217,
23
+ "eval_precision": 0.015532544378698224,
24
+ "eval_recall": 0.008875739644970414,
25
+ "eval_runtime": 11.6929,
26
+ "eval_samples_per_second": 28.906,
27
+ "eval_steps_per_second": 14.453,
28
+ "eval_top_10_accuracy": 0.08875739644970414,
29
+ "eval_top_1_accuracy": 0.008875739644970414,
30
+ "eval_top_5_accuracy": 0.04142011834319527,
31
+ "step": 180
32
+ },
33
+ {
34
+ "epoch": 1.000548611111111,
35
+ "grad_norm": 49.16985321044922,
36
+ "learning_rate": 2.6944444444444444e-06,
37
+ "loss": 18.7026,
38
+ "step": 200
39
+ },
40
+ {
41
+ "epoch": 1.0033263888888888,
42
+ "grad_norm": 54.74333572387695,
43
+ "learning_rate": 4.083333333333334e-06,
44
+ "loss": 18.5538,
45
+ "step": 300
46
+ },
47
+ {
48
+ "epoch": 1.0049930555555555,
49
+ "eval_accuracy": 0.026627218934911243,
50
+ "eval_f1": 0.01160748704902843,
51
+ "eval_loss": 4.582146644592285,
52
+ "eval_precision": 0.013707884027829569,
53
+ "eval_recall": 0.026627218934911243,
54
+ "eval_runtime": 11.4919,
55
+ "eval_samples_per_second": 29.412,
56
+ "eval_steps_per_second": 14.706,
57
+ "eval_top_10_accuracy": 0.1301775147928994,
58
+ "eval_top_1_accuracy": 0.026627218934911243,
59
+ "eval_top_5_accuracy": 0.07692307692307693,
60
+ "step": 360
61
+ },
62
+ {
63
+ "epoch": 2.001097222222222,
64
+ "grad_norm": 49.70963668823242,
65
+ "learning_rate": 5.472222222222223e-06,
66
+ "loss": 18.0616,
67
+ "step": 400
68
+ },
69
+ {
70
+ "epoch": 2.003875,
71
+ "grad_norm": 52.42399978637695,
72
+ "learning_rate": 6.861111111111111e-06,
73
+ "loss": 17.5848,
74
+ "step": 500
75
+ },
76
+ {
77
+ "epoch": 2.004986111111111,
78
+ "eval_accuracy": 0.05621301775147929,
79
+ "eval_f1": 0.03896742883758426,
80
+ "eval_loss": 4.3987812995910645,
81
+ "eval_precision": 0.04859307359307359,
82
+ "eval_recall": 0.05621301775147929,
83
+ "eval_runtime": 12.0994,
84
+ "eval_samples_per_second": 27.935,
85
+ "eval_steps_per_second": 13.968,
86
+ "eval_top_10_accuracy": 0.26331360946745563,
87
+ "eval_top_1_accuracy": 0.05621301775147929,
88
+ "eval_top_5_accuracy": 0.14497041420118342,
89
+ "step": 540
90
+ },
91
+ {
92
+ "epoch": 3.0016458333333333,
93
+ "grad_norm": 54.07436752319336,
94
+ "learning_rate": 8.25e-06,
95
+ "loss": 16.7062,
96
+ "step": 600
97
+ },
98
+ {
99
+ "epoch": 3.004423611111111,
100
+ "grad_norm": 57.311805725097656,
101
+ "learning_rate": 9.625e-06,
102
+ "loss": 15.8283,
103
+ "step": 700
104
+ },
105
+ {
106
+ "epoch": 3.0050069444444443,
107
+ "eval_accuracy": 0.1301775147928994,
108
+ "eval_f1": 0.09761319747359186,
109
+ "eval_loss": 4.051616191864014,
110
+ "eval_precision": 0.10124132555089023,
111
+ "eval_recall": 0.1301775147928994,
112
+ "eval_runtime": 12.2039,
113
+ "eval_samples_per_second": 27.696,
114
+ "eval_steps_per_second": 13.848,
115
+ "eval_top_10_accuracy": 0.46449704142011833,
116
+ "eval_top_1_accuracy": 0.1301775147928994,
117
+ "eval_top_5_accuracy": 0.2958579881656805,
118
+ "step": 721
119
+ },
120
+ {
121
+ "epoch": 4.002194444444444,
122
+ "grad_norm": 55.184627532958984,
123
+ "learning_rate": 1.1013888888888889e-05,
124
+ "loss": 14.1811,
125
+ "step": 800
126
+ },
127
+ {
128
+ "epoch": 4.004972222222222,
129
+ "grad_norm": 62.36821746826172,
130
+ "learning_rate": 1.2402777777777778e-05,
131
+ "loss": 13.3102,
132
+ "step": 900
133
+ },
134
+ {
135
+ "epoch": 4.005,
136
+ "eval_accuracy": 0.22485207100591717,
137
+ "eval_f1": 0.17408013159057364,
138
+ "eval_loss": 3.615004062652588,
139
+ "eval_precision": 0.17807893898652688,
140
+ "eval_recall": 0.22485207100591717,
141
+ "eval_runtime": 12.3307,
142
+ "eval_samples_per_second": 27.411,
143
+ "eval_steps_per_second": 13.706,
144
+ "eval_top_10_accuracy": 0.6153846153846154,
145
+ "eval_top_1_accuracy": 0.22485207100591717,
146
+ "eval_top_5_accuracy": 0.47041420118343197,
147
+ "step": 901
148
+ },
149
+ {
150
+ "epoch": 5.002743055555555,
151
+ "grad_norm": 59.29837417602539,
152
+ "learning_rate": 1.3791666666666667e-05,
153
+ "loss": 11.2113,
154
+ "step": 1000
155
+ },
156
+ {
157
+ "epoch": 5.0049930555555555,
158
+ "eval_accuracy": 0.2603550295857988,
159
+ "eval_f1": 0.2214583903933016,
160
+ "eval_loss": 3.238880157470703,
161
+ "eval_precision": 0.24217353159660848,
162
+ "eval_recall": 0.2603550295857988,
163
+ "eval_runtime": 11.9439,
164
+ "eval_samples_per_second": 28.299,
165
+ "eval_steps_per_second": 14.149,
166
+ "eval_top_10_accuracy": 0.7366863905325444,
167
+ "eval_top_1_accuracy": 0.2603550295857988,
168
+ "eval_top_5_accuracy": 0.606508875739645,
169
+ "step": 1081
170
+ },
171
+ {
172
+ "epoch": 6.0005138888888885,
173
+ "grad_norm": 55.69058609008789,
174
+ "learning_rate": 1.5180555555555556e-05,
175
+ "loss": 10.4522,
176
+ "step": 1100
177
+ },
178
+ {
179
+ "epoch": 6.003291666666667,
180
+ "grad_norm": 61.44662094116211,
181
+ "learning_rate": 1.6569444444444447e-05,
182
+ "loss": 8.898,
183
+ "step": 1200
184
+ },
185
+ {
186
+ "epoch": 6.004986111111111,
187
+ "eval_accuracy": 0.3757396449704142,
188
+ "eval_f1": 0.33238789867749136,
189
+ "eval_loss": 2.8713691234588623,
190
+ "eval_precision": 0.358355869968296,
191
+ "eval_recall": 0.3757396449704142,
192
+ "eval_runtime": 11.7736,
193
+ "eval_samples_per_second": 28.708,
194
+ "eval_steps_per_second": 14.354,
195
+ "eval_top_10_accuracy": 0.8165680473372781,
196
+ "eval_top_1_accuracy": 0.3757396449704142,
197
+ "eval_top_5_accuracy": 0.6775147928994083,
198
+ "step": 1261
199
+ },
200
+ {
201
+ "epoch": 7.0010625,
202
+ "grad_norm": 52.68465042114258,
203
+ "learning_rate": 1.7958333333333334e-05,
204
+ "loss": 7.9604,
205
+ "step": 1300
206
+ },
207
+ {
208
+ "epoch": 7.003840277777778,
209
+ "grad_norm": 47.81911849975586,
210
+ "learning_rate": 1.934722222222222e-05,
211
+ "loss": 6.715,
212
+ "step": 1400
213
+ },
214
+ {
215
+ "epoch": 7.005006944444444,
216
+ "eval_accuracy": 0.4230769230769231,
217
+ "eval_f1": 0.372985971950469,
218
+ "eval_loss": 2.6518218517303467,
219
+ "eval_precision": 0.3827507962123347,
220
+ "eval_recall": 0.4230769230769231,
221
+ "eval_runtime": 11.7923,
222
+ "eval_samples_per_second": 28.663,
223
+ "eval_steps_per_second": 14.331,
224
+ "eval_top_10_accuracy": 0.8402366863905325,
225
+ "eval_top_1_accuracy": 0.4230769230769231,
226
+ "eval_top_5_accuracy": 0.7248520710059172,
227
+ "step": 1442
228
+ },
229
+ {
230
+ "epoch": 8.001611111111112,
231
+ "grad_norm": 57.77692413330078,
232
+ "learning_rate": 2.0736111111111112e-05,
233
+ "loss": 5.661,
234
+ "step": 1500
235
+ },
236
+ {
237
+ "epoch": 8.004388888888888,
238
+ "grad_norm": 46.77447509765625,
239
+ "learning_rate": 2.2125000000000002e-05,
240
+ "loss": 4.8442,
241
+ "step": 1600
242
+ },
243
+ {
244
+ "epoch": 8.005,
245
+ "eval_accuracy": 0.46449704142011833,
246
+ "eval_f1": 0.4376789876789876,
247
+ "eval_loss": 2.329350471496582,
248
+ "eval_precision": 0.5076618893926585,
249
+ "eval_recall": 0.46449704142011833,
250
+ "eval_runtime": 11.6897,
251
+ "eval_samples_per_second": 28.914,
252
+ "eval_steps_per_second": 14.457,
253
+ "eval_top_10_accuracy": 0.8875739644970414,
254
+ "eval_top_1_accuracy": 0.46449704142011833,
255
+ "eval_top_5_accuracy": 0.7928994082840237,
256
+ "step": 1622
257
+ },
258
+ {
259
+ "epoch": 9.002159722222222,
260
+ "grad_norm": 37.64653396606445,
261
+ "learning_rate": 2.351388888888889e-05,
262
+ "loss": 3.5341,
263
+ "step": 1700
264
+ },
265
+ {
266
+ "epoch": 9.0049375,
267
+ "grad_norm": 37.44438171386719,
268
+ "learning_rate": 2.4902777777777777e-05,
269
+ "loss": 3.3825,
270
+ "step": 1800
271
+ },
272
+ {
273
+ "epoch": 9.004993055555556,
274
+ "eval_accuracy": 0.4911242603550296,
275
+ "eval_f1": 0.46539428684399103,
276
+ "eval_loss": 2.174729347229004,
277
+ "eval_precision": 0.5436003099464638,
278
+ "eval_recall": 0.4911242603550296,
279
+ "eval_runtime": 11.6633,
280
+ "eval_samples_per_second": 28.98,
281
+ "eval_steps_per_second": 14.49,
282
+ "eval_top_10_accuracy": 0.8964497041420119,
283
+ "eval_top_1_accuracy": 0.4911242603550296,
284
+ "eval_top_5_accuracy": 0.7899408284023669,
285
+ "step": 1802
286
+ },
287
+ {
288
+ "epoch": 10.002708333333333,
289
+ "grad_norm": 41.797882080078125,
290
+ "learning_rate": 2.629166666666667e-05,
291
+ "loss": 2.0471,
292
+ "step": 1900
293
+ },
294
+ {
295
+ "epoch": 10.00498611111111,
296
+ "eval_accuracy": 0.5177514792899408,
297
+ "eval_f1": 0.5056634660885672,
298
+ "eval_loss": 1.9989553689956665,
299
+ "eval_precision": 0.5871284164553396,
300
+ "eval_recall": 0.5177514792899408,
301
+ "eval_runtime": 11.7241,
302
+ "eval_samples_per_second": 28.83,
303
+ "eval_steps_per_second": 14.415,
304
+ "eval_top_10_accuracy": 0.9053254437869822,
305
+ "eval_top_1_accuracy": 0.514792899408284,
306
+ "eval_top_5_accuracy": 0.8106508875739645,
307
+ "step": 1982
308
+ },
309
+ {
310
+ "epoch": 11.000479166666667,
311
+ "grad_norm": 35.05084228515625,
312
+ "learning_rate": 2.7680555555555558e-05,
313
+ "loss": 2.1684,
314
+ "step": 2000
315
+ },
316
+ {
317
+ "epoch": 11.003256944444445,
318
+ "grad_norm": 32.317020416259766,
319
+ "learning_rate": 2.9069444444444442e-05,
320
+ "loss": 1.3242,
321
+ "step": 2100
322
+ },
323
+ {
324
+ "epoch": 11.005006944444444,
325
+ "eval_accuracy": 0.5473372781065089,
326
+ "eval_f1": 0.5199061622138544,
327
+ "eval_loss": 1.896411418914795,
328
+ "eval_precision": 0.582156945618484,
329
+ "eval_recall": 0.5473372781065089,
330
+ "eval_runtime": 11.653,
331
+ "eval_samples_per_second": 29.005,
332
+ "eval_steps_per_second": 14.503,
333
+ "eval_top_10_accuracy": 0.893491124260355,
334
+ "eval_top_1_accuracy": 0.5473372781065089,
335
+ "eval_top_5_accuracy": 0.8165680473372781,
336
+ "step": 2163
337
+ },
338
+ {
339
+ "epoch": 12.001027777777777,
340
+ "grad_norm": 20.89638900756836,
341
+ "learning_rate": 3.0458333333333333e-05,
342
+ "loss": 1.244,
343
+ "step": 2200
344
+ },
345
+ {
346
+ "epoch": 12.003805555555555,
347
+ "grad_norm": 25.03165054321289,
348
+ "learning_rate": 3.184722222222222e-05,
349
+ "loss": 0.8746,
350
+ "step": 2300
351
+ },
352
+ {
353
+ "epoch": 12.005,
354
+ "eval_accuracy": 0.5562130177514792,
355
+ "eval_f1": 0.531996208919286,
356
+ "eval_loss": 1.8221518993377686,
357
+ "eval_precision": 0.5796251825097979,
358
+ "eval_recall": 0.5562130177514792,
359
+ "eval_runtime": 11.5373,
360
+ "eval_samples_per_second": 29.296,
361
+ "eval_steps_per_second": 14.648,
362
+ "eval_top_10_accuracy": 0.908284023668639,
363
+ "eval_top_1_accuracy": 0.5562130177514792,
364
+ "eval_top_5_accuracy": 0.8254437869822485,
365
+ "step": 2343
366
+ },
367
+ {
368
+ "epoch": 13.00157638888889,
369
+ "grad_norm": 30.899749755859375,
370
+ "learning_rate": 3.3236111111111114e-05,
371
+ "loss": 0.6561,
372
+ "step": 2400
373
+ },
374
+ {
375
+ "epoch": 13.004354166666667,
376
+ "grad_norm": 9.373037338256836,
377
+ "learning_rate": 3.4625e-05,
378
+ "loss": 0.5537,
379
+ "step": 2500
380
+ },
381
+ {
382
+ "epoch": 13.004993055555556,
383
+ "eval_accuracy": 0.5769230769230769,
384
+ "eval_f1": 0.5467897487128257,
385
+ "eval_loss": 1.7525219917297363,
386
+ "eval_precision": 0.5813186813186813,
387
+ "eval_recall": 0.5769230769230769,
388
+ "eval_runtime": 11.5774,
389
+ "eval_samples_per_second": 29.195,
390
+ "eval_steps_per_second": 14.597,
391
+ "eval_top_10_accuracy": 0.9142011834319527,
392
+ "eval_top_1_accuracy": 0.5769230769230769,
393
+ "eval_top_5_accuracy": 0.834319526627219,
394
+ "step": 2523
395
+ },
396
+ {
397
+ "epoch": 14.002125,
398
+ "grad_norm": 4.441218376159668,
399
+ "learning_rate": 3.601388888888889e-05,
400
+ "loss": 0.3664,
401
+ "step": 2600
402
+ },
403
+ {
404
+ "epoch": 14.004902777777778,
405
+ "grad_norm": 39.638824462890625,
406
+ "learning_rate": 3.740277777777778e-05,
407
+ "loss": 0.4081,
408
+ "step": 2700
409
+ },
410
+ {
411
+ "epoch": 14.00498611111111,
412
+ "eval_accuracy": 0.5946745562130178,
413
+ "eval_f1": 0.5833541540642132,
414
+ "eval_loss": 1.7350622415542603,
415
+ "eval_precision": 0.6683572837418991,
416
+ "eval_recall": 0.5946745562130178,
417
+ "eval_runtime": 12.1445,
418
+ "eval_samples_per_second": 27.831,
419
+ "eval_steps_per_second": 13.916,
420
+ "eval_top_10_accuracy": 0.8964497041420119,
421
+ "eval_top_1_accuracy": 0.5946745562130178,
422
+ "eval_top_5_accuracy": 0.8136094674556213,
423
+ "step": 2703
424
+ },
425
+ {
426
+ "epoch": 15.002673611111112,
427
+ "grad_norm": 7.7865376472473145,
428
+ "learning_rate": 3.879166666666667e-05,
429
+ "loss": 0.17,
430
+ "step": 2800
431
+ },
432
+ {
433
+ "epoch": 15.005006944444444,
434
+ "eval_accuracy": 0.5591715976331361,
435
+ "eval_f1": 0.5341941753184356,
436
+ "eval_loss": 1.6997803449630737,
437
+ "eval_precision": 0.5763416071108378,
438
+ "eval_recall": 0.5591715976331361,
439
+ "eval_runtime": 12.1127,
440
+ "eval_samples_per_second": 27.905,
441
+ "eval_steps_per_second": 13.952,
442
+ "eval_top_10_accuracy": 0.908284023668639,
443
+ "eval_top_1_accuracy": 0.5591715976331361,
444
+ "eval_top_5_accuracy": 0.8224852071005917,
445
+ "step": 2884
446
+ },
447
+ {
448
+ "epoch": 16.000444444444444,
449
+ "grad_norm": 7.096034049987793,
450
+ "learning_rate": 4.018055555555556e-05,
451
+ "loss": 0.3333,
452
+ "step": 2900
453
+ },
454
+ {
455
+ "epoch": 16.003222222222224,
456
+ "grad_norm": 8.13654613494873,
457
+ "learning_rate": 4.1569444444444444e-05,
458
+ "loss": 0.2053,
459
+ "step": 3000
460
+ },
461
+ {
462
+ "epoch": 16.005,
463
+ "eval_accuracy": 0.5650887573964497,
464
+ "eval_f1": 0.5390417275032658,
465
+ "eval_loss": 1.7339895963668823,
466
+ "eval_precision": 0.6214919695688926,
467
+ "eval_recall": 0.5650887573964497,
468
+ "eval_runtime": 12.1867,
469
+ "eval_samples_per_second": 27.735,
470
+ "eval_steps_per_second": 13.868,
471
+ "eval_top_10_accuracy": 0.908284023668639,
472
+ "eval_top_1_accuracy": 0.5650887573964497,
473
+ "eval_top_5_accuracy": 0.834319526627219,
474
+ "step": 3064
475
+ },
476
+ {
477
+ "epoch": 17.000993055555554,
478
+ "grad_norm": 2.6876680850982666,
479
+ "learning_rate": 4.295833333333333e-05,
480
+ "loss": 0.1874,
481
+ "step": 3100
482
+ },
483
+ {
484
+ "epoch": 17.003770833333334,
485
+ "grad_norm": 29.59245491027832,
486
+ "learning_rate": 4.4347222222222226e-05,
487
+ "loss": 0.1434,
488
+ "step": 3200
489
+ },
490
+ {
491
+ "epoch": 17.004993055555556,
492
+ "eval_accuracy": 0.6005917159763313,
493
+ "eval_f1": 0.5806121557600847,
494
+ "eval_loss": 1.7350496053695679,
495
+ "eval_precision": 0.6346953096213452,
496
+ "eval_recall": 0.6005917159763313,
497
+ "eval_runtime": 12.2719,
498
+ "eval_samples_per_second": 27.543,
499
+ "eval_steps_per_second": 13.771,
500
+ "eval_top_10_accuracy": 0.9142011834319527,
501
+ "eval_top_1_accuracy": 0.6005917159763313,
502
+ "eval_top_5_accuracy": 0.8431952662721893,
503
+ "step": 3244
504
+ },
505
+ {
506
+ "epoch": 18.001541666666668,
507
+ "grad_norm": 0.8009536266326904,
508
+ "learning_rate": 4.573611111111111e-05,
509
+ "loss": 0.0921,
510
+ "step": 3300
511
+ },
512
+ {
513
+ "epoch": 18.004319444444445,
514
+ "grad_norm": 2.7831170558929443,
515
+ "learning_rate": 4.7125e-05,
516
+ "loss": 0.1957,
517
+ "step": 3400
518
+ },
519
+ {
520
+ "epoch": 18.004986111111112,
521
+ "eval_accuracy": 0.5621301775147929,
522
+ "eval_f1": 0.5349809460756207,
523
+ "eval_loss": 1.8179223537445068,
524
+ "eval_precision": 0.6059920848382387,
525
+ "eval_recall": 0.5621301775147929,
526
+ "eval_runtime": 12.0029,
527
+ "eval_samples_per_second": 28.16,
528
+ "eval_steps_per_second": 14.08,
529
+ "eval_top_10_accuracy": 0.9142011834319527,
530
+ "eval_top_1_accuracy": 0.5621301775147929,
531
+ "eval_top_5_accuracy": 0.8372781065088757,
532
+ "step": 3424
533
+ },
534
+ {
535
+ "epoch": 19.00209027777778,
536
+ "grad_norm": 0.349692165851593,
537
+ "learning_rate": 4.8513888888888894e-05,
538
+ "loss": 0.1161,
539
+ "step": 3500
540
+ },
541
+ {
542
+ "epoch": 19.004868055555555,
543
+ "grad_norm": 86.96188354492188,
544
+ "learning_rate": 4.990277777777778e-05,
545
+ "loss": 0.1636,
546
+ "step": 3600
547
+ },
548
+ {
549
+ "epoch": 19.005006944444446,
550
+ "eval_accuracy": 0.6153846153846154,
551
+ "eval_f1": 0.5916679966975824,
552
+ "eval_loss": 1.7831283807754517,
553
+ "eval_precision": 0.6401178949255872,
554
+ "eval_recall": 0.6153846153846154,
555
+ "eval_runtime": 11.9167,
556
+ "eval_samples_per_second": 28.364,
557
+ "eval_steps_per_second": 14.182,
558
+ "eval_top_10_accuracy": 0.8905325443786982,
559
+ "eval_top_1_accuracy": 0.6153846153846154,
560
+ "eval_top_5_accuracy": 0.8224852071005917,
561
+ "step": 3605
562
+ },
563
+ {
564
+ "epoch": 20.00263888888889,
565
+ "grad_norm": 0.18171709775924683,
566
+ "learning_rate": 4.985648148148148e-05,
567
+ "loss": 0.0908,
568
+ "step": 3700
569
+ },
570
+ {
571
+ "epoch": 20.005,
572
+ "eval_accuracy": 0.621301775147929,
573
+ "eval_f1": 0.6014277142975367,
574
+ "eval_loss": 1.7552212476730347,
575
+ "eval_precision": 0.6504156100309946,
576
+ "eval_recall": 0.621301775147929,
577
+ "eval_runtime": 11.9034,
578
+ "eval_samples_per_second": 28.395,
579
+ "eval_steps_per_second": 14.198,
580
+ "eval_top_10_accuracy": 0.9053254437869822,
581
+ "eval_top_1_accuracy": 0.621301775147929,
582
+ "eval_top_5_accuracy": 0.8402366863905325,
583
+ "step": 3785
584
+ },
585
+ {
586
+ "epoch": 21.000409722222223,
587
+ "grad_norm": 32.656211853027344,
588
+ "learning_rate": 4.970216049382716e-05,
589
+ "loss": 0.1217,
590
+ "step": 3800
591
+ },
592
+ {
593
+ "epoch": 21.0031875,
594
+ "grad_norm": 0.07043986022472382,
595
+ "learning_rate": 4.954783950617284e-05,
596
+ "loss": 0.058,
597
+ "step": 3900
598
+ },
599
+ {
600
+ "epoch": 21.004993055555556,
601
+ "eval_accuracy": 0.621301775147929,
602
+ "eval_f1": 0.5961879507737495,
603
+ "eval_loss": 1.8422198295593262,
604
+ "eval_precision": 0.6392187940264863,
605
+ "eval_recall": 0.621301775147929,
606
+ "eval_runtime": 11.7783,
607
+ "eval_samples_per_second": 28.697,
608
+ "eval_steps_per_second": 14.348,
609
+ "eval_top_10_accuracy": 0.9112426035502958,
610
+ "eval_top_1_accuracy": 0.6242603550295858,
611
+ "eval_top_5_accuracy": 0.8254437869822485,
612
+ "step": 3965
613
+ },
614
+ {
615
+ "epoch": 22.000958333333333,
616
+ "grad_norm": 0.16647548973560333,
617
+ "learning_rate": 4.939351851851852e-05,
618
+ "loss": 0.1357,
619
+ "step": 4000
620
+ },
621
+ {
622
+ "epoch": 22.00373611111111,
623
+ "grad_norm": 0.13775382936000824,
624
+ "learning_rate": 4.92391975308642e-05,
625
+ "loss": 0.0924,
626
+ "step": 4100
627
+ },
628
+ {
629
+ "epoch": 22.004986111111112,
630
+ "eval_accuracy": 0.6005917159763313,
631
+ "eval_f1": 0.5735153735153736,
632
+ "eval_loss": 1.834716796875,
633
+ "eval_precision": 0.6217596506058044,
634
+ "eval_recall": 0.6005917159763313,
635
+ "eval_runtime": 11.8787,
636
+ "eval_samples_per_second": 28.454,
637
+ "eval_steps_per_second": 14.227,
638
+ "eval_top_10_accuracy": 0.9201183431952663,
639
+ "eval_top_1_accuracy": 0.6005917159763313,
640
+ "eval_top_5_accuracy": 0.8224852071005917,
641
+ "step": 4145
642
+ },
643
+ {
644
+ "epoch": 23.001506944444444,
645
+ "grad_norm": 1.0737590789794922,
646
+ "learning_rate": 4.908487654320988e-05,
647
+ "loss": 0.102,
648
+ "step": 4200
649
+ },
650
+ {
651
+ "epoch": 23.004284722222224,
652
+ "grad_norm": 3.3763539791107178,
653
+ "learning_rate": 4.893055555555556e-05,
654
+ "loss": 0.0799,
655
+ "step": 4300
656
+ },
657
+ {
658
+ "epoch": 23.005006944444446,
659
+ "eval_accuracy": 0.6035502958579881,
660
+ "eval_f1": 0.572392564700257,
661
+ "eval_loss": 1.9649921655654907,
662
+ "eval_precision": 0.6182439355516278,
663
+ "eval_recall": 0.6035502958579881,
664
+ "eval_runtime": 12.1311,
665
+ "eval_samples_per_second": 27.862,
666
+ "eval_steps_per_second": 13.931,
667
+ "eval_top_10_accuracy": 0.8846153846153846,
668
+ "eval_top_1_accuracy": 0.6035502958579881,
669
+ "eval_top_5_accuracy": 0.8106508875739645,
670
+ "step": 4326
671
+ },
672
+ {
673
+ "epoch": 24.002055555555554,
674
+ "grad_norm": 116.17411804199219,
675
+ "learning_rate": 4.877623456790124e-05,
676
+ "loss": 0.1349,
677
+ "step": 4400
678
+ },
679
+ {
680
+ "epoch": 24.004833333333334,
681
+ "grad_norm": 47.67039108276367,
682
+ "learning_rate": 4.8621913580246915e-05,
683
+ "loss": 0.176,
684
+ "step": 4500
685
+ },
686
+ {
687
+ "epoch": 24.005,
688
+ "eval_accuracy": 0.5857988165680473,
689
+ "eval_f1": 0.5670846247769326,
690
+ "eval_loss": 1.9325687885284424,
691
+ "eval_precision": 0.6240464663541586,
692
+ "eval_recall": 0.5857988165680473,
693
+ "eval_runtime": 12.1023,
694
+ "eval_samples_per_second": 27.929,
695
+ "eval_steps_per_second": 13.964,
696
+ "eval_top_10_accuracy": 0.9142011834319527,
697
+ "eval_top_1_accuracy": 0.5857988165680473,
698
+ "eval_top_5_accuracy": 0.8402366863905325,
699
+ "step": 4506
700
+ },
701
+ {
702
+ "epoch": 25.002604166666668,
703
+ "grad_norm": 0.11518964916467667,
704
+ "learning_rate": 4.846759259259259e-05,
705
+ "loss": 0.0786,
706
+ "step": 4600
707
+ },
708
+ {
709
+ "epoch": 25.004993055555556,
710
+ "eval_accuracy": 0.6124260355029586,
711
+ "eval_f1": 0.599836816700722,
712
+ "eval_loss": 1.775345802307129,
713
+ "eval_precision": 0.660682586644125,
714
+ "eval_recall": 0.6124260355029586,
715
+ "eval_runtime": 11.4818,
716
+ "eval_samples_per_second": 29.438,
717
+ "eval_steps_per_second": 14.719,
718
+ "eval_top_10_accuracy": 0.9142011834319527,
719
+ "eval_top_1_accuracy": 0.6124260355029586,
720
+ "eval_top_5_accuracy": 0.849112426035503,
721
+ "step": 4686
722
+ },
723
+ {
724
+ "epoch": 26.000375,
725
+ "grad_norm": 0.5315603017807007,
726
+ "learning_rate": 4.831327160493828e-05,
727
+ "loss": 0.2613,
728
+ "step": 4700
729
+ },
730
+ {
731
+ "epoch": 26.00315277777778,
732
+ "grad_norm": 0.860506534576416,
733
+ "learning_rate": 4.81604938271605e-05,
734
+ "loss": 0.242,
735
+ "step": 4800
736
+ },
737
+ {
738
+ "epoch": 26.004986111111112,
739
+ "eval_accuracy": 0.5769230769230769,
740
+ "eval_f1": 0.5552139674920741,
741
+ "eval_loss": 2.021881580352783,
742
+ "eval_precision": 0.6336890673429134,
743
+ "eval_recall": 0.5769230769230769,
744
+ "eval_runtime": 11.9203,
745
+ "eval_samples_per_second": 28.355,
746
+ "eval_steps_per_second": 14.177,
747
+ "eval_top_10_accuracy": 0.8875739644970414,
748
+ "eval_top_1_accuracy": 0.5769230769230769,
749
+ "eval_top_5_accuracy": 0.772189349112426,
750
+ "step": 4866
751
+ },
752
+ {
753
+ "epoch": 27.000923611111112,
754
+ "grad_norm": 5.388744354248047,
755
+ "learning_rate": 4.8006172839506177e-05,
756
+ "loss": 0.1656,
757
+ "step": 4900
758
+ },
759
+ {
760
+ "epoch": 27.00370138888889,
761
+ "grad_norm": 2.032120704650879,
762
+ "learning_rate": 4.7851851851851854e-05,
763
+ "loss": 0.1767,
764
+ "step": 5000
765
+ },
766
+ {
767
+ "epoch": 27.005006944444446,
768
+ "eval_accuracy": 0.5828402366863905,
769
+ "eval_f1": 0.5721016163323855,
770
+ "eval_loss": 1.9743586778640747,
771
+ "eval_precision": 0.6330223031406463,
772
+ "eval_recall": 0.5828402366863905,
773
+ "eval_runtime": 11.9635,
774
+ "eval_samples_per_second": 28.253,
775
+ "eval_steps_per_second": 14.126,
776
+ "eval_top_10_accuracy": 0.9023668639053254,
777
+ "eval_top_1_accuracy": 0.5828402366863905,
778
+ "eval_top_5_accuracy": 0.8165680473372781,
779
+ "step": 5047
780
+ },
781
+ {
782
+ "epoch": 28.001472222222223,
783
+ "grad_norm": 1.4519288539886475,
784
+ "learning_rate": 4.769753086419753e-05,
785
+ "loss": 0.19,
786
+ "step": 5100
787
+ },
788
+ {
789
+ "epoch": 28.00425,
790
+ "grad_norm": 7.525012016296387,
791
+ "learning_rate": 4.754320987654321e-05,
792
+ "loss": 0.14,
793
+ "step": 5200
794
+ },
795
+ {
796
+ "epoch": 28.005,
797
+ "eval_accuracy": 0.5769230769230769,
798
+ "eval_f1": 0.5429622288219573,
799
+ "eval_loss": 2.1995532512664795,
800
+ "eval_precision": 0.5982988165680473,
801
+ "eval_recall": 0.5769230769230769,
802
+ "eval_runtime": 11.6103,
803
+ "eval_samples_per_second": 29.112,
804
+ "eval_steps_per_second": 14.556,
805
+ "eval_top_10_accuracy": 0.8609467455621301,
806
+ "eval_top_1_accuracy": 0.5769230769230769,
807
+ "eval_top_5_accuracy": 0.7810650887573964,
808
+ "step": 5227
809
+ },
810
+ {
811
+ "epoch": 29.002020833333333,
812
+ "grad_norm": 0.7301017045974731,
813
+ "learning_rate": 4.7388888888888894e-05,
814
+ "loss": 0.2472,
815
+ "step": 5300
816
+ },
817
+ {
818
+ "epoch": 29.00479861111111,
819
+ "grad_norm": 0.6413145065307617,
820
+ "learning_rate": 4.723456790123457e-05,
821
+ "loss": 0.104,
822
+ "step": 5400
823
+ },
824
+ {
825
+ "epoch": 29.004993055555556,
826
+ "eval_accuracy": 0.5769230769230769,
827
+ "eval_f1": 0.5640588044434198,
828
+ "eval_loss": 2.0880820751190186,
829
+ "eval_precision": 0.6145991828684136,
830
+ "eval_recall": 0.5769230769230769,
831
+ "eval_runtime": 11.4084,
832
+ "eval_samples_per_second": 29.627,
833
+ "eval_steps_per_second": 14.814,
834
+ "eval_top_10_accuracy": 0.8875739644970414,
835
+ "eval_top_1_accuracy": 0.5769230769230769,
836
+ "eval_top_5_accuracy": 0.8165680473372781,
837
+ "step": 5407
838
+ },
839
+ {
840
+ "epoch": 30.002569444444443,
841
+ "grad_norm": 0.333312064409256,
842
+ "learning_rate": 4.708024691358025e-05,
843
+ "loss": 0.1454,
844
+ "step": 5500
845
+ },
846
+ {
847
+ "epoch": 30.004986111111112,
848
+ "eval_accuracy": 0.5621301775147929,
849
+ "eval_f1": 0.5447567389875081,
850
+ "eval_loss": 2.33941388130188,
851
+ "eval_precision": 0.628030303030303,
852
+ "eval_recall": 0.5621301775147929,
853
+ "eval_runtime": 12.0785,
854
+ "eval_samples_per_second": 27.984,
855
+ "eval_steps_per_second": 13.992,
856
+ "eval_top_10_accuracy": 0.8905325443786982,
857
+ "eval_top_1_accuracy": 0.5621301775147929,
858
+ "eval_top_5_accuracy": 0.7958579881656804,
859
+ "step": 5587
860
+ },
861
+ {
862
+ "epoch": 31.000340277777777,
863
+ "grad_norm": 0.10098100453615189,
864
+ "learning_rate": 4.692592592592593e-05,
865
+ "loss": 0.1388,
866
+ "step": 5600
867
+ },
868
+ {
869
+ "epoch": 31.003118055555557,
870
+ "grad_norm": 0.03804658353328705,
871
+ "learning_rate": 4.6771604938271605e-05,
872
+ "loss": 0.2221,
873
+ "step": 5700
874
+ },
875
+ {
876
+ "epoch": 31.005006944444446,
877
+ "eval_accuracy": 0.5946745562130178,
878
+ "eval_f1": 0.5881447119612799,
879
+ "eval_loss": 1.9360294342041016,
880
+ "eval_precision": 0.6606297548605241,
881
+ "eval_recall": 0.5946745562130178,
882
+ "eval_runtime": 11.5488,
883
+ "eval_samples_per_second": 29.267,
884
+ "eval_steps_per_second": 14.634,
885
+ "eval_top_10_accuracy": 0.9023668639053254,
886
+ "eval_top_1_accuracy": 0.5946745562130178,
887
+ "eval_top_5_accuracy": 0.8224852071005917,
888
+ "step": 5768
889
+ },
890
+ {
891
+ "epoch": 32.00088888888889,
892
+ "grad_norm": 0.024992674589157104,
893
+ "learning_rate": 4.661728395061728e-05,
894
+ "loss": 0.1003,
895
+ "step": 5800
896
+ },
897
+ {
898
+ "epoch": 32.00366666666667,
899
+ "grad_norm": 0.2089391052722931,
900
+ "learning_rate": 4.646296296296297e-05,
901
+ "loss": 0.1026,
902
+ "step": 5900
903
+ },
904
+ {
905
+ "epoch": 32.005,
906
+ "eval_accuracy": 0.6035502958579881,
907
+ "eval_f1": 0.5831550927704774,
908
+ "eval_loss": 2.092036485671997,
909
+ "eval_precision": 0.6375950972104818,
910
+ "eval_recall": 0.6035502958579881,
911
+ "eval_runtime": 11.4387,
912
+ "eval_samples_per_second": 29.549,
913
+ "eval_steps_per_second": 14.774,
914
+ "eval_top_10_accuracy": 0.893491124260355,
915
+ "eval_top_1_accuracy": 0.6035502958579881,
916
+ "eval_top_5_accuracy": 0.8106508875739645,
917
+ "step": 5948
918
+ },
919
+ {
920
+ "epoch": 33.0014375,
921
+ "grad_norm": 0.051621366292238235,
922
+ "learning_rate": 4.6308641975308645e-05,
923
+ "loss": 0.0709,
924
+ "step": 6000
925
+ },
926
+ {
927
+ "epoch": 33.004215277777774,
928
+ "grad_norm": 0.05871783196926117,
929
+ "learning_rate": 4.615432098765433e-05,
930
+ "loss": 0.0968,
931
+ "step": 6100
932
+ },
933
+ {
934
+ "epoch": 33.00499305555556,
935
+ "eval_accuracy": 0.5739644970414202,
936
+ "eval_f1": 0.5541961818589037,
937
+ "eval_loss": 2.2745707035064697,
938
+ "eval_precision": 0.6308041317656701,
939
+ "eval_recall": 0.5739644970414202,
940
+ "eval_runtime": 11.2862,
941
+ "eval_samples_per_second": 29.948,
942
+ "eval_steps_per_second": 14.974,
943
+ "eval_top_10_accuracy": 0.8846153846153846,
944
+ "eval_top_1_accuracy": 0.5739644970414202,
945
+ "eval_top_5_accuracy": 0.8047337278106509,
946
+ "step": 6128
947
+ },
948
+ {
949
+ "epoch": 34.00198611111111,
950
+ "grad_norm": 0.23713918030261993,
951
+ "learning_rate": 4.600000000000001e-05,
952
+ "loss": 0.2097,
953
+ "step": 6200
954
+ },
955
+ {
956
+ "epoch": 34.00476388888889,
957
+ "grad_norm": 0.9631951451301575,
958
+ "learning_rate": 4.5845679012345684e-05,
959
+ "loss": 0.1864,
960
+ "step": 6300
961
+ },
962
+ {
963
+ "epoch": 34.00498611111111,
964
+ "eval_accuracy": 0.5887573964497042,
965
+ "eval_f1": 0.5704292684109307,
966
+ "eval_loss": 2.208103656768799,
967
+ "eval_precision": 0.639407621471231,
968
+ "eval_recall": 0.5887573964497042,
969
+ "eval_runtime": 11.7319,
970
+ "eval_samples_per_second": 28.81,
971
+ "eval_steps_per_second": 14.405,
972
+ "eval_top_10_accuracy": 0.8698224852071006,
973
+ "eval_top_1_accuracy": 0.5887573964497042,
974
+ "eval_top_5_accuracy": 0.8047337278106509,
975
+ "step": 6308
976
+ },
977
+ {
978
+ "epoch": 35.00253472222222,
979
+ "grad_norm": 0.04889826104044914,
980
+ "learning_rate": 4.569135802469136e-05,
981
+ "loss": 0.1353,
982
+ "step": 6400
983
+ },
984
+ {
985
+ "epoch": 35.005006944444446,
986
+ "eval_accuracy": 0.5798816568047337,
987
+ "eval_f1": 0.5635964955491581,
988
+ "eval_loss": 2.1853461265563965,
989
+ "eval_precision": 0.6133398652629422,
990
+ "eval_recall": 0.5798816568047337,
991
+ "eval_runtime": 11.3905,
992
+ "eval_samples_per_second": 29.674,
993
+ "eval_steps_per_second": 14.837,
994
+ "eval_top_10_accuracy": 0.893491124260355,
995
+ "eval_top_1_accuracy": 0.5798816568047337,
996
+ "eval_top_5_accuracy": 0.8254437869822485,
997
+ "step": 6489
998
+ },
999
+ {
1000
+ "epoch": 36.000305555555556,
1001
+ "grad_norm": 0.8964897394180298,
1002
+ "learning_rate": 4.553703703703704e-05,
1003
+ "loss": 0.1746,
1004
+ "step": 6500
1005
+ },
1006
+ {
1007
+ "epoch": 36.003083333333336,
1008
+ "grad_norm": 0.05014768987894058,
1009
+ "learning_rate": 4.538271604938272e-05,
1010
+ "loss": 0.1618,
1011
+ "step": 6600
1012
+ },
1013
+ {
1014
+ "epoch": 36.005,
1015
+ "eval_accuracy": 0.5710059171597633,
1016
+ "eval_f1": 0.5514817365409082,
1017
+ "eval_loss": 2.266056537628174,
1018
+ "eval_precision": 0.624323753169907,
1019
+ "eval_recall": 0.5710059171597633,
1020
+ "eval_runtime": 11.0844,
1021
+ "eval_samples_per_second": 30.493,
1022
+ "eval_steps_per_second": 15.247,
1023
+ "eval_top_10_accuracy": 0.8698224852071006,
1024
+ "eval_top_1_accuracy": 0.5710059171597633,
1025
+ "eval_top_5_accuracy": 0.7958579881656804,
1026
+ "step": 6669
1027
+ },
1028
+ {
1029
+ "epoch": 37.00085416666667,
1030
+ "grad_norm": 0.18619082868099213,
1031
+ "learning_rate": 4.5228395061728395e-05,
1032
+ "loss": 0.3118,
1033
+ "step": 6700
1034
+ },
1035
+ {
1036
+ "epoch": 37.00363194444444,
1037
+ "grad_norm": 0.18089838325977325,
1038
+ "learning_rate": 4.507407407407407e-05,
1039
+ "loss": 0.259,
1040
+ "step": 6800
1041
+ },
1042
+ {
1043
+ "epoch": 37.00499305555556,
1044
+ "eval_accuracy": 0.5739644970414202,
1045
+ "eval_f1": 0.5459162632239556,
1046
+ "eval_loss": 2.3162882328033447,
1047
+ "eval_precision": 0.6088229078613694,
1048
+ "eval_recall": 0.5739644970414202,
1049
+ "eval_runtime": 11.492,
1050
+ "eval_samples_per_second": 29.412,
1051
+ "eval_steps_per_second": 14.706,
1052
+ "eval_top_10_accuracy": 0.8579881656804734,
1053
+ "eval_top_1_accuracy": 0.5739644970414202,
1054
+ "eval_top_5_accuracy": 0.7869822485207101,
1055
+ "step": 6849
1056
+ },
1057
+ {
1058
+ "epoch": 38.00140277777778,
1059
+ "grad_norm": 0.049058422446250916,
1060
+ "learning_rate": 4.49212962962963e-05,
1061
+ "loss": 0.3374,
1062
+ "step": 6900
1063
+ },
1064
+ {
1065
+ "epoch": 38.00418055555556,
1066
+ "grad_norm": 0.4456841051578522,
1067
+ "learning_rate": 4.476697530864198e-05,
1068
+ "loss": 0.3394,
1069
+ "step": 7000
1070
+ },
1071
+ {
1072
+ "epoch": 38.00498611111111,
1073
+ "eval_accuracy": 0.5769230769230769,
1074
+ "eval_f1": 0.5614232086125578,
1075
+ "eval_loss": 2.0984292030334473,
1076
+ "eval_precision": 0.6154339250493096,
1077
+ "eval_recall": 0.5769230769230769,
1078
+ "eval_runtime": 11.997,
1079
+ "eval_samples_per_second": 28.174,
1080
+ "eval_steps_per_second": 14.087,
1081
+ "eval_top_10_accuracy": 0.8905325443786982,
1082
+ "eval_top_1_accuracy": 0.5769230769230769,
1083
+ "eval_top_5_accuracy": 0.7988165680473372,
1084
+ "step": 7029
1085
+ },
1086
+ {
1087
+ "epoch": 39.00195138888889,
1088
+ "grad_norm": 0.010750464163720608,
1089
+ "learning_rate": 4.4612654320987657e-05,
1090
+ "loss": 0.1193,
1091
+ "step": 7100
1092
+ },
1093
+ {
1094
+ "epoch": 39.004729166666664,
1095
+ "grad_norm": 0.22571489214897156,
1096
+ "learning_rate": 4.4458333333333334e-05,
1097
+ "loss": 0.0833,
1098
+ "step": 7200
1099
+ },
1100
+ {
1101
+ "epoch": 39.005006944444446,
1102
+ "eval_accuracy": 0.5532544378698225,
1103
+ "eval_f1": 0.5328080203819848,
1104
+ "eval_loss": 2.281132936477661,
1105
+ "eval_precision": 0.6051346089807629,
1106
+ "eval_recall": 0.5532544378698225,
1107
+ "eval_runtime": 11.8037,
1108
+ "eval_samples_per_second": 28.635,
1109
+ "eval_steps_per_second": 14.317,
1110
+ "eval_top_10_accuracy": 0.8698224852071006,
1111
+ "eval_top_1_accuracy": 0.5532544378698225,
1112
+ "eval_top_5_accuracy": 0.8047337278106509,
1113
+ "step": 7210
1114
+ },
1115
+ {
1116
+ "epoch": 40.0025,
1117
+ "grad_norm": 1.91786789894104,
1118
+ "learning_rate": 4.430401234567901e-05,
1119
+ "loss": 0.1259,
1120
+ "step": 7300
1121
+ },
1122
+ {
1123
+ "epoch": 40.005,
1124
+ "eval_accuracy": 0.5828402366863905,
1125
+ "eval_f1": 0.551048433814706,
1126
+ "eval_loss": 2.2599146366119385,
1127
+ "eval_precision": 0.5806347252353169,
1128
+ "eval_recall": 0.5828402366863905,
1129
+ "eval_runtime": 11.6047,
1130
+ "eval_samples_per_second": 29.126,
1131
+ "eval_steps_per_second": 14.563,
1132
+ "eval_top_10_accuracy": 0.8698224852071006,
1133
+ "eval_top_1_accuracy": 0.5828402366863905,
1134
+ "eval_top_5_accuracy": 0.7899408284023669,
1135
+ "step": 7390
1136
+ },
1137
+ {
1138
+ "epoch": 40.005,
1139
+ "step": 7390,
1140
+ "total_flos": 5.1831774087363035e+19,
1141
+ "train_loss": 3.1396320400444515,
1142
+ "train_runtime": 4689.7449,
1143
+ "train_samples_per_second": 61.411,
1144
+ "train_steps_per_second": 7.676
1145
+ }
1146
+ ],
1147
+ "logging_steps": 100,
1148
+ "max_steps": 36000,
1149
+ "num_input_tokens_seen": 0,
1150
+ "num_train_epochs": 9223372036854775807,
1151
+ "save_steps": 500,
1152
+ "stateful_callbacks": {
1153
+ "EarlyStoppingCallback": {
1154
+ "args": {
1155
+ "early_stopping_patience": 20,
1156
+ "early_stopping_threshold": 0.0
1157
+ },
1158
+ "attributes": {
1159
+ "early_stopping_patience_counter": 20
1160
+ }
1161
+ },
1162
+ "TrainerControl": {
1163
+ "args": {
1164
+ "should_epoch_stop": false,
1165
+ "should_evaluate": false,
1166
+ "should_log": false,
1167
+ "should_save": true,
1168
+ "should_training_stop": true
1169
+ },
1170
+ "attributes": {}
1171
+ }
1172
+ },
1173
+ "total_flos": 5.1831774087363035e+19,
1174
+ "train_batch_size": 2,
1175
+ "trial_name": null,
1176
+ "trial_params": null
1177
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:532dc0da4d223346407fcf361a1413936e46f5fc28f3898dbdd97b70ed727bb5
3
+ size 5368