bobox committed · verified
Commit ac0c9bd · 1 Parent(s): dbda4ee

Training in progress, step 305, checkpoint
checkpoint-305/1_AdvancedWeightedPooling/config.json ADDED
@@ -0,0 +1,12 @@
{
  "embed_dim": 768,
  "num_heads": 4,
  "dropout": 0.025,
  "bias": true,
  "gate_min": 0.05,
  "gate_max": 0.95,
  "gate_dropout": 0.01,
  "dropout_gate_open": 0.075,
  "dropout_gate_close": 0.05,
  "CLS_self_attn": 0
}
checkpoint-305/1_AdvancedWeightedPooling/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:68b599379e5b06ef871fb82d51b71a5d4b321a2416ef815c4c2bbb4dc6f7ed7f
size 18940723
checkpoint-305/README.md ADDED
@@ -0,0 +1,1158 @@
---
base_model: microsoft/deberta-v3-small
library_name: sentence-transformers
metrics:
- pearson_cosine
- spearman_cosine
- pearson_manhattan
- spearman_manhattan
- pearson_euclidean
- spearman_euclidean
- pearson_dot
- spearman_dot
- pearson_max
- spearman_max
- cosine_accuracy
- cosine_accuracy_threshold
- cosine_f1
- cosine_f1_threshold
- cosine_precision
- cosine_recall
- cosine_ap
- dot_accuracy
- dot_accuracy_threshold
- dot_f1
- dot_f1_threshold
- dot_precision
- dot_recall
- dot_ap
- manhattan_accuracy
- manhattan_accuracy_threshold
- manhattan_f1
- manhattan_f1_threshold
- manhattan_precision
- manhattan_recall
- manhattan_ap
- euclidean_accuracy
- euclidean_accuracy_threshold
- euclidean_f1
- euclidean_f1_threshold
- euclidean_precision
- euclidean_recall
- euclidean_ap
- max_accuracy
- max_accuracy_threshold
- max_f1
- max_f1_threshold
- max_precision
- max_recall
- max_ap
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:32500
- loss:GISTEmbedLoss
widget:
- source_sentence: Fish hatch into larvae that are different from the adult form of species.
  sentences:
  - Fish hatch into larvae that are different from the adult form of?
  - amphibians hatch from eggs
  - A solenoid or coil wrapped around iron or certain other metals can form a(n) electromagnet.
- source_sentence: About 200 countries and territories have reported coronavirus cases in 2020 .
  sentences:
  - All-Time Olympic Games Medal Tally Analysis Home > Events > Olympics > Summer > Medal Tally > All-Time All-Time Olympic Games Medal Tally (Summer Olympics) Which country is the most successful at he Olympic Games? Here are the top ranked countries in terms of total medals won when all of the summer Games are considered (including the 2016 Rio Games). There are two tables presented, the first just lists the top countries based on the total medals won, the second table factors in how many Olympic Games the country appeared, averaging the total number of medals per Olympiad. A victory in a team sport is counted as one medal. The USA Has Won the Most Medals The US have clearly won the most gold medals and the most medals overall, more than doubling the next ranked country (these figures include medals won in Rio 2016). Second placed USSR had fewer appearances at the Olympics, and actually won more medals on average (see the 2nd table). The top 10 includes one country no longer in existence (the Soviet Union), so their medal totals will obviously not increase, however China is expected to continue a rapid rise up the ranks. With the addition of the 2016 data, China has moved up from 11th (in 2008) to 9th (2012) to 7th (2016). The country which has attended the most games without a medal is Monaco (20 Olympic Games), the country which has won the most medals without winning a gold medal is Malaysia (0 gold, 7 silver, 4 bronze). rank
  - An example of a reproductive behavior is salmon returning to their birthplace to lay their eggs
  - more than 664,000 cases of COVID-19 have been reported in over 190 countries and territories , resulting in approximately 30,800 deaths .
- source_sentence: The wave on a guitar string is transverse. the sound wave rattles a sheet of paper in a direction that shows the sound wave is what?
  sentences:
  - A Honda motorcycle parked in a grass driveway
  - In Panama tipping is a question of rewarding good service rather than an obligation. Restaurant bills don't include gratuities; adding 10% is customary. Bellhops and maids expect tips only in more expensive hotels, and $1–$2 per bag is the norm. You should also give a tip of up to $10 per day to tour guides.
  - Figure 16.33 The wave on a guitar string is transverse. The sound wave rattles a sheet of paper in a direction that shows the sound wave is longitudinal.
- source_sentence: The thermal production of a stove is generically used for
  sentences:
  - In total , 28 US victims were killed , while Viet Cong losses were killed 345 and a further 192 estimated killed .
  - a stove generates heat for cooking usually
  - A teenager has been charged over an incident in which a four-year-old girl was hurt when she was hit in the face by a brick thrown through a van window.
- source_sentence: can sweet potatoes cause itching?
  sentences:
  - 'People with a true potato allergy may react immediately after touching, peeling, or eating potatoes. Symptoms may vary from person to person, but typical symptoms of a potato allergy include: rhinitis, including itchy or stinging eyes, a runny or stuffy nose, and sneezing.'
  - riding a bike does not cause pollution
  - "Dilation occurs when cell walls relax.. An aneurysm is a dilation, or bubble, that occurs in the wall of an artery. \n an artery can be relaxed by dilation"
model-index:
- name: SentenceTransformer based on microsoft/deberta-v3-small
  results:
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts test
      type: sts-test
    metrics:
    - type: pearson_cosine
      value: 0.2749904272806095
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.31159390381099095
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.2923996087310511
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.3095556181083969
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.2934483033082174
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.3115817314678925
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.27496363262371837
      name: Pearson Dot
    - type: spearman_dot
      value: 0.31138581044552094
      name: Spearman Dot
    - type: pearson_max
      value: 0.2934483033082174
      name: Pearson Max
    - type: spearman_max
      value: 0.31159390381099095
      name: Spearman Max
  - task:
      type: binary-classification
      name: Binary Classification
    dataset:
      name: allNLI dev
      type: allNLI-dev
    metrics:
    - type: cosine_accuracy
      value: 0.67578125
      name: Cosine Accuracy
    - type: cosine_accuracy_threshold
      value: 0.9452645182609558
      name: Cosine Accuracy Threshold
    - type: cosine_f1
      value: 0.512
      name: Cosine F1
    - type: cosine_f1_threshold
      value: 0.8565204739570618
      name: Cosine F1 Threshold
    - type: cosine_precision
      value: 0.39143730886850153
      name: Cosine Precision
    - type: cosine_recall
      value: 0.7398843930635838
      name: Cosine Recall
    - type: cosine_ap
      value: 0.4264736612515921
      name: Cosine Ap
    - type: dot_accuracy
      value: 0.67578125
      name: Dot Accuracy
    - type: dot_accuracy_threshold
      value: 726.30615234375
      name: Dot Accuracy Threshold
    - type: dot_f1
      value: 0.512
      name: Dot F1
    - type: dot_f1_threshold
      value: 658.1103515625
      name: Dot F1 Threshold
    - type: dot_precision
      value: 0.39143730886850153
      name: Dot Precision
    - type: dot_recall
      value: 0.7398843930635838
      name: Dot Recall
    - type: dot_ap
      value: 0.42647535250956575
      name: Dot Ap
    - type: manhattan_accuracy
      value: 0.67578125
      name: Manhattan Accuracy
    - type: manhattan_accuracy_threshold
      value: 201.49061584472656
      name: Manhattan Accuracy Threshold
    - type: manhattan_f1
      value: 0.5107692307692308
      name: Manhattan F1
    - type: manhattan_f1_threshold
      value: 417.52728271484375
      name: Manhattan F1 Threshold
    - type: manhattan_precision
      value: 0.3480083857442348
      name: Manhattan Precision
    - type: manhattan_recall
      value: 0.9595375722543352
      name: Manhattan Recall
    - type: manhattan_ap
      value: 0.4252213828672732
      name: Manhattan Ap
    - type: euclidean_accuracy
      value: 0.67578125
      name: Euclidean Accuracy
    - type: euclidean_accuracy_threshold
      value: 9.171283721923828
      name: Euclidean Accuracy Threshold
    - type: euclidean_f1
      value: 0.512
      name: Euclidean F1
    - type: euclidean_f1_threshold
      value: 14.84876823425293
      name: Euclidean F1 Threshold
    - type: euclidean_precision
      value: 0.39143730886850153
      name: Euclidean Precision
    - type: euclidean_recall
      value: 0.7398843930635838
      name: Euclidean Recall
    - type: euclidean_ap
      value: 0.4264736612515921
      name: Euclidean Ap
    - type: max_accuracy
      value: 0.67578125
      name: Max Accuracy
    - type: max_accuracy_threshold
      value: 726.30615234375
      name: Max Accuracy Threshold
    - type: max_f1
      value: 0.512
      name: Max F1
    - type: max_f1_threshold
      value: 658.1103515625
      name: Max F1 Threshold
    - type: max_precision
      value: 0.39143730886850153
      name: Max Precision
    - type: max_recall
      value: 0.9595375722543352
      name: Max Recall
    - type: max_ap
      value: 0.42647535250956575
      name: Max Ap
  - task:
      type: binary-classification
      name: Binary Classification
    dataset:
      name: Qnli dev
      type: Qnli-dev
    metrics:
    - type: cosine_accuracy
      value: 0.634765625
      name: Cosine Accuracy
    - type: cosine_accuracy_threshold
      value: 0.8508153557777405
      name: Cosine Accuracy Threshold
    - type: cosine_f1
      value: 0.6505636070853462
      name: Cosine F1
    - type: cosine_f1_threshold
      value: 0.7770615816116333
      name: Cosine F1 Threshold
    - type: cosine_precision
      value: 0.5246753246753246
      name: Cosine Precision
    - type: cosine_recall
      value: 0.8559322033898306
      name: Cosine Recall
    - type: cosine_ap
      value: 0.6461335447626624
      name: Cosine Ap
    - type: dot_accuracy
      value: 0.634765625
      name: Dot Accuracy
    - type: dot_accuracy_threshold
      value: 653.7443237304688
      name: Dot Accuracy Threshold
    - type: dot_f1
      value: 0.6505636070853462
      name: Dot F1
    - type: dot_f1_threshold
      value: 597.0731811523438
      name: Dot F1 Threshold
    - type: dot_precision
      value: 0.5246753246753246
      name: Dot Precision
    - type: dot_recall
      value: 0.8559322033898306
      name: Dot Recall
    - type: dot_ap
      value: 0.6461682282377894
      name: Dot Ap
    - type: manhattan_accuracy
      value: 0.6328125
      name: Manhattan Accuracy
    - type: manhattan_accuracy_threshold
      value: 331.46282958984375
      name: Manhattan Accuracy Threshold
    - type: manhattan_f1
      value: 0.6501650165016502
      name: Manhattan F1
    - type: manhattan_f1_threshold
      value: 404.6050109863281
      name: Manhattan F1 Threshold
    - type: manhattan_precision
      value: 0.5324324324324324
      name: Manhattan Precision
    - type: manhattan_recall
      value: 0.8347457627118644
      name: Manhattan Recall
    - type: manhattan_ap
      value: 0.6431949026371255
      name: Manhattan Ap
    - type: euclidean_accuracy
      value: 0.634765625
      name: Euclidean Accuracy
    - type: euclidean_accuracy_threshold
      value: 15.141305923461914
      name: Euclidean Accuracy Threshold
    - type: euclidean_f1
      value: 0.6505636070853462
      name: Euclidean F1
    - type: euclidean_f1_threshold
      value: 18.50943946838379
      name: Euclidean F1 Threshold
    - type: euclidean_precision
      value: 0.5246753246753246
      name: Euclidean Precision
    - type: euclidean_recall
      value: 0.8559322033898306
      name: Euclidean Recall
    - type: euclidean_ap
      value: 0.6461382925406688
      name: Euclidean Ap
    - type: max_accuracy
      value: 0.634765625
      name: Max Accuracy
    - type: max_accuracy_threshold
      value: 653.7443237304688
      name: Max Accuracy Threshold
    - type: max_f1
      value: 0.6505636070853462
      name: Max F1
    - type: max_f1_threshold
      value: 597.0731811523438
      name: Max F1 Threshold
    - type: max_precision
      value: 0.5324324324324324
      name: Max Precision
    - type: max_recall
      value: 0.8559322033898306
      name: Max Recall
    - type: max_ap
      value: 0.6461682282377894
      name: Max Ap
---

# SentenceTransformer based on microsoft/deberta-v3-small

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small) <!-- at revision a36c739020e01763fe789b4b85e2df55d6180012 -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model
  (1): AdvancedWeightedPooling(
    (alpha_dropout_layer): Dropout(p=0.01, inplace=False)
    (gate_dropout_layer): Dropout(p=0.05, inplace=False)
    (linear_cls_pj): Linear(in_features=768, out_features=768, bias=True)
    (linear_cls_Qpj): Linear(in_features=768, out_features=768, bias=True)
    (linear_mean_pj): Linear(in_features=768, out_features=768, bias=True)
    (linear_attnOut): Linear(in_features=768, out_features=768, bias=True)
    (mha): MultiheadAttention(
      (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
    )
    (layernorm_output): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (layernorm_weightedPooing): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (layernorm_pjCls): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (layernorm_pjMean): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (layernorm_attnOut): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
)
```
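
The pooling block above is a custom module, so its forward pass lives in the author's training script rather than in the sentence-transformers library. As a rough, hedged sketch only: the submodule names printed above and the keys in `1_AdvancedWeightedPooling/config.json` suggest CLS-query attention pooling blended with projected mean pooling. Everything below (the gating scheme, the forward wiring, which projections feed the blend) is an assumption, not the author's actual implementation.

```python
import torch
from torch import nn


class AdvancedWeightedPooling(nn.Module):
    """Speculative reconstruction of the pooling head; not the author's code."""

    def __init__(self, embed_dim=768, num_heads=4, dropout=0.025, bias=True,
                 gate_min=0.05, gate_max=0.95):
        super().__init__()
        self.gate_min, self.gate_max = gate_min, gate_max
        self.alpha_dropout_layer = nn.Dropout(p=0.01)
        self.gate_dropout_layer = nn.Dropout(p=0.05)
        self.linear_cls_Qpj = nn.Linear(embed_dim, embed_dim, bias=bias)  # CLS -> attention query
        self.linear_mean_pj = nn.Linear(embed_dim, embed_dim, bias=bias)  # projects the mean-pooled view
        self.linear_attnOut = nn.Linear(embed_dim, embed_dim, bias=bias)
        self.mha = nn.MultiheadAttention(embed_dim, num_heads, dropout=dropout,
                                         bias=bias, batch_first=True)
        self.layernorm_pjCls = nn.LayerNorm(embed_dim)
        self.layernorm_pjMean = nn.LayerNorm(embed_dim)
        self.layernorm_attnOut = nn.LayerNorm(embed_dim)
        self.layernorm_output = nn.LayerNorm(embed_dim)
        # A learned scalar gate, clamped to [gate_min, gate_max]; pure guesswork.
        self.gate = nn.Parameter(torch.tensor(0.5))

    def forward(self, features: dict) -> dict:
        tokens = self.alpha_dropout_layer(features["token_embeddings"])  # (B, S, D)
        mask = features["attention_mask"]                                # (B, S)
        # CLS token projected into an attention query over all tokens.
        query = self.layernorm_pjCls(self.linear_cls_Qpj(tokens[:, :1]))
        attn_out, _ = self.mha(query, tokens, tokens, key_padding_mask=mask == 0)
        attn_out = self.layernorm_attnOut(self.linear_attnOut(attn_out.squeeze(1)))
        # Masked mean pooling, then projection.
        m = mask.unsqueeze(-1).float()
        mean = (tokens * m).sum(1) / m.sum(1).clamp(min=1e-9)
        mean = self.layernorm_pjMean(self.linear_mean_pj(mean))
        # Gated blend of the attention and mean views.
        g = self.gate_dropout_layer(self.gate.clamp(self.gate_min, self.gate_max))
        features["sentence_embedding"] = self.layernorm_output(g * attn_out + (1 - g) * mean)
        return features
```

(`linear_cls_pj` and `layernorm_weightedPooing` from the printed architecture are left out because their role cannot be inferred from this card.)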

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("bobox/DeBERTa3-s-CustomPoolin-toytest3-step1-checkpoints-tmp")
# Run inference
sentences = [
    'can sweet potatoes cause itching?',
    'People with a true potato allergy may react immediately after touching, peeling, or eating potatoes. Symptoms may vary from person to person, but typical symptoms of a potato allergy include: rhinitis, including itchy or stinging eyes, a runny or stuffy nose, and sneezing.',
    'riding a bike does not cause pollution',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Semantic Similarity
* Dataset: `sts-test`
* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| pearson_cosine      | 0.275      |
| **spearman_cosine** | **0.3116** |
| pearson_manhattan   | 0.2924     |
| spearman_manhattan  | 0.3096     |
| pearson_euclidean   | 0.2934     |
| spearman_euclidean  | 0.3116     |
| pearson_dot         | 0.275      |
| spearman_dot        | 0.3114     |
| pearson_max         | 0.2934     |
| spearman_max        | 0.3116     |

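Numbers like these come from scoring sentence pairs against gold similarities. A minimal sketch with the same evaluator class, assuming an STS-style split with scores normalized to [0, 1] (the placeholder pairs below are illustrative, not the actual `sts-test` data):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("bobox/DeBERTa3-s-CustomPoolin-toytest3-step1-checkpoints-tmp")

# Placeholder data standing in for the author's sts-test split.
sentences1 = ["A man is playing a guitar.", "A cat sits on a mat."]
sentences2 = ["Someone is playing an instrument.", "A dog runs through a field."]
gold_scores = [0.9, 0.1]

evaluator = EmbeddingSimilarityEvaluator(sentences1, sentences2, gold_scores, name="sts-test")
metrics = evaluator(model)
print(metrics)  # e.g. {'sts-test_pearson_cosine': ..., 'sts-test_spearman_cosine': ...}
```
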
#### Binary Classification
* Dataset: `allNLI-dev`
* Evaluated with [<code>BinaryClassificationEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.BinaryClassificationEvaluator)

| Metric                       | Value      |
|:-----------------------------|:-----------|
| cosine_accuracy              | 0.6758     |
| cosine_accuracy_threshold    | 0.9453     |
| cosine_f1                    | 0.512      |
| cosine_f1_threshold          | 0.8565     |
| cosine_precision             | 0.3914     |
| cosine_recall                | 0.7399     |
| cosine_ap                    | 0.4265     |
| dot_accuracy                 | 0.6758     |
| dot_accuracy_threshold       | 726.3062   |
| dot_f1                       | 0.512      |
| dot_f1_threshold             | 658.1104   |
| dot_precision                | 0.3914     |
| dot_recall                   | 0.7399     |
| dot_ap                       | 0.4265     |
| manhattan_accuracy           | 0.6758     |
| manhattan_accuracy_threshold | 201.4906   |
| manhattan_f1                 | 0.5108     |
| manhattan_f1_threshold       | 417.5273   |
| manhattan_precision          | 0.348      |
| manhattan_recall             | 0.9595     |
| manhattan_ap                 | 0.4252     |
| euclidean_accuracy           | 0.6758     |
| euclidean_accuracy_threshold | 9.1713     |
| euclidean_f1                 | 0.512      |
| euclidean_f1_threshold       | 14.8488    |
| euclidean_precision          | 0.3914     |
| euclidean_recall             | 0.7399     |
| euclidean_ap                 | 0.4265     |
| max_accuracy                 | 0.6758     |
| max_accuracy_threshold       | 726.3062   |
| max_f1                       | 0.512      |
| max_f1_threshold             | 658.1104   |
| max_precision                | 0.3914     |
| max_recall                   | 0.9595     |
| **max_ap**                   | **0.4265** |

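The binary-classification metrics come from scoring labeled sentence pairs and sweeping a similarity threshold per similarity function. A hedged sketch with placeholder pairs (the real `allNLI-dev` split is not part of this checkpoint):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import BinaryClassificationEvaluator

model = SentenceTransformer("bobox/DeBERTa3-s-CustomPoolin-toytest3-step1-checkpoints-tmp")

# Placeholder pairs; label 1 = positive (e.g. entailment), 0 = negative.
sentences1 = ["A soccer game with multiple males playing.", "A man inspects a uniform."]
sentences2 = ["Some men are playing a sport.", "The man is sleeping."]
labels = [1, 0]

evaluator = BinaryClassificationEvaluator(sentences1, sentences2, labels, name="allNLI-dev")
print(evaluator(model))  # accuracy, F1, precision, recall, and AP with the best threshold
```
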
#### Binary Classification
* Dataset: `Qnli-dev`
* Evaluated with [<code>BinaryClassificationEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.BinaryClassificationEvaluator)

| Metric                       | Value      |
|:-----------------------------|:-----------|
| cosine_accuracy              | 0.6348     |
| cosine_accuracy_threshold    | 0.8508     |
| cosine_f1                    | 0.6506     |
| cosine_f1_threshold          | 0.7771     |
| cosine_precision             | 0.5247     |
| cosine_recall                | 0.8559     |
| cosine_ap                    | 0.6461     |
| dot_accuracy                 | 0.6348     |
| dot_accuracy_threshold       | 653.7443   |
| dot_f1                       | 0.6506     |
| dot_f1_threshold             | 597.0732   |
| dot_precision                | 0.5247     |
| dot_recall                   | 0.8559     |
| dot_ap                       | 0.6462     |
| manhattan_accuracy           | 0.6328     |
| manhattan_accuracy_threshold | 331.4628   |
| manhattan_f1                 | 0.6502     |
| manhattan_f1_threshold       | 404.605    |
| manhattan_precision          | 0.5324     |
| manhattan_recall             | 0.8347     |
| manhattan_ap                 | 0.6432     |
| euclidean_accuracy           | 0.6348     |
| euclidean_accuracy_threshold | 15.1413    |
| euclidean_f1                 | 0.6506     |
| euclidean_f1_threshold       | 18.5094    |
| euclidean_precision          | 0.5247     |
| euclidean_recall             | 0.8559     |
| euclidean_ap                 | 0.6461     |
| max_accuracy                 | 0.6348     |
| max_accuracy_threshold       | 653.7443   |
| max_f1                       | 0.6506     |
| max_f1_threshold             | 597.0732   |
| max_precision                | 0.5324     |
| max_recall                   | 0.8559     |
| **max_ap**                   | **0.6462** |

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset


* Size: 32,500 training samples
* Columns: <code>sentence1</code> and <code>sentence2</code>
* Approximate statistics based on the first 1000 samples:
  |         | sentence1                                                                          | sentence2                                                                           |
  |:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
  | type    | string                                                                              | string                                                                              |
  | details | <ul><li>min: 4 tokens</li><li>mean: 29.6 tokens</li><li>max: 369 tokens</li></ul>  | <ul><li>min: 2 tokens</li><li>mean: 58.01 tokens</li><li>max: 437 tokens</li></ul> |
* Samples:
  | sentence1 | sentence2 |
  |:----------|:----------|
  | <code>The song ‘Fashion for His Love’ by Lady Gaga is a tribute to which late fashion designer?</code> | <code>Fashion Of His Love by Lady Gaga Songfacts Fashion Of His Love by Lady Gaga Songfacts Songfacts Gaga explained in a tweet that this track from her Born This Way Special Edition album is about the late Alexander McQueen. The fashion designer committed suicide by hanging on February 11, 2010 and Gaga was deeply affected by the tragic death of McQueen, who was a close personal friend. That same month, she performed at the 2010 Brit Awards wearing one of his couture creations and she also paid tribute to her late friend by setting the date on the prison security cameras in her Telephone video as the same day that McQueen's body was discovered in his London home.</code> |
  | <code>e.&#9;in solids the atoms are closely locked in position and can only vibrate, in liquids the atoms and molecules are more loosely connected and can collide with and move past one another, while in gases the atoms or molecules are free to move independently, colliding frequently.</code> | <code>Within a substance, atoms that collide frequently and move independently of one another are most likely in a gas</code> |
  | <code>Helen Lederer is an English comedian .</code> | <code>Helen Lederer ( born 24 September 1954 ) is an English : //www.scotsman.com/news/now-or-never-1-1396369 comedian , writer and actress who emerged as part of the alternative comedy boom at the beginning of the 1980s .</code> |
* Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
  ```json
  {'guide': SentenceTransformer(
    (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
    (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
    (2): Normalize()
  ), 'temperature': 0.025}
  ```
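
The `guide` printed above is a separate frozen embedding model (a BERT encoder with CLS pooling and normalization) that GISTEmbedLoss uses to filter misleading in-batch negatives. A hedged sketch of the wiring; the guide checkpoint name below is an assumption, since the card only shows the guide's architecture:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import GISTEmbedLoss

model = SentenceTransformer("bobox/DeBERTa3-s-CustomPoolin-toytest3-step1-checkpoints-tmp")
# Assumed guide: any strong BERT-based embedder with CLS pooling matches the printed modules.
guide = SentenceTransformer("avsolatorio/GIST-Embedding-v0")

loss = GISTEmbedLoss(model, guide, temperature=0.025)  # temperature as shown in the card
```
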
### Evaluation Dataset

#### Unnamed Dataset


* Size: 1,664 evaluation samples
* Columns: <code>sentence1</code> and <code>sentence2</code>
* Approximate statistics based on the first 1000 samples:
  |         | sentence1                                                                           | sentence2                                                                           |
  |:--------|:------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
  | type    | string                                                                              | string                                                                              |
  | details | <ul><li>min: 4 tokens</li><li>mean: 29.01 tokens</li><li>max: 367 tokens</li></ul> | <ul><li>min: 2 tokens</li><li>mean: 56.14 tokens</li><li>max: 389 tokens</li></ul> |
* Samples:
  | sentence1 | sentence2 |
  |:----------|:----------|
  | <code>What planet did the voyager 1 spacecraft visit in 1980?</code> | <code>The Voyager 1 spacecraft visited Saturn in 1980. Voyager 2 followed in 1981. These probes sent back detailed pictures of Saturn, its rings, and some of its moons ( Figure below ). From the Voyager data, we learned what Saturn’s rings are made of. They are particles of water and ice with a little bit of dust. There are several gaps in the rings. These gaps were cleared out by moons within the rings. Gravity attracts dust and gas to the moon from the ring. This leaves a gap in the rings. Other gaps in the rings are caused by the competing forces of Saturn and its moons outside the rings.</code> |
  | <code>Diffusion Diffusion is a process where atoms or molecules move from areas of high concentration to areas of low concentration.</code> | <code>Diffusion is the process in which a substance naturally moves from an area of higher to lower concentration.</code> |
  | <code>Who had an 80s No 1 with Don't You Want Me?</code> | <code>The Human League - Don't You Want Me - YouTube The Human League - Don't You Want Me Want to watch this again later? Sign in to add this video to a playlist. Need to report the video? Sign in to report inappropriate content. Rating is available when the video has been rented. This feature is not available right now. Please try again later. Uploaded on Feb 27, 2009 Music video by The Human League performing Don't You Want Me (2003 Digital Remaster). Category</code> |
* Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
  ```json
  {'guide': SentenceTransformer(
    (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
    (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
    (2): Normalize()
  ), 'temperature': 0.025}
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 256
- `lr_scheduler_type`: cosine_with_min_lr
- `lr_scheduler_kwargs`: {'num_cycles': 0.5, 'min_lr': 3.3333333333333337e-06}
- `warmup_ratio`: 0.33
- `save_safetensors`: False
- `fp16`: True
- `push_to_hub`: True
- `hub_model_id`: bobox/DeBERTa3-s-CustomPoolin-toytest3-step1-checkpoints-tmp
- `hub_strategy`: all_checkpoints
- `batch_sampler`: no_duplicates

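Illustratively, the non-default values listed above map onto the trainer configuration roughly as follows (a sketch: the output directory and the surrounding trainer wiring are assumptions, not part of the card):

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="checkpoints",  # assumed; not stated in the card
    eval_strategy="steps",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=256,
    lr_scheduler_type="cosine_with_min_lr",
    lr_scheduler_kwargs={"num_cycles": 0.5, "min_lr": 3.3333333333333337e-06},
    warmup_ratio=0.33,
    save_safetensors=False,
    fp16=True,
    push_to_hub=True,
    hub_model_id="bobox/DeBERTa3-s-CustomPoolin-toytest3-step1-checkpoints-tmp",
    hub_strategy="all_checkpoints",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
```
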
#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 256
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 3
- `max_steps`: -1
- `lr_scheduler_type`: cosine_with_min_lr
- `lr_scheduler_kwargs`: {'num_cycles': 0.5, 'min_lr': 3.3333333333333337e-06}
- `warmup_ratio`: 0.33
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: False
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: True
- `resume_from_checkpoint`: None
- `hub_model_id`: bobox/DeBERTa3-s-CustomPoolin-toytest3-step1-checkpoints-tmp
- `hub_strategy`: all_checkpoints
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `eval_use_gather_object`: False
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>

+ ### Training Logs
792
+ <details><summary>Click to expand</summary>
793
+
794
+ | Epoch | Step | Training Loss | Validation Loss | sts-test_spearman_cosine | allNLI-dev_max_ap | Qnli-dev_max_ap |
795
+ |:------:|:----:|:-------------:|:---------------:|:------------------------:|:-----------------:|:---------------:|
796
+ | 0.0010 | 1 | 10.4072 | - | - | - | - |
797
+ | 0.0020 | 2 | 11.0865 | - | - | - | - |
798
+ | 0.0030 | 3 | 9.5114 | - | - | - | - |
799
+ | 0.0039 | 4 | 9.9584 | - | - | - | - |
800
+ | 0.0049 | 5 | 10.068 | - | - | - | - |
801
+ | 0.0059 | 6 | 11.0224 | - | - | - | - |
802
+ | 0.0069 | 7 | 9.7703 | - | - | - | - |
803
+ | 0.0079 | 8 | 10.5005 | - | - | - | - |
804
+ | 0.0089 | 9 | 10.1987 | - | - | - | - |
805
+ | 0.0098 | 10 | 10.0277 | - | - | - | - |
806
+ | 0.0108 | 11 | 10.6965 | - | - | - | - |
807
+ | 0.0118 | 12 | 10.0609 | - | - | - | - |
808
+ | 0.0128 | 13 | 11.6214 | - | - | - | - |
809
+ | 0.0138 | 14 | 9.4053 | - | - | - | - |
810
+ | 0.0148 | 15 | 10.4014 | - | - | - | - |
811
+ | 0.0157 | 16 | 10.4119 | - | - | - | - |
812
+ | 0.0167 | 17 | 9.4658 | - | - | - | - |
813
+ | 0.0177 | 18 | 9.2169 | - | - | - | - |
814
+ | 0.0187 | 19 | 11.2337 | - | - | - | - |
815
+ | 0.0197 | 20 | 11.0572 | - | - | - | - |
816
+ | 0.0207 | 21 | 11.0452 | - | - | - | - |
817
+ | 0.0217 | 22 | 10.31 | - | - | - | - |
818
+ | 0.0226 | 23 | 9.1395 | - | - | - | - |
819
+ | 0.0236 | 24 | 8.4201 | - | - | - | - |
820
+ | 0.0246 | 25 | 8.6036 | - | - | - | - |
821
+ | 0.0256 | 26 | 11.7579 | - | - | - | - |
822
+ | 0.0266 | 27 | 10.1307 | - | - | - | - |
823
+ | 0.0276 | 28 | 9.2915 | - | - | - | - |
824
+ | 0.0285 | 29 | 9.0208 | - | - | - | - |
825
+ | 0.0295 | 30 | 8.6867 | - | - | - | - |
826
+ | 0.0305 | 31 | 8.0925 | - | - | - | - |
827
+ | 0.0315 | 32 | 8.6617 | - | - | - | - |
828
+ | 0.0325 | 33 | 8.3374 | - | - | - | - |
829
+ | 0.0335 | 34 | 7.8566 | - | - | - | - |
830
+ | 0.0344 | 35 | 9.0698 | - | - | - | - |
831
+ | 0.0354 | 36 | 7.7727 | - | - | - | - |
832
+ | 0.0364 | 37 | 7.6128 | - | - | - | - |
833
+ | 0.0374 | 38 | 7.8762 | - | - | - | - |
834
+ | 0.0384 | 39 | 7.5191 | - | - | - | - |
835
+ | 0.0394 | 40 | 7.5638 | - | - | - | - |
836
+ | 0.0404 | 41 | 7.1878 | - | - | - | - |
837
+ | 0.0413 | 42 | 6.8878 | - | - | - | - |
838
+ | 0.0423 | 43 | 7.5775 | - | - | - | - |
839
+ | 0.0433 | 44 | 7.1076 | - | - | - | - |
840
+ | 0.0443 | 45 | 6.5589 | - | - | - | - |
841
+ | 0.0453 | 46 | 7.4456 | - | - | - | - |
842
+ | 0.0463 | 47 | 6.8233 | - | - | - | - |
843
+ | 0.0472 | 48 | 6.7633 | - | - | - | - |
844
+ | 0.0482 | 49 | 6.6024 | - | - | - | - |
845
+ | 0.0492 | 50 | 6.2778 | - | - | - | - |
846
+ | 0.0502 | 51 | 6.1026 | - | - | - | - |
847
+ | 0.0512 | 52 | 6.632 | - | - | - | - |
848
+ | 0.0522 | 53 | 6.6962 | - | - | - | - |
849
+ | 0.0531 | 54 | 5.8514 | - | - | - | - |
850
+ | 0.0541 | 55 | 5.9951 | - | - | - | - |
851
+ | 0.0551 | 56 | 5.4554 | - | - | - | - |
852
+ | 0.0561 | 57 | 6.0147 | - | - | - | - |
853
+ | 0.0571 | 58 | 5.215 | - | - | - | - |
854
+ | 0.0581 | 59 | 6.4525 | - | - | - | - |
855
+ | 0.0591 | 60 | 5.4048 | - | - | - | - |
856
+ | 0.0600 | 61 | 5.0424 | - | - | - | - |
857
+ | 0.0610 | 62 | 6.2646 | - | - | - | - |
858
+ | 0.0620 | 63 | 5.0847 | - | - | - | - |
859
+ | 0.0630 | 64 | 5.4415 | - | - | - | - |
860
+ | 0.0640 | 65 | 5.2469 | - | - | - | - |
861
+ | 0.0650 | 66 | 5.1378 | - | - | - | - |
862
+ | 0.0659 | 67 | 5.1636 | - | - | - | - |
863
+ | 0.0669 | 68 | 5.5596 | - | - | - | - |
864
+ | 0.0679 | 69 | 4.9508 | - | - | - | - |
865
+ | 0.0689 | 70 | 5.2355 | - | - | - | - |
866
+ | 0.0699 | 71 | 4.7359 | - | - | - | - |
867
+ | 0.0709 | 72 | 4.8947 | - | - | - | - |
868
+ | 0.0719 | 73 | 4.6269 | - | - | - | - |
869
+ | 0.0728 | 74 | 4.6072 | - | - | - | - |
870
+ | 0.0738 | 75 | 4.9125 | - | - | - | - |
871
+ | 0.0748 | 76 | 4.5856 | - | - | - | - |
872
+ | 0.0758 | 77 | 4.7879 | - | - | - | - |
873
+ | 0.0768 | 78 | 4.5348 | - | - | - | - |
874
+ | 0.0778 | 79 | 4.3554 | - | - | - | - |
875
+ | 0.0787 | 80 | 4.2984 | - | - | - | - |
876
+ | 0.0797 | 81 | 4.5505 | - | - | - | - |
877
+ | 0.0807 | 82 | 4.5325 | - | - | - | - |
878
+ | 0.0817 | 83 | 4.2725 | - | - | - | - |
879
+ | 0.0827 | 84 | 4.3054 | - | - | - | - |
880
+ | 0.0837 | 85 | 4.5536 | - | - | - | - |
881
+ | 0.0846 | 86 | 4.0265 | - | - | - | - |
882
+ | 0.0856 | 87 | 4.7453 | - | - | - | - |
883
+ | 0.0866 | 88 | 4.071 | - | - | - | - |
884
+ | 0.0876 | 89 | 4.1582 | - | - | - | - |
885
+ | 0.0886 | 90 | 4.1131 | - | - | - | - |
886
+ | 0.0896 | 91 | 3.6582 | - | - | - | - |
887
+ | 0.0906 | 92 | 4.143 | - | - | - | - |
888
+ | 0.0915 | 93 | 4.2273 | - | - | - | - |
889
+ | 0.0925 | 94 | 3.9321 | - | - | - | - |
890
+ | 0.0935 | 95 | 4.2061 | - | - | - | - |
891
+ | 0.0945 | 96 | 4.1042 | - | - | - | - |
892
+ | 0.0955 | 97 | 3.9513 | - | - | - | - |
893
+ | 0.0965 | 98 | 3.8627 | - | - | - | - |
894
+ | 0.0974 | 99 | 4.3613 | - | - | - | - |
895
+ | 0.0984 | 100 | 3.8513 | - | - | - | - |
896
+ | 0.0994 | 101 | 3.5866 | - | - | - | - |
897
+ | 0.1004 | 102 | 3.5239 | - | - | - | - |
898
+ | 0.1014 | 103 | 3.5921 | - | - | - | - |
899
+ | 0.1024 | 104 | 3.5962 | - | - | - | - |
900
+ | 0.1033 | 105 | 4.0001 | - | - | - | - |
901
+ | 0.1043 | 106 | 4.1374 | - | - | - | - |
902
+ | 0.1053 | 107 | 3.9049 | - | - | - | - |
903
+ | 0.1063 | 108 | 3.2511 | - | - | - | - |
904
+ | 0.1073 | 109 | 3.2479 | - | - | - | - |
905
+ | 0.1083 | 110 | 3.6414 | - | - | - | - |
906
+ | 0.1093 | 111 | 3.6429 | - | - | - | - |
907
+ | 0.1102 | 112 | 3.423 | - | - | - | - |
908
+ | 0.1112 | 113 | 3.4967 | - | - | - | - |
909
+ | 0.1122 | 114 | 3.7649 | - | - | - | - |
910
+ | 0.1132 | 115 | 3.2845 | - | - | - | - |
911
+ | 0.1142 | 116 | 3.356 | - | - | - | - |
912
+ | 0.1152 | 117 | 3.2086 | - | - | - | - |
913
+ | 0.1161 | 118 | 3.5561 | - | - | - | - |
914
+ | 0.1171 | 119 | 3.7353 | - | - | - | - |
915
+ | 0.1181 | 120 | 3.403 | - | - | - | - |
916
+ | 0.1191 | 121 | 3.1009 | - | - | - | - |
917
+ | 0.1201 | 122 | 3.2139 | - | - | - | - |
918
+ | 0.1211 | 123 | 3.3339 | - | - | - | - |
919
+ | 0.1220 | 124 | 2.9464 | - | - | - | - |
920
+ | 0.1230 | 125 | 3.3366 | - | - | - | - |
921
+ | 0.1240 | 126 | 3.0618 | - | - | - | - |
922
+ | 0.125 | 127 | 3.0092 | - | - | - | - |
923
+ | 0.1260 | 128 | 2.7152 | - | - | - | - |
924
+ | 0.1270 | 129 | 2.9423 | - | - | - | - |
925
+ | 0.1280 | 130 | 2.6569 | - | - | - | - |
926
+ | 0.1289 | 131 | 2.8469 | - | - | - | - |
927
+ | 0.1299 | 132 | 2.9089 | - | - | - | - |
928
+ | 0.1309 | 133 | 2.5809 | - | - | - | - |
929
+ | 0.1319 | 134 | 2.6987 | - | - | - | - |
930
+ | 0.1329 | 135 | 3.2518 | - | - | - | - |
931
+ | 0.1339 | 136 | 2.9145 | - | - | - | - |
932
+ | 0.1348 | 137 | 2.4809 | - | - | - | - |
933
+ | 0.1358 | 138 | 2.8264 | - | - | - | - |
934
+ | 0.1368 | 139 | 2.5724 | - | - | - | - |
935
+ | 0.1378 | 140 | 2.6949 | - | - | - | - |
936
+ | 0.1388 | 141 | 2.6925 | - | - | - | - |
937
+ | 0.1398 | 142 | 2.9311 | - | - | - | - |
938
+ | 0.1407 | 143 | 2.5667 | - | - | - | - |
939
+ | 0.1417 | 144 | 3.2471 | - | - | - | - |
940
+ | 0.1427 | 145 | 2.2441 | - | - | - | - |
941
+ | 0.1437 | 146 | 2.75 | - | - | - | - |
942
+ | 0.1447 | 147 | 2.9669 | - | - | - | - |
943
+ | 0.1457 | 148 | 2.736 | - | - | - | - |
944
+ | 0.1467 | 149 | 3.104 | - | - | - | - |
945
+ | 0.1476 | 150 | 2.2175 | - | - | - | - |
946
+ | 0.1486 | 151 | 2.7415 | - | - | - | - |
947
+ | 0.1496 | 152 | 1.8707 | - | - | - | - |
948
+ | 0.1506 | 153 | 2.5961 | 2.2653 | 0.3116 | 0.4265 | 0.6462 |
949
+ | 0.1516 | 154 | 3.1149 | - | - | - | - |
950
+ | 0.1526 | 155 | 2.2976 | - | - | - | - |
951
+ | 0.1535 | 156 | 2.4436 | - | - | - | - |
952
+ | 0.1545 | 157 | 2.8826 | - | - | - | - |
953
+ | 0.1555 | 158 | 2.3664 | - | - | - | - |
954
+ | 0.1565 | 159 | 2.2485 | - | - | - | - |
955
+ | 0.1575 | 160 | 2.5167 | - | - | - | - |
956
+ | 0.1585 | 161 | 1.7183 | - | - | - | - |
957
+ | 0.1594 | 162 | 2.1003 | - | - | - | - |
958
+ | 0.1604 | 163 | 2.5785 | - | - | - | - |
959
+ | 0.1614 | 164 | 2.8789 | - | - | - | - |
960
+ | 0.1624 | 165 | 2.3425 | - | - | - | - |
961
+ | 0.1634 | 166 | 2.0966 | - | - | - | - |
962
+ | 0.1644 | 167 | 2.1126 | - | - | - | - |
963
+ | 0.1654 | 168 | 2.1824 | - | - | - | - |
964
+ | 0.1663 | 169 | 2.2009 | - | - | - | - |
965
+ | 0.1673 | 170 | 2.3796 | - | - | - | - |
966
+ | 0.1683 | 171 | 2.3096 | - | - | - | - |
967
+ | 0.1693 | 172 | 2.7897 | - | - | - | - |
968
+ | 0.1703 | 173 | 2.2097 | - | - | - | - |
969
+ | 0.1713 | 174 | 1.7508 | - | - | - | - |
970
+ | 0.1722 | 175 | 2.353 | - | - | - | - |
971
+ | 0.1732 | 176 | 2.4276 | - | - | - | - |
972
+ | 0.1742 | 177 | 2.1016 | - | - | - | - |
973
+ | 0.1752 | 178 | 1.8461 | - | - | - | - |
974
+ | 0.1762 | 179 | 1.8104 | - | - | - | - |
975
+ | 0.1772 | 180 | 2.6023 | - | - | - | - |
976
+ | 0.1781 | 181 | 2.5261 | - | - | - | - |
977
+ | 0.1791 | 182 | 2.1053 | - | - | - | - |
978
+ | 0.1801 | 183 | 1.9712 | - | - | - | - |
979
+ | 0.1811 | 184 | 2.4693 | - | - | - | - |
980
+ | 0.1821 | 185 | 2.1119 | - | - | - | - |
981
+ | 0.1831 | 186 | 2.4797 | - | - | - | - |
982
+ | 0.1841 | 187 | 2.1587 | - | - | - | - |
983
+ | 0.1850 | 188 | 1.9578 | - | - | - | - |
984
+ | 0.1860 | 189 | 2.1368 | - | - | - | - |
985
+ | 0.1870 | 190 | 2.4212 | - | - | - | - |
986
+ | 0.1880 | 191 | 1.9591 | - | - | - | - |
987
+ | 0.1890 | 192 | 1.5816 | - | - | - | - |
988
+ | 0.1900 | 193 | 1.4029 | - | - | - | - |
989
+ | 0.1909 | 194 | 1.9385 | - | - | - | - |
990
+ | 0.1919 | 195 | 1.5596 | - | - | - | - |
991
+ | 0.1929 | 196 | 1.6663 | - | - | - | - |
992
+ | 0.1939 | 197 | 2.0026 | - | - | - | - |
993
+ | 0.1949 | 198 | 2.0046 | - | - | - | - |
994
+ | 0.1959 | 199 | 1.5016 | - | - | - | - |
995
+ | 0.1969 | 200 | 2.184 | - | - | - | - |
996
+ | 0.1978 | 201 | 2.3442 | - | - | - | - |
997
+ | 0.1988 | 202 | 2.6981 | - | - | - | - |
998
+ | 0.1998 | 203 | 2.5481 | - | - | - | - |
999
+ | 0.2008 | 204 | 2.9798 | - | - | - | - |
1000
+ | 0.2018 | 205 | 2.287 | - | - | - | - |
1001
+ | 0.2028 | 206 | 1.9393 | - | - | - | - |
1002
+ | 0.2037 | 207 | 2.892 | - | - | - | - |
1003
+ | 0.2047 | 208 | 2.26 | - | - | - | - |
1004
+ | 0.2057 | 209 | 2.5911 | - | - | - | - |
1005
+ | 0.2067 | 210 | 2.1239 | - | - | - | - |
1006
+ | 0.2077 | 211 | 2.0683 | - | - | - | - |
1007
+ | 0.2087 | 212 | 1.768 | - | - | - | - |
1008
+ | 0.2096 | 213 | 2.5468 | - | - | - | - |
1009
+ | 0.2106 | 214 | 1.8956 | - | - | - | - |
1010
+ | 0.2116 | 215 | 2.044 | - | - | - | - |
1011
+ | 0.2126 | 216 | 1.5721 | - | - | - | - |
1012
+ | 0.2136 | 217 | 1.6278 | - | - | - | - |
1013
+ | 0.2146 | 218 | 1.7754 | - | - | - | - |
1014
+ | 0.2156 | 219 | 1.8594 | - | - | - | - |
1015
+ | 0.2165 | 220 | 1.8309 | - | - | - | - |
1016
+ | 0.2175 | 221 | 2.0619 | - | - | - | - |
1017
+ | 0.2185 | 222 | 2.3335 | - | - | - | - |
1018
+ | 0.2195 | 223 | 2.023 | - | - | - | - |
1019
+ | 0.2205 | 224 | 2.1975 | - | - | - | - |
1020
+ | 0.2215 | 225 | 1.9228 | - | - | - | - |
1021
+ | 0.2224 | 226 | 2.3565 | - | - | - | - |
1022
+ | 0.2234 | 227 | 1.896 | - | - | - | - |
1023
+ | 0.2244 | 228 | 2.0912 | - | - | - | - |
1024
+ | 0.2254 | 229 | 2.7703 | - | - | - | - |
1025
+ | 0.2264 | 230 | 1.6988 | - | - | - | - |
1026
+ | 0.2274 | 231 | 2.0406 | - | - | - | - |
1027
+ | 0.2283 | 232 | 1.9288 | - | - | - | - |
1028
+ | 0.2293 | 233 | 2.0457 | - | - | - | - |
1029
+ | 0.2303 | 234 | 1.7061 | - | - | - | - |
1030
+ | 0.2313 | 235 | 1.6244 | - | - | - | - |
1031
+ | 0.2323 | 236 | 2.0241 | - | - | - | - |
1032
+ | 0.2333 | 237 | 1.567 | - | - | - | - |
1033
+ | 0.2343 | 238 | 1.8084 | - | - | - | - |
1034
+ | 0.2352 | 239 | 2.4363 | - | - | - | - |
1035
+ | 0.2362 | 240 | 1.7532 | - | - | - | - |
1036
+ | 0.2372 | 241 | 2.0797 | - | - | - | - |
1037
+ | 0.2382 | 242 | 1.9562 | - | - | - | - |
1038
+ | 0.2392 | 243 | 1.6751 | - | - | - | - |
1039
+ | 0.2402 | 244 | 2.0265 | - | - | - | - |
1040
+ | 0.2411 | 245 | 1.6065 | - | - | - | - |
1041
+ | 0.2421 | 246 | 1.7439 | - | - | - | - |
1042
+ | 0.2431 | 247 | 2.0237 | - | - | - | - |
1043
+ | 0.2441 | 248 | 1.6128 | - | - | - | - |
1044
+ | 0.2451 | 249 | 1.6581 | - | - | - | - |
1045
+ | 0.2461 | 250 | 2.1538 | - | - | - | - |
1046
+ | 0.2470 | 251 | 2.049 | - | - | - | - |
1047
+ | 0.2480 | 252 | 1.2573 | - | - | - | - |
1048
+ | 0.2490 | 253 | 1.5619 | - | - | - | - |
1049
+ | 0.25 | 254 | 1.2611 | - | - | - | - |
1050
+ | 0.2510 | 255 | 1.3443 | - | - | - | - |
1051
+ | 0.2520 | 256 | 1.3436 | - | - | - | - |
1052
+ | 0.2530 | 257 | 2.8117 | - | - | - | - |
1053
+ | 0.2539 | 258 | 1.7563 | - | - | - | - |
1054
+ | 0.2549 | 259 | 1.3148 | - | - | - | - |
1055
+ | 0.2559 | 260 | 2.0278 | - | - | - | - |
1056
+ | 0.2569 | 261 | 1.2403 | - | - | - | - |
1057
+ | 0.2579 | 262 | 1.588 | - | - | - | - |
1058
+ | 0.2589 | 263 | 2.0071 | - | - | - | - |
1059
+ | 0.2598 | 264 | 1.5312 | - | - | - | - |
1060
+ | 0.2608 | 265 | 1.8641 | - | - | - | - |
1061
+ | 0.2618 | 266 | 1.2933 | - | - | - | - |
1062
+ | 0.2628 | 267 | 1.6262 | - | - | - | - |
1063
+ | 0.2638 | 268 | 1.721 | - | - | - | - |
1064
+ | 0.2648 | 269 | 1.4713 | - | - | - | - |
1065
+ | 0.2657 | 270 | 1.4625 | - | - | - | - |
1066
+ | 0.2667 | 271 | 1.7254 | - | - | - | - |
1067
+ | 0.2677 | 272 | 1.5108 | - | - | - | - |
1068
+ | 0.2687 | 273 | 2.1126 | - | - | - | - |
1069
+ | 0.2697 | 274 | 1.3967 | - | - | - | - |
1070
+ | 0.2707 | 275 | 1.7067 | - | - | - | - |
1071
+ | 0.2717 | 276 | 1.4847 | - | - | - | - |
1072
+ | 0.2726 | 277 | 1.6515 | - | - | - | - |
1073
+ | 0.2736 | 278 | 0.9367 | - | - | - | - |
1074
+ | 0.2746 | 279 | 2.0267 | - | - | - | - |
1075
+ | 0.2756 | 280 | 1.5023 | - | - | - | - |
1076
+ | 0.2766 | 281 | 1.1248 | - | - | - | - |
1077
+ | 0.2776 | 282 | 1.6224 | - | - | - | - |
1078
+ | 0.2785 | 283 | 1.7969 | - | - | - | - |
1079
+ | 0.2795 | 284 | 2.2498 | - | - | - | - |
1080
+ | 0.2805 | 285 | 1.7477 | - | - | - | - |
1081
+ | 0.2815 | 286 | 1.6261 | - | - | - | - |
1082
+ | 0.2825 | 287 | 2.0911 | - | - | - | - |
1083
+ | 0.2835 | 288 | 1.9519 | - | - | - | - |
1084
+ | 0.2844 | 289 | 1.3132 | - | - | - | - |
1085
+ | 0.2854 | 290 | 2.3292 | - | - | - | - |
1086
+ | 0.2864 | 291 | 1.3781 | - | - | - | - |
1087
+ | 0.2874 | 292 | 1.5753 | - | - | - | - |
1088
+ | 0.2884 | 293 | 1.4158 | - | - | - | - |
1089
+ | 0.2894 | 294 | 2.1661 | - | - | - | - |
1090
+ | 0.2904 | 295 | 1.4928 | - | - | - | - |
1091
+ | 0.2913 | 296 | 2.2825 | - | - | - | - |
1092
+ | 0.2923 | 297 | 1.7261 | - | - | - | - |
1093
+ | 0.2933 | 298 | 1.8635 | - | - | - | - |
1094
+ | 0.2943 | 299 | 0.974 | - | - | - | - |
1095
+ | 0.2953 | 300 | 1.53 | - | - | - | - |
1096
+ | 0.2963 | 301 | 1.5985 | - | - | - | - |
1097
+ | 0.2972 | 302 | 1.2169 | - | - | - | - |
1098
+ | 0.2982 | 303 | 1.771 | - | - | - | - |
1099
+ | 0.2992 | 304 | 1.4506 | - | - | - | - |
1100
+ | 0.3002 | 305 | 1.9496 | - | - | - | - |
1101
+
1102
+ </details>
1103
+
1104
+ ### Framework Versions
1105
+ - Python: 3.10.12
1106
+ - Sentence Transformers: 3.2.1
1107
+ - Transformers: 4.44.2
1108
+ - PyTorch: 2.5.0+cu121
1109
+ - Accelerate: 0.34.2
1110
+ - Datasets: 3.0.2
1111
+ - Tokenizers: 0.19.1
1112
+
1113
+ ## Citation
1114
+
1115
+ ### BibTeX
1116
+
1117
+ #### Sentence Transformers
1118
+ ```bibtex
1119
+ @inproceedings{reimers-2019-sentence-bert,
1120
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
1121
+ author = "Reimers, Nils and Gurevych, Iryna",
1122
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
1123
+ month = "11",
1124
+ year = "2019",
1125
+ publisher = "Association for Computational Linguistics",
1126
+ url = "https://arxiv.org/abs/1908.10084",
1127
+ }
1128
+ ```
1129
+
1130
+ #### GISTEmbedLoss
1131
+ ```bibtex
1132
+ @misc{solatorio2024gistembed,
1133
+ title={GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning},
1134
+ author={Aivin V. Solatorio},
1135
+ year={2024},
1136
+ eprint={2402.16829},
1137
+ archivePrefix={arXiv},
1138
+ primaryClass={cs.LG}
1139
+ }
1140
+ ```
1141
+
1142
+ <!--
1143
+ ## Glossary
1144
+
1145
+ *Clearly define terms in order to be accessible across audiences.*
1146
+ -->
1147
+
1148
+ <!--
1149
+ ## Model Card Authors
1150
+
1151
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
1152
+ -->
1153
+
1154
+ <!--
1155
+ ## Model Card Contact
1156
+
1157
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
1158
+ -->
checkpoint-305/added_tokens.json ADDED
@@ -0,0 +1,3 @@
{
  "[MASK]": 128000
}
checkpoint-305/config.json ADDED
@@ -0,0 +1,35 @@
{
  "_name_or_path": "microsoft/deberta-v3-small",
  "architectures": [
    "DebertaV2Model"
  ],
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-07,
  "max_position_embeddings": 512,
  "max_relative_positions": -1,
  "model_type": "deberta-v2",
  "norm_rel_ebd": "layer_norm",
  "num_attention_heads": 12,
  "num_hidden_layers": 6,
  "pad_token_id": 0,
  "pooler_dropout": 0,
  "pooler_hidden_act": "gelu",
  "pooler_hidden_size": 768,
  "pos_att_type": [
    "p2c",
    "c2p"
  ],
  "position_biased_input": false,
  "position_buckets": 256,
  "relative_attention": true,
  "share_att_key": true,
  "torch_dtype": "float32",
  "transformers_version": "4.44.2",
  "type_vocab_size": 0,
  "vocab_size": 128100
}
checkpoint-305/config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
{
  "__version__": {
    "sentence_transformers": "3.2.1",
    "transformers": "4.44.2",
    "pytorch": "2.5.0+cu121"
  },
  "prompts": {},
  "default_prompt_name": null,
  "similarity_fn_name": null
}
checkpoint-305/modules.json ADDED
@@ -0,0 +1,14 @@
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_AdvancedWeightedPooling",
    "type": "__main__.AdvancedWeightedPooling"
  }
]
checkpoint-305/optimizer.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b4216ea492e90da5d326e24f088ae9ed8f53c5c1cd07159e79ede7f19608ce70
size 151305210
checkpoint-305/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:094972df8cec1361c68660c6023958f3a6d599dd4aa6eb3d97fcf636926c7a61
size 565251810
checkpoint-305/rng_state.pth ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4c0d2348e42208a7e9d06f1d7141b6a824eaed568a8c8d1acd23d0ef3cb67228
size 14180
checkpoint-305/scheduler.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6d921808ffb17e7f3747de1868df7274036d36213181d5b4bcf1a8abc48c88b9
size 1256
checkpoint-305/sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
{
  "max_seq_length": 512,
  "do_lower_case": false
}
checkpoint-305/special_tokens_map.json ADDED
@@ -0,0 +1,15 @@
{
  "bos_token": "[CLS]",
  "cls_token": "[CLS]",
  "eos_token": "[SEP]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": {
    "content": "[UNK]",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}
checkpoint-305/spm.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c679fbf93643d19aab7ee10c0b99e460bdbc02fedf34b92b05af343b4af586fd
3
+ size 2464616
checkpoint-305/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
checkpoint-305/tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "1": {
12
+ "content": "[CLS]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "2": {
20
+ "content": "[SEP]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "3": {
28
+ "content": "[UNK]",
29
+ "lstrip": false,
30
+ "normalized": true,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "128000": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "bos_token": "[CLS]",
45
+ "clean_up_tokenization_spaces": true,
46
+ "cls_token": "[CLS]",
47
+ "do_lower_case": false,
48
+ "eos_token": "[SEP]",
49
+ "mask_token": "[MASK]",
50
+ "model_max_length": 1000000000000000019884624838656,
51
+ "pad_token": "[PAD]",
52
+ "sep_token": "[SEP]",
53
+ "sp_model_kwargs": {},
54
+ "split_by_punct": false,
55
+ "tokenizer_class": "DebertaV2Tokenizer",
56
+ "unk_token": "[UNK]",
57
+ "vocab_type": "spm"
58
+ }
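tokenizer_config.json pins `DebertaV2Tokenizer` over the SentencePiece model (spm.model) and leaves `model_max_length` at the transformers "no limit" sentinel, so the effective sequence cap comes from sentence_bert_config.json rather than the tokenizer. A quick check under the same local-checkpoint assumption:

```python
# Minimal sketch: the tokenizer imposes no length cap of its own; truncation
# is governed by the sentence-transformers config (max_seq_length = 512).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("checkpoint-305")
print(type(tok).__name__)           # DebertaV2Tokenizer (or its Fast variant)
print(tok.model_max_length > 1e15)  # True: the "no limit" sentinel
```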
checkpoint-305/trainer_state.json ADDED
@@ -0,0 +1,2257 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 0.3001968503937008,
5
+ "eval_steps": 153,
6
+ "global_step": 305,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.000984251968503937,
13
+ "grad_norm": NaN,
14
+ "learning_rate": 0.0,
15
+ "loss": 10.4072,
16
+ "step": 1
17
+ },
18
+ {
19
+ "epoch": 0.001968503937007874,
20
+ "grad_norm": NaN,
21
+ "learning_rate": 0.0,
22
+ "loss": 11.0865,
23
+ "step": 2
24
+ },
25
+ {
26
+ "epoch": 0.002952755905511811,
27
+ "grad_norm": 60.69786071777344,
28
+ "learning_rate": 9.940357852882705e-10,
29
+ "loss": 9.5114,
30
+ "step": 3
31
+ },
32
+ {
33
+ "epoch": 0.003937007874015748,
34
+ "grad_norm": 59.147647857666016,
35
+ "learning_rate": 1.988071570576541e-09,
36
+ "loss": 9.9584,
37
+ "step": 4
38
+ },
39
+ {
40
+ "epoch": 0.004921259842519685,
41
+ "grad_norm": NaN,
42
+ "learning_rate": 1.988071570576541e-09,
43
+ "loss": 10.068,
44
+ "step": 5
45
+ },
46
+ {
47
+ "epoch": 0.005905511811023622,
48
+ "grad_norm": 65.82320404052734,
49
+ "learning_rate": 2.9821073558648116e-09,
50
+ "loss": 11.0224,
51
+ "step": 6
52
+ },
53
+ {
54
+ "epoch": 0.006889763779527559,
55
+ "grad_norm": 59.096107482910156,
56
+ "learning_rate": 3.976143141153082e-09,
57
+ "loss": 9.7703,
58
+ "step": 7
59
+ },
60
+ {
61
+ "epoch": 0.007874015748031496,
62
+ "grad_norm": 61.43330764770508,
63
+ "learning_rate": 4.970178926441353e-09,
64
+ "loss": 10.5005,
65
+ "step": 8
66
+ },
67
+ {
68
+ "epoch": 0.008858267716535433,
69
+ "grad_norm": 61.25212860107422,
70
+ "learning_rate": 5.964214711729623e-09,
71
+ "loss": 10.1987,
72
+ "step": 9
73
+ },
74
+ {
75
+ "epoch": 0.00984251968503937,
76
+ "grad_norm": 61.6600341796875,
77
+ "learning_rate": 6.9582504970178946e-09,
78
+ "loss": 10.0277,
79
+ "step": 10
80
+ },
81
+ {
82
+ "epoch": 0.010826771653543307,
83
+ "grad_norm": 61.71075439453125,
84
+ "learning_rate": 7.952286282306164e-09,
85
+ "loss": 10.6965,
86
+ "step": 11
87
+ },
88
+ {
89
+ "epoch": 0.011811023622047244,
90
+ "grad_norm": 59.34035873413086,
91
+ "learning_rate": 8.946322067594435e-09,
92
+ "loss": 10.0609,
93
+ "step": 12
94
+ },
95
+ {
96
+ "epoch": 0.012795275590551181,
97
+ "grad_norm": 64.14221954345703,
98
+ "learning_rate": 9.940357852882705e-09,
99
+ "loss": 11.6214,
100
+ "step": 13
101
+ },
102
+ {
103
+ "epoch": 0.013779527559055118,
104
+ "grad_norm": 57.682823181152344,
105
+ "learning_rate": 1.0934393638170978e-08,
106
+ "loss": 9.4053,
107
+ "step": 14
108
+ },
109
+ {
110
+ "epoch": 0.014763779527559055,
111
+ "grad_norm": 62.256858825683594,
112
+ "learning_rate": 1.1928429423459246e-08,
113
+ "loss": 10.4014,
114
+ "step": 15
115
+ },
116
+ {
117
+ "epoch": 0.015748031496062992,
118
+ "grad_norm": 64.56417083740234,
119
+ "learning_rate": 1.2922465208747517e-08,
120
+ "loss": 10.4119,
121
+ "step": 16
122
+ },
123
+ {
124
+ "epoch": 0.01673228346456693,
125
+ "grad_norm": 56.612098693847656,
126
+ "learning_rate": 1.3916500994035789e-08,
127
+ "loss": 9.4658,
128
+ "step": 17
129
+ },
130
+ {
131
+ "epoch": 0.017716535433070866,
132
+ "grad_norm": 60.02314758300781,
133
+ "learning_rate": 1.4910536779324056e-08,
134
+ "loss": 9.2169,
135
+ "step": 18
136
+ },
137
+ {
138
+ "epoch": 0.018700787401574805,
139
+ "grad_norm": 66.38670349121094,
140
+ "learning_rate": 1.590457256461233e-08,
141
+ "loss": 11.2337,
142
+ "step": 19
143
+ },
144
+ {
145
+ "epoch": 0.01968503937007874,
146
+ "grad_norm": 64.9875259399414,
147
+ "learning_rate": 1.68986083499006e-08,
148
+ "loss": 11.0572,
149
+ "step": 20
150
+ },
151
+ {
152
+ "epoch": 0.02066929133858268,
153
+ "grad_norm": 59.36448287963867,
154
+ "learning_rate": 1.789264413518887e-08,
155
+ "loss": 11.0452,
156
+ "step": 21
157
+ },
158
+ {
159
+ "epoch": 0.021653543307086614,
160
+ "grad_norm": 59.18537902832031,
161
+ "learning_rate": 1.888667992047714e-08,
162
+ "loss": 10.31,
163
+ "step": 22
164
+ },
165
+ {
166
+ "epoch": 0.022637795275590553,
167
+ "grad_norm": 54.053524017333984,
168
+ "learning_rate": 1.988071570576541e-08,
169
+ "loss": 9.1395,
170
+ "step": 23
171
+ },
172
+ {
173
+ "epoch": 0.023622047244094488,
174
+ "grad_norm": 49.979618072509766,
175
+ "learning_rate": 2.087475149105368e-08,
176
+ "loss": 8.4201,
177
+ "step": 24
178
+ },
179
+ {
180
+ "epoch": 0.024606299212598427,
181
+ "grad_norm": 51.654518127441406,
182
+ "learning_rate": 2.1868787276341955e-08,
183
+ "loss": 8.6036,
184
+ "step": 25
185
+ },
186
+ {
187
+ "epoch": 0.025590551181102362,
188
+ "grad_norm": 63.55083465576172,
189
+ "learning_rate": 2.2862823061630224e-08,
190
+ "loss": 11.7579,
191
+ "step": 26
192
+ },
193
+ {
194
+ "epoch": 0.0265748031496063,
195
+ "grad_norm": 59.30263137817383,
196
+ "learning_rate": 2.3856858846918493e-08,
197
+ "loss": 10.1307,
198
+ "step": 27
199
+ },
200
+ {
201
+ "epoch": 0.027559055118110236,
202
+ "grad_norm": 50.75270462036133,
203
+ "learning_rate": 2.4850894632206765e-08,
204
+ "loss": 9.2915,
205
+ "step": 28
206
+ },
207
+ {
208
+ "epoch": 0.028543307086614175,
209
+ "grad_norm": 51.7747917175293,
210
+ "learning_rate": 2.5844930417495034e-08,
211
+ "loss": 9.0208,
212
+ "step": 29
213
+ },
214
+ {
215
+ "epoch": 0.02952755905511811,
216
+ "grad_norm": 46.81666564941406,
217
+ "learning_rate": 2.6838966202783303e-08,
218
+ "loss": 8.6867,
219
+ "step": 30
220
+ },
221
+ {
222
+ "epoch": 0.03051181102362205,
223
+ "grad_norm": 44.82905578613281,
224
+ "learning_rate": 2.7833001988071578e-08,
225
+ "loss": 8.0925,
226
+ "step": 31
227
+ },
228
+ {
229
+ "epoch": 0.031496062992125984,
230
+ "grad_norm": 47.148406982421875,
231
+ "learning_rate": 2.8827037773359847e-08,
232
+ "loss": 8.6617,
233
+ "step": 32
234
+ },
235
+ {
236
+ "epoch": 0.03248031496062992,
237
+ "grad_norm": 47.153053283691406,
238
+ "learning_rate": 2.982107355864811e-08,
239
+ "loss": 8.3374,
240
+ "step": 33
241
+ },
242
+ {
243
+ "epoch": 0.03346456692913386,
244
+ "grad_norm": 45.4912223815918,
245
+ "learning_rate": 3.081510934393639e-08,
246
+ "loss": 7.8566,
247
+ "step": 34
248
+ },
249
+ {
250
+ "epoch": 0.0344488188976378,
251
+ "grad_norm": 51.92588806152344,
252
+ "learning_rate": 3.180914512922466e-08,
253
+ "loss": 9.0698,
254
+ "step": 35
255
+ },
256
+ {
257
+ "epoch": 0.03543307086614173,
258
+ "grad_norm": 43.18679428100586,
259
+ "learning_rate": 3.280318091451293e-08,
260
+ "loss": 7.7727,
261
+ "step": 36
262
+ },
263
+ {
264
+ "epoch": 0.03641732283464567,
265
+ "grad_norm": 43.12812805175781,
266
+ "learning_rate": 3.37972166998012e-08,
267
+ "loss": 7.6128,
268
+ "step": 37
269
+ },
270
+ {
271
+ "epoch": 0.03740157480314961,
272
+ "grad_norm": 44.7684211730957,
273
+ "learning_rate": 3.479125248508947e-08,
274
+ "loss": 7.8762,
275
+ "step": 38
276
+ },
277
+ {
278
+ "epoch": 0.038385826771653545,
279
+ "grad_norm": 36.34556198120117,
280
+ "learning_rate": 3.578528827037774e-08,
281
+ "loss": 7.5191,
282
+ "step": 39
283
+ },
284
+ {
285
+ "epoch": 0.03937007874015748,
286
+ "grad_norm": 39.9330940246582,
287
+ "learning_rate": 3.6779324055666005e-08,
288
+ "loss": 7.5638,
289
+ "step": 40
290
+ },
291
+ {
292
+ "epoch": 0.040354330708661415,
293
+ "grad_norm": 39.49993133544922,
294
+ "learning_rate": 3.777335984095428e-08,
295
+ "loss": 7.1878,
296
+ "step": 41
297
+ },
298
+ {
299
+ "epoch": 0.04133858267716536,
300
+ "grad_norm": 34.661251068115234,
301
+ "learning_rate": 3.8767395626242556e-08,
302
+ "loss": 6.8878,
303
+ "step": 42
304
+ },
305
+ {
306
+ "epoch": 0.04232283464566929,
307
+ "grad_norm": 40.655540466308594,
308
+ "learning_rate": 3.976143141153082e-08,
309
+ "loss": 7.5775,
310
+ "step": 43
311
+ },
312
+ {
313
+ "epoch": 0.04330708661417323,
314
+ "grad_norm": 36.12670135498047,
315
+ "learning_rate": 4.0755467196819094e-08,
316
+ "loss": 7.1076,
317
+ "step": 44
318
+ },
319
+ {
320
+ "epoch": 0.04429133858267716,
321
+ "grad_norm": 34.178226470947266,
322
+ "learning_rate": 4.174950298210736e-08,
323
+ "loss": 6.5589,
324
+ "step": 45
325
+ },
326
+ {
327
+ "epoch": 0.045275590551181105,
328
+ "grad_norm": 36.6577033996582,
329
+ "learning_rate": 4.274353876739563e-08,
330
+ "loss": 7.4456,
331
+ "step": 46
332
+ },
333
+ {
334
+ "epoch": 0.04625984251968504,
335
+ "grad_norm": 32.548709869384766,
336
+ "learning_rate": 4.373757455268391e-08,
337
+ "loss": 6.8233,
338
+ "step": 47
339
+ },
340
+ {
341
+ "epoch": 0.047244094488188976,
342
+ "grad_norm": 34.000553131103516,
343
+ "learning_rate": 4.4731610337972176e-08,
344
+ "loss": 6.7633,
345
+ "step": 48
346
+ },
347
+ {
348
+ "epoch": 0.04822834645669291,
349
+ "grad_norm": 32.247859954833984,
350
+ "learning_rate": 4.572564612326045e-08,
351
+ "loss": 6.6024,
352
+ "step": 49
353
+ },
354
+ {
355
+ "epoch": 0.04921259842519685,
356
+ "grad_norm": 28.947786331176758,
357
+ "learning_rate": 4.6719681908548713e-08,
358
+ "loss": 6.2778,
359
+ "step": 50
360
+ },
361
+ {
362
+ "epoch": 0.05019685039370079,
363
+ "grad_norm": 30.279062271118164,
364
+ "learning_rate": 4.7713717693836986e-08,
365
+ "loss": 6.1026,
366
+ "step": 51
367
+ },
368
+ {
369
+ "epoch": 0.051181102362204724,
370
+ "grad_norm": 31.13785171508789,
371
+ "learning_rate": 4.870775347912525e-08,
372
+ "loss": 6.632,
373
+ "step": 52
374
+ },
375
+ {
376
+ "epoch": 0.05216535433070866,
377
+ "grad_norm": 29.648237228393555,
378
+ "learning_rate": 4.970178926441353e-08,
379
+ "loss": 6.6962,
380
+ "step": 53
381
+ },
382
+ {
383
+ "epoch": 0.0531496062992126,
384
+ "grad_norm": 28.224645614624023,
385
+ "learning_rate": 5.06958250497018e-08,
386
+ "loss": 5.8514,
387
+ "step": 54
388
+ },
389
+ {
390
+ "epoch": 0.054133858267716536,
391
+ "grad_norm": 28.693328857421875,
392
+ "learning_rate": 5.168986083499007e-08,
393
+ "loss": 5.9951,
394
+ "step": 55
395
+ },
396
+ {
397
+ "epoch": 0.05511811023622047,
398
+ "grad_norm": 24.777812957763672,
399
+ "learning_rate": 5.268389662027834e-08,
400
+ "loss": 5.4554,
401
+ "step": 56
402
+ },
403
+ {
404
+ "epoch": 0.05610236220472441,
405
+ "grad_norm": 26.11226463317871,
406
+ "learning_rate": 5.3677932405566605e-08,
407
+ "loss": 6.0147,
408
+ "step": 57
409
+ },
410
+ {
411
+ "epoch": 0.05708661417322835,
412
+ "grad_norm": 24.698375701904297,
413
+ "learning_rate": 5.467196819085488e-08,
414
+ "loss": 5.215,
415
+ "step": 58
416
+ },
417
+ {
418
+ "epoch": 0.058070866141732284,
419
+ "grad_norm": 26.616317749023438,
420
+ "learning_rate": 5.5666003976143156e-08,
421
+ "loss": 6.4525,
422
+ "step": 59
423
+ },
424
+ {
425
+ "epoch": 0.05905511811023622,
426
+ "grad_norm": 26.09321403503418,
427
+ "learning_rate": 5.666003976143142e-08,
428
+ "loss": 5.4048,
429
+ "step": 60
430
+ },
431
+ {
432
+ "epoch": 0.060039370078740155,
433
+ "grad_norm": 19.83713150024414,
434
+ "learning_rate": 5.7654075546719694e-08,
435
+ "loss": 5.0424,
436
+ "step": 61
437
+ },
438
+ {
439
+ "epoch": 0.0610236220472441,
440
+ "grad_norm": 26.56923484802246,
441
+ "learning_rate": 5.864811133200796e-08,
442
+ "loss": 6.2646,
443
+ "step": 62
444
+ },
445
+ {
446
+ "epoch": 0.06200787401574803,
447
+ "grad_norm": 23.580089569091797,
448
+ "learning_rate": 5.964214711729623e-08,
449
+ "loss": 5.0847,
450
+ "step": 63
451
+ },
452
+ {
453
+ "epoch": 0.06299212598425197,
454
+ "grad_norm": 23.453126907348633,
455
+ "learning_rate": 6.06361829025845e-08,
456
+ "loss": 5.4415,
457
+ "step": 64
458
+ },
459
+ {
460
+ "epoch": 0.0639763779527559,
461
+ "grad_norm": 21.229190826416016,
462
+ "learning_rate": 6.163021868787278e-08,
463
+ "loss": 5.2469,
464
+ "step": 65
465
+ },
466
+ {
467
+ "epoch": 0.06496062992125984,
468
+ "grad_norm": 19.477190017700195,
469
+ "learning_rate": 6.262425447316104e-08,
470
+ "loss": 5.1378,
471
+ "step": 66
472
+ },
473
+ {
474
+ "epoch": 0.06594488188976377,
475
+ "grad_norm": 19.40647315979004,
476
+ "learning_rate": 6.361829025844931e-08,
477
+ "loss": 5.1636,
478
+ "step": 67
479
+ },
480
+ {
481
+ "epoch": 0.06692913385826772,
482
+ "grad_norm": 22.20977210998535,
483
+ "learning_rate": 6.461232604373759e-08,
484
+ "loss": 5.5596,
485
+ "step": 68
486
+ },
487
+ {
488
+ "epoch": 0.06791338582677166,
489
+ "grad_norm": 19.186826705932617,
490
+ "learning_rate": 6.560636182902586e-08,
491
+ "loss": 4.9508,
492
+ "step": 69
493
+ },
494
+ {
495
+ "epoch": 0.0688976377952756,
496
+ "grad_norm": 20.190908432006836,
497
+ "learning_rate": 6.660039761431412e-08,
498
+ "loss": 5.2355,
499
+ "step": 70
500
+ },
501
+ {
502
+ "epoch": 0.06988188976377953,
503
+ "grad_norm": 18.122196197509766,
504
+ "learning_rate": 6.75944333996024e-08,
505
+ "loss": 4.7359,
506
+ "step": 71
507
+ },
508
+ {
509
+ "epoch": 0.07086614173228346,
510
+ "grad_norm": 17.524765014648438,
511
+ "learning_rate": 6.858846918489067e-08,
512
+ "loss": 4.8947,
513
+ "step": 72
514
+ },
515
+ {
516
+ "epoch": 0.0718503937007874,
517
+ "grad_norm": 18.821767807006836,
518
+ "learning_rate": 6.958250497017893e-08,
519
+ "loss": 4.6269,
520
+ "step": 73
521
+ },
522
+ {
523
+ "epoch": 0.07283464566929133,
524
+ "grad_norm": 18.19922637939453,
525
+ "learning_rate": 7.057654075546721e-08,
526
+ "loss": 4.6072,
527
+ "step": 74
528
+ },
529
+ {
530
+ "epoch": 0.07381889763779527,
531
+ "grad_norm": 16.908899307250977,
532
+ "learning_rate": 7.157057654075548e-08,
533
+ "loss": 4.9125,
534
+ "step": 75
535
+ },
536
+ {
537
+ "epoch": 0.07480314960629922,
538
+ "grad_norm": 19.90263557434082,
539
+ "learning_rate": 7.256461232604374e-08,
540
+ "loss": 4.5856,
541
+ "step": 76
542
+ },
543
+ {
544
+ "epoch": 0.07578740157480315,
545
+ "grad_norm": 17.92584800720215,
546
+ "learning_rate": 7.355864811133201e-08,
547
+ "loss": 4.7879,
548
+ "step": 77
549
+ },
550
+ {
551
+ "epoch": 0.07677165354330709,
552
+ "grad_norm": 16.29261589050293,
553
+ "learning_rate": 7.455268389662029e-08,
554
+ "loss": 4.5348,
555
+ "step": 78
556
+ },
557
+ {
558
+ "epoch": 0.07775590551181102,
559
+ "grad_norm": 16.3350887298584,
560
+ "learning_rate": 7.554671968190855e-08,
561
+ "loss": 4.3554,
562
+ "step": 79
563
+ },
564
+ {
565
+ "epoch": 0.07874015748031496,
566
+ "grad_norm": 14.408184051513672,
567
+ "learning_rate": 7.654075546719683e-08,
568
+ "loss": 4.2984,
569
+ "step": 80
570
+ },
571
+ {
572
+ "epoch": 0.0797244094488189,
573
+ "grad_norm": 16.71326446533203,
574
+ "learning_rate": 7.753479125248511e-08,
575
+ "loss": 4.5505,
576
+ "step": 81
577
+ },
578
+ {
579
+ "epoch": 0.08070866141732283,
580
+ "grad_norm": 14.62590217590332,
581
+ "learning_rate": 7.852882703777338e-08,
582
+ "loss": 4.5325,
583
+ "step": 82
584
+ },
585
+ {
586
+ "epoch": 0.08169291338582677,
587
+ "grad_norm": 17.189268112182617,
588
+ "learning_rate": 7.952286282306164e-08,
589
+ "loss": 4.2725,
590
+ "step": 83
591
+ },
592
+ {
593
+ "epoch": 0.08267716535433071,
594
+ "grad_norm": 16.960248947143555,
595
+ "learning_rate": 8.051689860834992e-08,
596
+ "loss": 4.3054,
597
+ "step": 84
598
+ },
599
+ {
600
+ "epoch": 0.08366141732283465,
601
+ "grad_norm": 15.114398956298828,
602
+ "learning_rate": 8.151093439363819e-08,
603
+ "loss": 4.5536,
604
+ "step": 85
605
+ },
606
+ {
607
+ "epoch": 0.08464566929133858,
608
+ "grad_norm": 16.153371810913086,
609
+ "learning_rate": 8.250497017892645e-08,
610
+ "loss": 4.0265,
611
+ "step": 86
612
+ },
613
+ {
614
+ "epoch": 0.08562992125984252,
615
+ "grad_norm": 15.731820106506348,
616
+ "learning_rate": 8.349900596421472e-08,
617
+ "loss": 4.7453,
618
+ "step": 87
619
+ },
620
+ {
621
+ "epoch": 0.08661417322834646,
622
+ "grad_norm": 14.69382381439209,
623
+ "learning_rate": 8.4493041749503e-08,
624
+ "loss": 4.071,
625
+ "step": 88
626
+ },
627
+ {
628
+ "epoch": 0.08759842519685039,
629
+ "grad_norm": 13.735575675964355,
630
+ "learning_rate": 8.548707753479126e-08,
631
+ "loss": 4.1582,
632
+ "step": 89
633
+ },
634
+ {
635
+ "epoch": 0.08858267716535433,
636
+ "grad_norm": 16.017065048217773,
637
+ "learning_rate": 8.648111332007953e-08,
638
+ "loss": 4.1131,
639
+ "step": 90
640
+ },
641
+ {
642
+ "epoch": 0.08956692913385826,
643
+ "grad_norm": 17.237276077270508,
644
+ "learning_rate": 8.747514910536782e-08,
645
+ "loss": 3.6582,
646
+ "step": 91
647
+ },
648
+ {
649
+ "epoch": 0.09055118110236221,
650
+ "grad_norm": 15.59334945678711,
651
+ "learning_rate": 8.846918489065609e-08,
652
+ "loss": 4.143,
653
+ "step": 92
654
+ },
655
+ {
656
+ "epoch": 0.09153543307086615,
657
+ "grad_norm": 14.918270111083984,
658
+ "learning_rate": 8.946322067594435e-08,
659
+ "loss": 4.2273,
660
+ "step": 93
661
+ },
662
+ {
663
+ "epoch": 0.09251968503937008,
664
+ "grad_norm": 14.899909019470215,
665
+ "learning_rate": 9.045725646123262e-08,
666
+ "loss": 3.9321,
667
+ "step": 94
668
+ },
669
+ {
670
+ "epoch": 0.09350393700787402,
671
+ "grad_norm": 18.112892150878906,
672
+ "learning_rate": 9.14512922465209e-08,
673
+ "loss": 4.2061,
674
+ "step": 95
675
+ },
676
+ {
677
+ "epoch": 0.09448818897637795,
678
+ "grad_norm": 15.854629516601562,
679
+ "learning_rate": 9.244532803180916e-08,
680
+ "loss": 4.1042,
681
+ "step": 96
682
+ },
683
+ {
684
+ "epoch": 0.09547244094488189,
685
+ "grad_norm": 16.44801139831543,
686
+ "learning_rate": 9.343936381709743e-08,
687
+ "loss": 3.9513,
688
+ "step": 97
689
+ },
690
+ {
691
+ "epoch": 0.09645669291338582,
692
+ "grad_norm": 14.854127883911133,
693
+ "learning_rate": 9.44333996023857e-08,
694
+ "loss": 3.8627,
695
+ "step": 98
696
+ },
697
+ {
698
+ "epoch": 0.09744094488188976,
699
+ "grad_norm": 17.02035903930664,
700
+ "learning_rate": 9.542743538767397e-08,
701
+ "loss": 4.3613,
702
+ "step": 99
703
+ },
704
+ {
705
+ "epoch": 0.0984251968503937,
706
+ "grad_norm": 15.354338645935059,
707
+ "learning_rate": 9.642147117296224e-08,
708
+ "loss": 3.8513,
709
+ "step": 100
710
+ },
711
+ {
712
+ "epoch": 0.09940944881889764,
713
+ "grad_norm": 16.35565757751465,
714
+ "learning_rate": 9.74155069582505e-08,
715
+ "loss": 3.5866,
716
+ "step": 101
717
+ },
718
+ {
719
+ "epoch": 0.10039370078740158,
720
+ "grad_norm": 16.66194725036621,
721
+ "learning_rate": 9.840954274353878e-08,
722
+ "loss": 3.5239,
723
+ "step": 102
724
+ },
725
+ {
726
+ "epoch": 0.10137795275590551,
727
+ "grad_norm": 15.875446319580078,
728
+ "learning_rate": 9.940357852882706e-08,
729
+ "loss": 3.5921,
730
+ "step": 103
731
+ },
732
+ {
733
+ "epoch": 0.10236220472440945,
734
+ "grad_norm": 14.344114303588867,
735
+ "learning_rate": 1.0039761431411533e-07,
736
+ "loss": 3.5962,
737
+ "step": 104
738
+ },
739
+ {
740
+ "epoch": 0.10334645669291338,
741
+ "grad_norm": 18.503963470458984,
742
+ "learning_rate": 1.013916500994036e-07,
743
+ "loss": 4.0001,
744
+ "step": 105
745
+ },
746
+ {
747
+ "epoch": 0.10433070866141732,
748
+ "grad_norm": 16.944435119628906,
749
+ "learning_rate": 1.0238568588469187e-07,
750
+ "loss": 4.1374,
751
+ "step": 106
752
+ },
753
+ {
754
+ "epoch": 0.10531496062992125,
755
+ "grad_norm": 16.46833038330078,
756
+ "learning_rate": 1.0337972166998014e-07,
757
+ "loss": 3.9049,
758
+ "step": 107
759
+ },
760
+ {
761
+ "epoch": 0.1062992125984252,
762
+ "grad_norm": 14.921700477600098,
763
+ "learning_rate": 1.043737574552684e-07,
764
+ "loss": 3.2511,
765
+ "step": 108
766
+ },
767
+ {
768
+ "epoch": 0.10728346456692914,
769
+ "grad_norm": 15.574972152709961,
770
+ "learning_rate": 1.0536779324055668e-07,
771
+ "loss": 3.2479,
772
+ "step": 109
773
+ },
774
+ {
775
+ "epoch": 0.10826771653543307,
776
+ "grad_norm": 16.810884475708008,
777
+ "learning_rate": 1.0636182902584495e-07,
778
+ "loss": 3.6414,
779
+ "step": 110
780
+ },
781
+ {
782
+ "epoch": 0.10925196850393701,
783
+ "grad_norm": 17.074661254882812,
784
+ "learning_rate": 1.0735586481113321e-07,
785
+ "loss": 3.6429,
786
+ "step": 111
787
+ },
788
+ {
789
+ "epoch": 0.11023622047244094,
790
+ "grad_norm": 18.52947235107422,
791
+ "learning_rate": 1.0834990059642149e-07,
792
+ "loss": 3.423,
793
+ "step": 112
794
+ },
795
+ {
796
+ "epoch": 0.11122047244094488,
797
+ "grad_norm": 18.681869506835938,
798
+ "learning_rate": 1.0934393638170976e-07,
799
+ "loss": 3.4967,
800
+ "step": 113
801
+ },
802
+ {
803
+ "epoch": 0.11220472440944881,
804
+ "grad_norm": 20.385292053222656,
805
+ "learning_rate": 1.1033797216699802e-07,
806
+ "loss": 3.7649,
807
+ "step": 114
808
+ },
809
+ {
810
+ "epoch": 0.11318897637795275,
811
+ "grad_norm": 18.912538528442383,
812
+ "learning_rate": 1.1133200795228631e-07,
813
+ "loss": 3.2845,
814
+ "step": 115
815
+ },
816
+ {
817
+ "epoch": 0.1141732283464567,
818
+ "grad_norm": 17.856229782104492,
819
+ "learning_rate": 1.1232604373757458e-07,
820
+ "loss": 3.356,
821
+ "step": 116
822
+ },
823
+ {
824
+ "epoch": 0.11515748031496063,
825
+ "grad_norm": 17.08562469482422,
826
+ "learning_rate": 1.1332007952286284e-07,
827
+ "loss": 3.2086,
828
+ "step": 117
829
+ },
830
+ {
831
+ "epoch": 0.11614173228346457,
832
+ "grad_norm": 17.54237937927246,
833
+ "learning_rate": 1.1431411530815111e-07,
834
+ "loss": 3.5561,
835
+ "step": 118
836
+ },
837
+ {
838
+ "epoch": 0.1171259842519685,
839
+ "grad_norm": 19.936498641967773,
840
+ "learning_rate": 1.1530815109343939e-07,
841
+ "loss": 3.7353,
842
+ "step": 119
843
+ },
844
+ {
845
+ "epoch": 0.11811023622047244,
846
+ "grad_norm": 17.135496139526367,
847
+ "learning_rate": 1.1630218687872765e-07,
848
+ "loss": 3.403,
849
+ "step": 120
850
+ },
851
+ {
852
+ "epoch": 0.11909448818897637,
853
+ "grad_norm": 17.260093688964844,
854
+ "learning_rate": 1.1729622266401592e-07,
855
+ "loss": 3.1009,
856
+ "step": 121
857
+ },
858
+ {
859
+ "epoch": 0.12007874015748031,
860
+ "grad_norm": 17.075611114501953,
861
+ "learning_rate": 1.182902584493042e-07,
862
+ "loss": 3.2139,
863
+ "step": 122
864
+ },
865
+ {
866
+ "epoch": 0.12106299212598425,
867
+ "grad_norm": 23.433874130249023,
868
+ "learning_rate": 1.1928429423459245e-07,
869
+ "loss": 3.3339,
870
+ "step": 123
871
+ },
872
+ {
873
+ "epoch": 0.1220472440944882,
874
+ "grad_norm": 18.25501251220703,
875
+ "learning_rate": 1.2027833001988073e-07,
876
+ "loss": 2.9464,
877
+ "step": 124
878
+ },
879
+ {
880
+ "epoch": 0.12303149606299213,
881
+ "grad_norm": 18.079578399658203,
882
+ "learning_rate": 1.21272365805169e-07,
883
+ "loss": 3.3366,
884
+ "step": 125
885
+ },
886
+ {
887
+ "epoch": 0.12401574803149606,
888
+ "grad_norm": 16.392736434936523,
889
+ "learning_rate": 1.222664015904573e-07,
890
+ "loss": 3.0618,
891
+ "step": 126
892
+ },
893
+ {
894
+ "epoch": 0.125,
895
+ "grad_norm": 15.782499313354492,
896
+ "learning_rate": 1.2326043737574557e-07,
897
+ "loss": 3.0092,
898
+ "step": 127
899
+ },
900
+ {
901
+ "epoch": 0.12598425196850394,
902
+ "grad_norm": 14.74819278717041,
903
+ "learning_rate": 1.2425447316103382e-07,
904
+ "loss": 2.7152,
905
+ "step": 128
906
+ },
907
+ {
908
+ "epoch": 0.12696850393700787,
909
+ "grad_norm": 17.743946075439453,
910
+ "learning_rate": 1.2524850894632207e-07,
911
+ "loss": 2.9423,
912
+ "step": 129
913
+ },
914
+ {
915
+ "epoch": 0.1279527559055118,
916
+ "grad_norm": 15.759135246276855,
917
+ "learning_rate": 1.2624254473161035e-07,
918
+ "loss": 2.6569,
919
+ "step": 130
920
+ },
921
+ {
922
+ "epoch": 0.12893700787401574,
923
+ "grad_norm": 18.54253387451172,
924
+ "learning_rate": 1.2723658051689863e-07,
925
+ "loss": 2.8469,
926
+ "step": 131
927
+ },
928
+ {
929
+ "epoch": 0.12992125984251968,
930
+ "grad_norm": 18.318775177001953,
931
+ "learning_rate": 1.282306163021869e-07,
932
+ "loss": 2.9089,
933
+ "step": 132
934
+ },
935
+ {
936
+ "epoch": 0.1309055118110236,
937
+ "grad_norm": 17.751266479492188,
938
+ "learning_rate": 1.2922465208747519e-07,
939
+ "loss": 2.5809,
940
+ "step": 133
941
+ },
942
+ {
943
+ "epoch": 0.13188976377952755,
944
+ "grad_norm": 21.29975128173828,
945
+ "learning_rate": 1.3021868787276344e-07,
946
+ "loss": 2.6987,
947
+ "step": 134
948
+ },
949
+ {
950
+ "epoch": 0.1328740157480315,
951
+ "grad_norm": 19.9519100189209,
952
+ "learning_rate": 1.3121272365805172e-07,
953
+ "loss": 3.2518,
954
+ "step": 135
955
+ },
956
+ {
957
+ "epoch": 0.13385826771653545,
958
+ "grad_norm": 19.99887466430664,
959
+ "learning_rate": 1.3220675944333997e-07,
960
+ "loss": 2.9145,
961
+ "step": 136
962
+ },
963
+ {
964
+ "epoch": 0.13484251968503938,
965
+ "grad_norm": 17.0355167388916,
966
+ "learning_rate": 1.3320079522862825e-07,
967
+ "loss": 2.4809,
968
+ "step": 137
969
+ },
970
+ {
971
+ "epoch": 0.13582677165354332,
972
+ "grad_norm": 18.628467559814453,
973
+ "learning_rate": 1.3419483101391653e-07,
974
+ "loss": 2.8264,
975
+ "step": 138
976
+ },
977
+ {
978
+ "epoch": 0.13681102362204725,
979
+ "grad_norm": 19.511394500732422,
980
+ "learning_rate": 1.351888667992048e-07,
981
+ "loss": 2.5724,
982
+ "step": 139
983
+ },
984
+ {
985
+ "epoch": 0.1377952755905512,
986
+ "grad_norm": 21.264480590820312,
987
+ "learning_rate": 1.3618290258449306e-07,
988
+ "loss": 2.6949,
989
+ "step": 140
990
+ },
991
+ {
992
+ "epoch": 0.13877952755905512,
993
+ "grad_norm": 19.76134490966797,
994
+ "learning_rate": 1.3717693836978134e-07,
995
+ "loss": 2.6925,
996
+ "step": 141
997
+ },
998
+ {
999
+ "epoch": 0.13976377952755906,
1000
+ "grad_norm": 20.930673599243164,
1001
+ "learning_rate": 1.381709741550696e-07,
1002
+ "loss": 2.9311,
1003
+ "step": 142
1004
+ },
1005
+ {
1006
+ "epoch": 0.140748031496063,
1007
+ "grad_norm": 21.966018676757812,
1008
+ "learning_rate": 1.3916500994035787e-07,
1009
+ "loss": 2.5667,
1010
+ "step": 143
1011
+ },
1012
+ {
1013
+ "epoch": 0.14173228346456693,
1014
+ "grad_norm": 21.916505813598633,
1015
+ "learning_rate": 1.4015904572564615e-07,
1016
+ "loss": 3.2471,
1017
+ "step": 144
1018
+ },
1019
+ {
1020
+ "epoch": 0.14271653543307086,
1021
+ "grad_norm": 20.081771850585938,
1022
+ "learning_rate": 1.4115308151093443e-07,
1023
+ "loss": 2.2441,
1024
+ "step": 145
1025
+ },
1026
+ {
1027
+ "epoch": 0.1437007874015748,
1028
+ "grad_norm": 22.893489837646484,
1029
+ "learning_rate": 1.421471172962227e-07,
1030
+ "loss": 2.75,
1031
+ "step": 146
1032
+ },
1033
+ {
1034
+ "epoch": 0.14468503937007873,
1035
+ "grad_norm": 23.95358657836914,
1036
+ "learning_rate": 1.4314115308151096e-07,
1037
+ "loss": 2.9669,
1038
+ "step": 147
1039
+ },
1040
+ {
1041
+ "epoch": 0.14566929133858267,
1042
+ "grad_norm": 21.101062774658203,
1043
+ "learning_rate": 1.4413518886679924e-07,
1044
+ "loss": 2.736,
1045
+ "step": 148
1046
+ },
1047
+ {
1048
+ "epoch": 0.1466535433070866,
1049
+ "grad_norm": 25.240341186523438,
1050
+ "learning_rate": 1.451292246520875e-07,
1051
+ "loss": 3.104,
1052
+ "step": 149
1053
+ },
1054
+ {
1055
+ "epoch": 0.14763779527559054,
1056
+ "grad_norm": 18.358688354492188,
1057
+ "learning_rate": 1.4612326043737577e-07,
1058
+ "loss": 2.2175,
1059
+ "step": 150
1060
+ },
1061
+ {
1062
+ "epoch": 0.1486220472440945,
1063
+ "grad_norm": 21.986661911010742,
1064
+ "learning_rate": 1.4711729622266402e-07,
1065
+ "loss": 2.7415,
1066
+ "step": 151
1067
+ },
1068
+ {
1069
+ "epoch": 0.14960629921259844,
1070
+ "grad_norm": 20.64093017578125,
1071
+ "learning_rate": 1.4811133200795232e-07,
1072
+ "loss": 1.8707,
1073
+ "step": 152
1074
+ },
1075
+ {
1076
+ "epoch": 0.15059055118110237,
1077
+ "grad_norm": 20.602142333984375,
1078
+ "learning_rate": 1.4910536779324058e-07,
1079
+ "loss": 2.5961,
1080
+ "step": 153
1081
+ },
1082
+ {
1083
+ "epoch": 0.15059055118110237,
1084
+ "eval_Qnli-dev_cosine_accuracy": 0.634765625,
1085
+ "eval_Qnli-dev_cosine_accuracy_threshold": 0.8508153557777405,
1086
+ "eval_Qnli-dev_cosine_ap": 0.6461335447626624,
1087
+ "eval_Qnli-dev_cosine_f1": 0.6505636070853462,
1088
+ "eval_Qnli-dev_cosine_f1_threshold": 0.7770615816116333,
1089
+ "eval_Qnli-dev_cosine_precision": 0.5246753246753246,
1090
+ "eval_Qnli-dev_cosine_recall": 0.8559322033898306,
1091
+ "eval_Qnli-dev_dot_accuracy": 0.634765625,
1092
+ "eval_Qnli-dev_dot_accuracy_threshold": 653.7443237304688,
1093
+ "eval_Qnli-dev_dot_ap": 0.6461682282377894,
1094
+ "eval_Qnli-dev_dot_f1": 0.6505636070853462,
1095
+ "eval_Qnli-dev_dot_f1_threshold": 597.0731811523438,
1096
+ "eval_Qnli-dev_dot_precision": 0.5246753246753246,
1097
+ "eval_Qnli-dev_dot_recall": 0.8559322033898306,
1098
+ "eval_Qnli-dev_euclidean_accuracy": 0.634765625,
1099
+ "eval_Qnli-dev_euclidean_accuracy_threshold": 15.141305923461914,
1100
+ "eval_Qnli-dev_euclidean_ap": 0.6461382925406688,
1101
+ "eval_Qnli-dev_euclidean_f1": 0.6505636070853462,
1102
+ "eval_Qnli-dev_euclidean_f1_threshold": 18.50943946838379,
1103
+ "eval_Qnli-dev_euclidean_precision": 0.5246753246753246,
1104
+ "eval_Qnli-dev_euclidean_recall": 0.8559322033898306,
1105
+ "eval_Qnli-dev_manhattan_accuracy": 0.6328125,
1106
+ "eval_Qnli-dev_manhattan_accuracy_threshold": 331.46282958984375,
1107
+ "eval_Qnli-dev_manhattan_ap": 0.6431949026371255,
1108
+ "eval_Qnli-dev_manhattan_f1": 0.6501650165016502,
1109
+ "eval_Qnli-dev_manhattan_f1_threshold": 404.6050109863281,
1110
+ "eval_Qnli-dev_manhattan_precision": 0.5324324324324324,
1111
+ "eval_Qnli-dev_manhattan_recall": 0.8347457627118644,
1112
+ "eval_Qnli-dev_max_accuracy": 0.634765625,
1113
+ "eval_Qnli-dev_max_accuracy_threshold": 653.7443237304688,
1114
+ "eval_Qnli-dev_max_ap": 0.6461682282377894,
1115
+ "eval_Qnli-dev_max_f1": 0.6505636070853462,
1116
+ "eval_Qnli-dev_max_f1_threshold": 597.0731811523438,
1117
+ "eval_Qnli-dev_max_precision": 0.5324324324324324,
1118
+ "eval_Qnli-dev_max_recall": 0.8559322033898306,
1119
+ "eval_allNLI-dev_cosine_accuracy": 0.67578125,
1120
+ "eval_allNLI-dev_cosine_accuracy_threshold": 0.9452645182609558,
1121
+ "eval_allNLI-dev_cosine_ap": 0.4264736612515921,
1122
+ "eval_allNLI-dev_cosine_f1": 0.512,
1123
+ "eval_allNLI-dev_cosine_f1_threshold": 0.8565204739570618,
1124
+ "eval_allNLI-dev_cosine_precision": 0.39143730886850153,
1125
+ "eval_allNLI-dev_cosine_recall": 0.7398843930635838,
1126
+ "eval_allNLI-dev_dot_accuracy": 0.67578125,
1127
+ "eval_allNLI-dev_dot_accuracy_threshold": 726.30615234375,
1128
+ "eval_allNLI-dev_dot_ap": 0.42647535250956575,
1129
+ "eval_allNLI-dev_dot_f1": 0.512,
1130
+ "eval_allNLI-dev_dot_f1_threshold": 658.1103515625,
1131
+ "eval_allNLI-dev_dot_precision": 0.39143730886850153,
1132
+ "eval_allNLI-dev_dot_recall": 0.7398843930635838,
1133
+ "eval_allNLI-dev_euclidean_accuracy": 0.67578125,
1134
+ "eval_allNLI-dev_euclidean_accuracy_threshold": 9.171283721923828,
1135
+ "eval_allNLI-dev_euclidean_ap": 0.4264736612515921,
1136
+ "eval_allNLI-dev_euclidean_f1": 0.512,
1137
+ "eval_allNLI-dev_euclidean_f1_threshold": 14.84876823425293,
1138
+ "eval_allNLI-dev_euclidean_precision": 0.39143730886850153,
1139
+ "eval_allNLI-dev_euclidean_recall": 0.7398843930635838,
1140
+ "eval_allNLI-dev_manhattan_accuracy": 0.67578125,
1141
+ "eval_allNLI-dev_manhattan_accuracy_threshold": 201.49061584472656,
1142
+ "eval_allNLI-dev_manhattan_ap": 0.4252213828672732,
1143
+ "eval_allNLI-dev_manhattan_f1": 0.5107692307692308,
1144
+ "eval_allNLI-dev_manhattan_f1_threshold": 417.52728271484375,
1145
+ "eval_allNLI-dev_manhattan_precision": 0.3480083857442348,
1146
+ "eval_allNLI-dev_manhattan_recall": 0.9595375722543352,
1147
+ "eval_allNLI-dev_max_accuracy": 0.67578125,
1148
+ "eval_allNLI-dev_max_accuracy_threshold": 726.30615234375,
1149
+ "eval_allNLI-dev_max_ap": 0.42647535250956575,
1150
+ "eval_allNLI-dev_max_f1": 0.512,
1151
+ "eval_allNLI-dev_max_f1_threshold": 658.1103515625,
1152
+ "eval_allNLI-dev_max_precision": 0.39143730886850153,
1153
+ "eval_allNLI-dev_max_recall": 0.9595375722543352,
1154
+ "eval_loss": 2.2652623653411865,
1155
+ "eval_runtime": 50.7627,
1156
+ "eval_samples_per_second": 32.78,
1157
+ "eval_sequential_score": 0.6461682282377894,
1158
+ "eval_steps_per_second": 0.138,
1159
+ "eval_sts-test_pearson_cosine": 0.2749904272806095,
1160
+ "eval_sts-test_pearson_dot": 0.27496363262371837,
1161
+ "eval_sts-test_pearson_euclidean": 0.2934483033082174,
1162
+ "eval_sts-test_pearson_manhattan": 0.2923996087310511,
1163
+ "eval_sts-test_pearson_max": 0.2934483033082174,
1164
+ "eval_sts-test_spearman_cosine": 0.31159390381099095,
1165
+ "eval_sts-test_spearman_dot": 0.31138581044552094,
1166
+ "eval_sts-test_spearman_euclidean": 0.3115817314678925,
1167
+ "eval_sts-test_spearman_manhattan": 0.3095556181083969,
1168
+ "eval_sts-test_spearman_max": 0.31159390381099095,
1169
+ "step": 153
1170
+ },
1171
+ {
1172
+ "epoch": 0.1515748031496063,
1173
+ "grad_norm": 22.330442428588867,
1174
+ "learning_rate": 1.5009940357852886e-07,
1175
+ "loss": 3.1149,
1176
+ "step": 154
1177
+ },
1178
+ {
1179
+ "epoch": 0.15255905511811024,
1180
+ "grad_norm": 23.656953811645508,
1181
+ "learning_rate": 1.510934393638171e-07,
1182
+ "loss": 2.2976,
1183
+ "step": 155
1184
+ },
1185
+ {
1186
+ "epoch": 0.15354330708661418,
1187
+ "grad_norm": 20.271608352661133,
1188
+ "learning_rate": 1.5208747514910539e-07,
1189
+ "loss": 2.4436,
1190
+ "step": 156
1191
+ },
1192
+ {
1193
+ "epoch": 0.1545275590551181,
1194
+ "grad_norm": 25.410293579101562,
1195
+ "learning_rate": 1.5308151093439367e-07,
1196
+ "loss": 2.8826,
1197
+ "step": 157
1198
+ },
1199
+ {
1200
+ "epoch": 0.15551181102362205,
1201
+ "grad_norm": 23.772783279418945,
1202
+ "learning_rate": 1.5407554671968192e-07,
1203
+ "loss": 2.3664,
1204
+ "step": 158
1205
+ },
1206
+ {
1207
+ "epoch": 0.15649606299212598,
1208
+ "grad_norm": 23.44937515258789,
1209
+ "learning_rate": 1.5506958250497022e-07,
1210
+ "loss": 2.2485,
1211
+ "step": 159
1212
+ },
1213
+ {
1214
+ "epoch": 0.15748031496062992,
1215
+ "grad_norm": 23.024261474609375,
1216
+ "learning_rate": 1.5606361829025848e-07,
1217
+ "loss": 2.5167,
1218
+ "step": 160
1219
+ },
1220
+ {
1221
+ "epoch": 0.15846456692913385,
1222
+ "grad_norm": 20.63090705871582,
1223
+ "learning_rate": 1.5705765407554675e-07,
1224
+ "loss": 1.7183,
1225
+ "step": 161
1226
+ },
1227
+ {
1228
+ "epoch": 0.1594488188976378,
1229
+ "grad_norm": 21.573190689086914,
1230
+ "learning_rate": 1.58051689860835e-07,
1231
+ "loss": 2.1003,
1232
+ "step": 162
1233
+ },
1234
+ {
1235
+ "epoch": 0.16043307086614172,
1236
+ "grad_norm": 23.774974822998047,
1237
+ "learning_rate": 1.5904572564612329e-07,
1238
+ "loss": 2.5785,
1239
+ "step": 163
1240
+ },
1241
+ {
1242
+ "epoch": 0.16141732283464566,
1243
+ "grad_norm": 27.151123046875,
1244
+ "learning_rate": 1.6003976143141154e-07,
1245
+ "loss": 2.8789,
1246
+ "step": 164
1247
+ },
1248
+ {
1249
+ "epoch": 0.1624015748031496,
1250
+ "grad_norm": 23.50958824157715,
1251
+ "learning_rate": 1.6103379721669984e-07,
1252
+ "loss": 2.3425,
1253
+ "step": 165
1254
+ },
1255
+ {
1256
+ "epoch": 0.16338582677165353,
1257
+ "grad_norm": 23.77661895751953,
1258
+ "learning_rate": 1.620278330019881e-07,
1259
+ "loss": 2.0966,
1260
+ "step": 166
1261
+ },
1262
+ {
1263
+ "epoch": 0.1643700787401575,
1264
+ "grad_norm": 22.070526123046875,
1265
+ "learning_rate": 1.6302186878727637e-07,
1266
+ "loss": 2.1126,
1267
+ "step": 167
1268
+ },
1269
+ {
1270
+ "epoch": 0.16535433070866143,
1271
+ "grad_norm": 22.653602600097656,
1272
+ "learning_rate": 1.6401590457256465e-07,
1273
+ "loss": 2.1824,
1274
+ "step": 168
1275
+ },
1276
+ {
1277
+ "epoch": 0.16633858267716536,
1278
+ "grad_norm": 21.470808029174805,
1279
+ "learning_rate": 1.650099403578529e-07,
1280
+ "loss": 2.2009,
1281
+ "step": 169
1282
+ },
1283
+ {
1284
+ "epoch": 0.1673228346456693,
1285
+ "grad_norm": 25.822694778442383,
1286
+ "learning_rate": 1.6600397614314118e-07,
1287
+ "loss": 2.3796,
1288
+ "step": 170
1289
+ },
1290
+ {
1291
+ "epoch": 0.16830708661417323,
1292
+ "grad_norm": 22.609458923339844,
1293
+ "learning_rate": 1.6699801192842944e-07,
1294
+ "loss": 2.3096,
1295
+ "step": 171
1296
+ },
1297
+ {
1298
+ "epoch": 0.16929133858267717,
1299
+ "grad_norm": 24.10075569152832,
1300
+ "learning_rate": 1.6799204771371774e-07,
1301
+ "loss": 2.7897,
1302
+ "step": 172
1303
+ },
1304
+ {
1305
+ "epoch": 0.1702755905511811,
1306
+ "grad_norm": 22.21641731262207,
1307
+ "learning_rate": 1.68986083499006e-07,
1308
+ "loss": 2.2097,
1309
+ "step": 173
1310
+ },
1311
+ {
1312
+ "epoch": 0.17125984251968504,
1313
+ "grad_norm": 17.717933654785156,
1314
+ "learning_rate": 1.6998011928429427e-07,
1315
+ "loss": 1.7508,
1316
+ "step": 174
1317
+ },
1318
+ {
1319
+ "epoch": 0.17224409448818898,
1320
+ "grad_norm": 22.352798461914062,
1321
+ "learning_rate": 1.7097415506958253e-07,
1322
+ "loss": 2.353,
1323
+ "step": 175
1324
+ },
1325
+ {
1326
+ "epoch": 0.1732283464566929,
1327
+ "grad_norm": 23.421472549438477,
1328
+ "learning_rate": 1.719681908548708e-07,
1329
+ "loss": 2.4276,
1330
+ "step": 176
1331
+ },
1332
+ {
1333
+ "epoch": 0.17421259842519685,
1334
+ "grad_norm": 20.41706657409668,
1335
+ "learning_rate": 1.7296222664015906e-07,
1336
+ "loss": 2.1016,
1337
+ "step": 177
1338
+ },
1339
+ {
1340
+ "epoch": 0.17519685039370078,
1341
+ "grad_norm": 19.39253807067871,
1342
+ "learning_rate": 1.7395626242544734e-07,
1343
+ "loss": 1.8461,
1344
+ "step": 178
1345
+ },
1346
+ {
1347
+ "epoch": 0.17618110236220472,
1348
+ "grad_norm": 19.994935989379883,
1349
+ "learning_rate": 1.7495029821073564e-07,
1350
+ "loss": 1.8104,
1351
+ "step": 179
1352
+ },
1353
+ {
1354
+ "epoch": 0.17716535433070865,
1355
+ "grad_norm": 24.119571685791016,
1356
+ "learning_rate": 1.759443339960239e-07,
1357
+ "loss": 2.6023,
1358
+ "step": 180
1359
+ },
1360
+ {
1361
+ "epoch": 0.1781496062992126,
1362
+ "grad_norm": 28.111419677734375,
1363
+ "learning_rate": 1.7693836978131217e-07,
1364
+ "loss": 2.5261,
1365
+ "step": 181
1366
+ },
1367
+ {
1368
+ "epoch": 0.17913385826771652,
1369
+ "grad_norm": 26.244142532348633,
1370
+ "learning_rate": 1.7793240556660042e-07,
1371
+ "loss": 2.1053,
1372
+ "step": 182
1373
+ },
1374
+ {
1375
+ "epoch": 0.18011811023622049,
1376
+ "grad_norm": 24.29507827758789,
1377
+ "learning_rate": 1.789264413518887e-07,
1378
+ "loss": 1.9712,
1379
+ "step": 183
1380
+ },
1381
+ {
1382
+ "epoch": 0.18110236220472442,
1383
+ "grad_norm": 26.753236770629883,
1384
+ "learning_rate": 1.7992047713717695e-07,
1385
+ "loss": 2.4693,
1386
+ "step": 184
1387
+ },
1388
+ {
1389
+ "epoch": 0.18208661417322836,
1390
+ "grad_norm": 23.031953811645508,
1391
+ "learning_rate": 1.8091451292246523e-07,
1392
+ "loss": 2.1119,
1393
+ "step": 185
1394
+ },
1395
+ {
1396
+ "epoch": 0.1830708661417323,
1397
+ "grad_norm": 24.926044464111328,
1398
+ "learning_rate": 1.819085487077535e-07,
1399
+ "loss": 2.4797,
1400
+ "step": 186
1401
+ },
1402
+ {
1403
+ "epoch": 0.18405511811023623,
1404
+ "grad_norm": 19.047605514526367,
1405
+ "learning_rate": 1.829025844930418e-07,
1406
+ "loss": 2.1587,
1407
+ "step": 187
1408
+ },
1409
+ {
1410
+ "epoch": 0.18503937007874016,
1411
+ "grad_norm": 26.785459518432617,
1412
+ "learning_rate": 1.8389662027833004e-07,
1413
+ "loss": 1.9578,
1414
+ "step": 188
1415
+ },
1416
+ {
1417
+ "epoch": 0.1860236220472441,
1418
+ "grad_norm": 22.257556915283203,
1419
+ "learning_rate": 1.8489065606361832e-07,
1420
+ "loss": 2.1368,
1421
+ "step": 189
1422
+ },
1423
+ {
1424
+ "epoch": 0.18700787401574803,
1425
+ "grad_norm": 24.006427764892578,
1426
+ "learning_rate": 1.8588469184890657e-07,
1427
+ "loss": 2.4212,
1428
+ "step": 190
1429
+ },
1430
+ {
1431
+ "epoch": 0.18799212598425197,
1432
+ "grad_norm": 22.14805793762207,
1433
+ "learning_rate": 1.8687872763419485e-07,
1434
+ "loss": 1.9591,
1435
+ "step": 191
1436
+ },
1437
+ {
1438
+ "epoch": 0.1889763779527559,
1439
+ "grad_norm": 19.438581466674805,
1440
+ "learning_rate": 1.8787276341948313e-07,
1441
+ "loss": 1.5816,
1442
+ "step": 192
1443
+ },
1444
+ {
1445
+ "epoch": 0.18996062992125984,
1446
+ "grad_norm": 19.473068237304688,
1447
+ "learning_rate": 1.888667992047714e-07,
1448
+ "loss": 1.4029,
1449
+ "step": 193
1450
+ },
1451
+ {
1452
+ "epoch": 0.19094488188976377,
1453
+ "grad_norm": 22.895261764526367,
1454
+ "learning_rate": 1.898608349900597e-07,
1455
+ "loss": 1.9385,
1456
+ "step": 194
1457
+ },
1458
+ {
1459
+ "epoch": 0.1919291338582677,
1460
+ "grad_norm": 22.117504119873047,
1461
+ "learning_rate": 1.9085487077534794e-07,
1462
+ "loss": 1.5596,
1463
+ "step": 195
1464
+ },
1465
+ {
1466
+ "epoch": 0.19291338582677164,
1467
+ "grad_norm": 25.059682846069336,
1468
+ "learning_rate": 1.9184890656063622e-07,
1469
+ "loss": 1.6663,
1470
+ "step": 196
1471
+ },
1472
+ {
1473
+ "epoch": 0.19389763779527558,
1474
+ "grad_norm": 31.338993072509766,
1475
+ "learning_rate": 1.9284294234592447e-07,
1476
+ "loss": 2.0026,
1477
+ "step": 197
1478
+ },
1479
+ {
1480
+ "epoch": 0.19488188976377951,
1481
+ "grad_norm": 25.27398681640625,
1482
+ "learning_rate": 1.9383697813121275e-07,
1483
+ "loss": 2.0046,
1484
+ "step": 198
1485
+ },
1486
+ {
1487
+ "epoch": 0.19586614173228348,
1488
+ "grad_norm": 22.62946128845215,
1489
+ "learning_rate": 1.94831013916501e-07,
1490
+ "loss": 1.5016,
1491
+ "step": 199
1492
+ },
1493
+ {
1494
+ "epoch": 0.1968503937007874,
1495
+ "grad_norm": 26.856422424316406,
1496
+ "learning_rate": 1.958250497017893e-07,
1497
+ "loss": 2.184,
1498
+ "step": 200
1499
+ },
1500
+ {
1501
+ "epoch": 0.19783464566929135,
1502
+ "grad_norm": 27.561872482299805,
1503
+ "learning_rate": 1.9681908548707756e-07,
1504
+ "loss": 2.3442,
1505
+ "step": 201
1506
+ },
1507
+ {
1508
+ "epoch": 0.19881889763779528,
1509
+ "grad_norm": 29.985654830932617,
1510
+ "learning_rate": 1.9781312127236584e-07,
1511
+ "loss": 2.6981,
1512
+ "step": 202
1513
+ },
1514
+ {
1515
+ "epoch": 0.19980314960629922,
1516
+ "grad_norm": 22.171070098876953,
1517
+ "learning_rate": 1.9880715705765412e-07,
1518
+ "loss": 2.5481,
1519
+ "step": 203
1520
+ },
1521
+ {
1522
+ "epoch": 0.20078740157480315,
1523
+ "grad_norm": 26.214073181152344,
1524
+ "learning_rate": 1.9980119284294237e-07,
1525
+ "loss": 2.9798,
1526
+ "step": 204
1527
+ },
1528
+ {
1529
+ "epoch": 0.2017716535433071,
1530
+ "grad_norm": 22.690643310546875,
1531
+ "learning_rate": 2.0079522862823065e-07,
1532
+ "loss": 2.287,
1533
+ "step": 205
1534
+ },
1535
+ {
1536
+ "epoch": 0.20275590551181102,
1537
+ "grad_norm": 21.528181076049805,
1538
+ "learning_rate": 2.017892644135189e-07,
1539
+ "loss": 1.9393,
1540
+ "step": 206
1541
+ },
1542
+ {
1543
+ "epoch": 0.20374015748031496,
1544
+ "grad_norm": 24.61495018005371,
1545
+ "learning_rate": 2.027833001988072e-07,
1546
+ "loss": 2.892,
1547
+ "step": 207
1548
+ },
1549
+ {
1550
+ "epoch": 0.2047244094488189,
1551
+ "grad_norm": 21.661781311035156,
1552
+ "learning_rate": 2.0377733598409546e-07,
1553
+ "loss": 2.26,
1554
+ "step": 208
1555
+ },
1556
+ {
1557
+ "epoch": 0.20570866141732283,
1558
+ "grad_norm": 21.17841339111328,
1559
+ "learning_rate": 2.0477137176938374e-07,
1560
+ "loss": 2.5911,
1561
+ "step": 209
1562
+ },
1563
+ {
1564
+ "epoch": 0.20669291338582677,
1565
+ "grad_norm": 22.297616958618164,
1566
+ "learning_rate": 2.05765407554672e-07,
1567
+ "loss": 2.1239,
1568
+ "step": 210
1569
+ },
1570
+ {
1571
+ "epoch": 0.2076771653543307,
1572
+ "grad_norm": 18.447376251220703,
1573
+ "learning_rate": 2.0675944333996027e-07,
1574
+ "loss": 2.0683,
1575
+ "step": 211
1576
+ },
1577
+ {
1578
+ "epoch": 0.20866141732283464,
1579
+ "grad_norm": 19.692792892456055,
1580
+ "learning_rate": 2.0775347912524852e-07,
1581
+ "loss": 1.768,
1582
+ "step": 212
1583
+ },
1584
+ {
1585
+ "epoch": 0.20964566929133857,
1586
+ "grad_norm": 24.793012619018555,
1587
+ "learning_rate": 2.087475149105368e-07,
1588
+ "loss": 2.5468,
1589
+ "step": 213
1590
+ },
1591
+ {
1592
+ "epoch": 0.2106299212598425,
1593
+ "grad_norm": 21.60909652709961,
1594
+ "learning_rate": 2.097415506958251e-07,
1595
+ "loss": 1.8956,
1596
+ "step": 214
1597
+ },
1598
+ {
1599
+ "epoch": 0.21161417322834647,
1600
+ "grad_norm": 22.114286422729492,
1601
+ "learning_rate": 2.1073558648111336e-07,
1602
+ "loss": 2.044,
1603
+ "step": 215
1604
+ },
1605
+ {
1606
+ "epoch": 0.2125984251968504,
1607
+ "grad_norm": 19.02837562561035,
1608
+ "learning_rate": 2.1172962226640164e-07,
1609
+ "loss": 1.5721,
1610
+ "step": 216
1611
+ },
1612
+ {
1613
+ "epoch": 0.21358267716535434,
1614
+ "grad_norm": 19.785751342773438,
1615
+ "learning_rate": 2.127236580516899e-07,
1616
+ "loss": 1.6278,
1617
+ "step": 217
1618
+ },
1619
+ {
1620
+ "epoch": 0.21456692913385828,
1621
+ "grad_norm": 21.3282470703125,
1622
+ "learning_rate": 2.1371769383697817e-07,
1623
+ "loss": 1.7754,
1624
+ "step": 218
1625
+ },
1626
+ {
1627
+ "epoch": 0.2155511811023622,
1628
+ "grad_norm": 25.80916404724121,
1629
+ "learning_rate": 2.1471172962226642e-07,
1630
+ "loss": 1.8594,
1631
+ "step": 219
1632
+ },
1633
+ {
1634
+ "epoch": 0.21653543307086615,
1635
+ "grad_norm": 21.912315368652344,
1636
+ "learning_rate": 2.1570576540755473e-07,
1637
+ "loss": 1.8309,
1638
+ "step": 220
1639
+ },
1640
+ {
1641
+ "epoch": 0.21751968503937008,
1642
+ "grad_norm": 24.051366806030273,
1643
+ "learning_rate": 2.1669980119284298e-07,
1644
+ "loss": 2.0619,
1645
+ "step": 221
1646
+ },
1647
+ {
1648
+ "epoch": 0.21850393700787402,
1649
+ "grad_norm": 24.29237174987793,
1650
+ "learning_rate": 2.1769383697813126e-07,
1651
+ "loss": 2.3335,
1652
+ "step": 222
1653
+ },
1654
+ {
1655
+ "epoch": 0.21948818897637795,
1656
+ "grad_norm": 25.850160598754883,
1657
+ "learning_rate": 2.186878727634195e-07,
1658
+ "loss": 2.023,
1659
+ "step": 223
1660
+ },
1661
+ {
1662
+ "epoch": 0.2204724409448819,
1663
+ "grad_norm": 27.208112716674805,
1664
+ "learning_rate": 2.196819085487078e-07,
1665
+ "loss": 2.1975,
1666
+ "step": 224
1667
+ },
1668
+ {
1669
+ "epoch": 0.22145669291338582,
1670
+ "grad_norm": 22.276878356933594,
1671
+ "learning_rate": 2.2067594433399604e-07,
1672
+ "loss": 1.9228,
1673
+ "step": 225
1674
+ },
1675
+ {
1676
+ "epoch": 0.22244094488188976,
1677
+ "grad_norm": 30.213895797729492,
1678
+ "learning_rate": 2.2166998011928432e-07,
1679
+ "loss": 2.3565,
1680
+ "step": 226
1681
+ },
1682
+ {
1683
+ "epoch": 0.2234251968503937,
1684
+ "grad_norm": 22.16749382019043,
1685
+ "learning_rate": 2.2266401590457263e-07,
1686
+ "loss": 1.896,
1687
+ "step": 227
1688
+ },
1689
+ {
1690
+ "epoch": 0.22440944881889763,
1691
+ "grad_norm": 21.131744384765625,
1692
+ "learning_rate": 2.2365805168986088e-07,
1693
+ "loss": 2.0912,
1694
+ "step": 228
1695
+ },
1696
+ {
1697
+ "epoch": 0.22539370078740156,
1698
+ "grad_norm": 25.1036376953125,
1699
+ "learning_rate": 2.2465208747514916e-07,
1700
+ "loss": 2.7703,
1701
+ "step": 229
1702
+ },
1703
+ {
1704
+ "epoch": 0.2263779527559055,
1705
+ "grad_norm": 25.316814422607422,
1706
+ "learning_rate": 2.256461232604374e-07,
1707
+ "loss": 1.6988,
1708
+ "step": 230
1709
+ },
1710
+ {
1711
+ "epoch": 0.22736220472440946,
1712
+ "grad_norm": 25.363996505737305,
1713
+ "learning_rate": 2.266401590457257e-07,
1714
+ "loss": 2.0406,
1715
+ "step": 231
1716
+ },
1717
+ {
1718
+ "epoch": 0.2283464566929134,
1719
+ "grad_norm": 21.906835556030273,
1720
+ "learning_rate": 2.2763419483101394e-07,
1721
+ "loss": 1.9288,
1722
+ "step": 232
1723
+ },
1724
+ {
1725
+ "epoch": 0.22933070866141733,
1726
+ "grad_norm": 21.407150268554688,
1727
+ "learning_rate": 2.2862823061630222e-07,
1728
+ "loss": 2.0457,
1729
+ "step": 233
1730
+ },
1731
+ {
1732
+ "epoch": 0.23031496062992127,
1733
+ "grad_norm": 21.2374210357666,
1734
+ "learning_rate": 2.296222664015905e-07,
1735
+ "loss": 1.7061,
1736
+ "step": 234
1737
+ },
1738
+ {
1739
+ "epoch": 0.2312992125984252,
1740
+ "grad_norm": 20.94179344177246,
1741
+ "learning_rate": 2.3061630218687878e-07,
1742
+ "loss": 1.6244,
1743
+ "step": 235
1744
+ },
1745
+ {
1746
+ "epoch": 0.23228346456692914,
1747
+ "grad_norm": 21.845712661743164,
1748
+ "learning_rate": 2.3161033797216703e-07,
1749
+ "loss": 2.0241,
1750
+ "step": 236
1751
+ },
1752
+ {
1753
+ "epoch": 0.23326771653543307,
1754
+ "grad_norm": 19.496191024780273,
1755
+ "learning_rate": 2.326043737574553e-07,
1756
+ "loss": 1.567,
1757
+ "step": 237
1758
+ },
1759
+ {
1760
+ "epoch": 0.234251968503937,
1761
+ "grad_norm": 21.819353103637695,
1762
+ "learning_rate": 2.335984095427436e-07,
1763
+ "loss": 1.8084,
1764
+ "step": 238
1765
+ },
1766
+ {
1767
+ "epoch": 0.23523622047244094,
1768
+ "grad_norm": 27.17051124572754,
1769
+ "learning_rate": 2.3459244532803184e-07,
1770
+ "loss": 2.4363,
1771
+ "step": 239
1772
+ },
1773
+ {
1774
+ "epoch": 0.23622047244094488,
1775
+ "grad_norm": 24.850723266601562,
1776
+ "learning_rate": 2.3558648111332012e-07,
1777
+ "loss": 1.7532,
1778
+ "step": 240
1779
+ },
1780
+ {
1781
+ "epoch": 0.2372047244094488,
1782
+ "grad_norm": 24.120052337646484,
1783
+ "learning_rate": 2.365805168986084e-07,
1784
+ "loss": 2.0797,
1785
+ "step": 241
1786
+ },
1787
+ {
1788
+ "epoch": 0.23818897637795275,
1789
+ "grad_norm": 23.708179473876953,
1790
+ "learning_rate": 2.3757455268389668e-07,
1791
+ "loss": 1.9562,
1792
+ "step": 242
1793
+ },
1794
+ {
1795
+ "epoch": 0.23917322834645668,
1796
+ "grad_norm": 20.58576774597168,
1797
+ "learning_rate": 2.385685884691849e-07,
1798
+ "loss": 1.6751,
1799
+ "step": 243
1800
+ },
1801
+ {
1802
+ "epoch": 0.24015748031496062,
1803
+ "grad_norm": 25.161970138549805,
1804
+ "learning_rate": 2.395626242544732e-07,
1805
+ "loss": 2.0265,
1806
+ "step": 244
1807
+ },
1808
+ {
1809
+ "epoch": 0.24114173228346455,
1810
+ "grad_norm": 22.079387664794922,
1811
+ "learning_rate": 2.4055666003976146e-07,
1812
+ "loss": 1.6065,
1813
+ "step": 245
1814
+ },
1815
+ {
1816
+ "epoch": 0.2421259842519685,
1817
+ "grad_norm": 20.125717163085938,
1818
+ "learning_rate": 2.4155069582504976e-07,
1819
+ "loss": 1.7439,
1820
+ "step": 246
1821
+ },
1822
+ {
1823
+ "epoch": 0.24311023622047245,
1824
+ "grad_norm": 22.855445861816406,
1825
+ "learning_rate": 2.42544731610338e-07,
1826
+ "loss": 2.0237,
1827
+ "step": 247
1828
+ },
1829
+ {
1830
+ "epoch": 0.2440944881889764,
1831
+ "grad_norm": 22.16312026977539,
1832
+ "learning_rate": 2.4353876739562627e-07,
1833
+ "loss": 1.6128,
1834
+ "step": 248
1835
+ },
1836
+ {
1837
+ "epoch": 0.24507874015748032,
1838
+ "grad_norm": 21.942440032958984,
1839
+ "learning_rate": 2.445328031809146e-07,
1840
+ "loss": 1.6581,
1841
+ "step": 249
1842
+ },
1843
+ {
1844
+ "epoch": 0.24606299212598426,
1845
+ "grad_norm": 24.027729034423828,
1846
+ "learning_rate": 2.4552683896620283e-07,
1847
+ "loss": 2.1538,
1848
+ "step": 250
1849
+ },
1850
+ {
1851
+ "epoch": 0.2470472440944882,
1852
+ "grad_norm": 23.1455078125,
1853
+ "learning_rate": 2.4652087475149113e-07,
1854
+ "loss": 2.049,
1855
+ "step": 251
1856
+ },
1857
+ {
1858
+ "epoch": 0.24803149606299213,
1859
+ "grad_norm": 22.017520904541016,
1860
+ "learning_rate": 2.475149105367794e-07,
1861
+ "loss": 1.2573,
1862
+ "step": 252
1863
+ },
1864
+ {
1865
+ "epoch": 0.24901574803149606,
1866
+ "grad_norm": 18.747554779052734,
1867
+ "learning_rate": 2.4850894632206764e-07,
1868
+ "loss": 1.5619,
1869
+ "step": 253
1870
+ },
1871
+ {
1872
+ "epoch": 0.25,
1873
+ "grad_norm": 18.277063369750977,
1874
+ "learning_rate": 2.495029821073559e-07,
1875
+ "loss": 1.2611,
1876
+ "step": 254
1877
+ },
1878
+ {
1879
+ "epoch": 0.25098425196850394,
1880
+ "grad_norm": 19.781038284301758,
1881
+ "learning_rate": 2.5049701789264414e-07,
1882
+ "loss": 1.3443,
1883
+ "step": 255
1884
+ },
1885
+ {
1886
+ "epoch": 0.25196850393700787,
1887
+ "grad_norm": 21.605199813842773,
1888
+ "learning_rate": 2.5149105367793245e-07,
1889
+ "loss": 1.3436,
1890
+ "step": 256
1891
+ },
1892
+ {
1893
+ "epoch": 0.2529527559055118,
1894
+ "grad_norm": 29.64748764038086,
1895
+ "learning_rate": 2.524850894632207e-07,
1896
+ "loss": 2.8117,
1897
+ "step": 257
1898
+ },
1899
+ {
1900
+ "epoch": 0.25393700787401574,
1901
+ "grad_norm": 19.245962142944336,
1902
+ "learning_rate": 2.53479125248509e-07,
1903
+ "loss": 1.7563,
1904
+ "step": 258
1905
+ },
1906
+ {
1907
+ "epoch": 0.2549212598425197,
1908
+ "grad_norm": 18.752405166625977,
1909
+ "learning_rate": 2.5447316103379726e-07,
1910
+ "loss": 1.3148,
1911
+ "step": 259
1912
+ },
1913
+ {
1914
+ "epoch": 0.2559055118110236,
1915
+ "grad_norm": 24.133914947509766,
1916
+ "learning_rate": 2.554671968190855e-07,
1917
+ "loss": 2.0278,
1918
+ "step": 260
1919
+ },
1920
+ {
1921
+ "epoch": 0.25688976377952755,
1922
+ "grad_norm": 19.490673065185547,
1923
+ "learning_rate": 2.564612326043738e-07,
1924
+ "loss": 1.2403,
1925
+ "step": 261
1926
+ },
1927
+ {
1928
+ "epoch": 0.2578740157480315,
1929
+ "grad_norm": 20.14169692993164,
1930
+ "learning_rate": 2.5745526838966207e-07,
1931
+ "loss": 1.588,
1932
+ "step": 262
1933
+ },
1934
+ {
1935
+ "epoch": 0.2588582677165354,
1936
+ "grad_norm": 26.948959350585938,
1937
+ "learning_rate": 2.5844930417495037e-07,
1938
+ "loss": 2.0071,
1939
+ "step": 263
1940
+ },
1941
+ {
1942
+ "epoch": 0.25984251968503935,
1943
+ "grad_norm": 22.330224990844727,
1944
+ "learning_rate": 2.5944333996023857e-07,
1945
+ "loss": 1.5312,
1946
+ "step": 264
1947
+ },
1948
+ {
1949
+ "epoch": 0.2608267716535433,
1950
+ "grad_norm": 21.379629135131836,
1951
+ "learning_rate": 2.604373757455269e-07,
1952
+ "loss": 1.8641,
1953
+ "step": 265
1954
+ },
1955
+ {
1956
+ "epoch": 0.2618110236220472,
1957
+ "grad_norm": 20.703140258789062,
1958
+ "learning_rate": 2.614314115308152e-07,
1959
+ "loss": 1.2933,
1960
+ "step": 266
1961
+ },
1962
+ {
1963
+ "epoch": 0.26279527559055116,
1964
+ "grad_norm": 22.9117488861084,
1965
+ "learning_rate": 2.6242544731610343e-07,
1966
+ "loss": 1.6262,
1967
+ "step": 267
1968
+ },
1969
+ {
1970
+ "epoch": 0.2637795275590551,
1971
+ "grad_norm": 23.842002868652344,
1972
+ "learning_rate": 2.634194831013917e-07,
1973
+ "loss": 1.721,
1974
+ "step": 268
1975
+ },
1976
+ {
1977
+ "epoch": 0.26476377952755903,
1978
+ "grad_norm": 20.449384689331055,
1979
+ "learning_rate": 2.6441351888667994e-07,
1980
+ "loss": 1.4713,
1981
+ "step": 269
1982
+ },
1983
+ {
1984
+ "epoch": 0.265748031496063,
1985
+ "grad_norm": 25.885969161987305,
1986
+ "learning_rate": 2.6540755467196824e-07,
1987
+ "loss": 1.4625,
1988
+ "step": 270
1989
+ },
1990
+ {
1991
+ "epoch": 0.26673228346456695,
1992
+ "grad_norm": 24.52666473388672,
1993
+ "learning_rate": 2.664015904572565e-07,
1994
+ "loss": 1.7254,
1995
+ "step": 271
1996
+ },
1997
+ {
1998
+ "epoch": 0.2677165354330709,
1999
+ "grad_norm": 23.4957275390625,
2000
+ "learning_rate": 2.6739562624254475e-07,
2001
+ "loss": 1.5108,
2002
+ "step": 272
2003
+ },
2004
+ {
2005
+ "epoch": 0.2687007874015748,
2006
+ "grad_norm": 23.828855514526367,
2007
+ "learning_rate": 2.6838966202783305e-07,
2008
+ "loss": 2.1126,
2009
+ "step": 273
2010
+ },
2011
+ {
2012
+ "epoch": 0.26968503937007876,
2013
+ "grad_norm": 21.05967903137207,
2014
+ "learning_rate": 2.693836978131213e-07,
2015
+ "loss": 1.3967,
2016
+ "step": 274
2017
+ },
2018
+ {
2019
+ "epoch": 0.2706692913385827,
2020
+ "grad_norm": 24.555776596069336,
2021
+ "learning_rate": 2.703777335984096e-07,
2022
+ "loss": 1.7067,
2023
+ "step": 275
2024
+ },
2025
+ {
2026
+ "epoch": 0.27165354330708663,
2027
+ "grad_norm": 22.135860443115234,
2028
+ "learning_rate": 2.7137176938369786e-07,
2029
+ "loss": 1.4847,
2030
+ "step": 276
2031
+ },
2032
+ {
2033
+ "epoch": 0.27263779527559057,
2034
+ "grad_norm": 22.105913162231445,
2035
+ "learning_rate": 2.723658051689861e-07,
2036
+ "loss": 1.6515,
2037
+ "step": 277
2038
+ },
2039
+ {
2040
+ "epoch": 0.2736220472440945,
2041
+ "grad_norm": 16.617225646972656,
2042
+ "learning_rate": 2.7335984095427437e-07,
2043
+ "loss": 0.9367,
2044
+ "step": 278
2045
+ },
2046
+ {
2047
+ "epoch": 0.27460629921259844,
2048
+ "grad_norm": 26.01727867126465,
2049
+ "learning_rate": 2.743538767395627e-07,
2050
+ "loss": 2.0267,
2051
+ "step": 279
2052
+ },
2053
+ {
2054
+ "epoch": 0.2755905511811024,
2055
+ "grad_norm": 22.522462844848633,
2056
+ "learning_rate": 2.75347912524851e-07,
2057
+ "loss": 1.5023,
2058
+ "step": 280
2059
+ },
2060
+ {
2061
+ "epoch": 0.2765748031496063,
2062
+ "grad_norm": 20.646358489990234,
2063
+ "learning_rate": 2.763419483101392e-07,
2064
+ "loss": 1.1248,
2065
+ "step": 281
2066
+ },
2067
+ {
2068
+ "epoch": 0.27755905511811024,
2069
+ "grad_norm": 23.3087158203125,
2070
+ "learning_rate": 2.773359840954275e-07,
2071
+ "loss": 1.6224,
2072
+ "step": 282
2073
+ },
2074
+ {
2075
+ "epoch": 0.2785433070866142,
2076
+ "grad_norm": 24.115968704223633,
2077
+ "learning_rate": 2.7833001988071574e-07,
2078
+ "loss": 1.7969,
2079
+ "step": 283
2080
+ },
2081
+ {
2082
+ "epoch": 0.2795275590551181,
2083
+ "grad_norm": 27.229188919067383,
2084
+ "learning_rate": 2.7932405566600404e-07,
2085
+ "loss": 2.2498,
2086
+ "step": 284
2087
+ },
2088
+ {
2089
+ "epoch": 0.28051181102362205,
2090
+ "grad_norm": 25.797216415405273,
2091
+ "learning_rate": 2.803180914512923e-07,
2092
+ "loss": 1.7477,
2093
+ "step": 285
2094
+ },
2095
+ {
2096
+ "epoch": 0.281496062992126,
2097
+ "grad_norm": 23.666858673095703,
2098
+ "learning_rate": 2.8131212723658055e-07,
2099
+ "loss": 1.6261,
2100
+ "step": 286
2101
+ },
2102
+ {
2103
+ "epoch": 0.2824803149606299,
2104
+ "grad_norm": 26.826387405395508,
2105
+ "learning_rate": 2.8230616302186885e-07,
2106
+ "loss": 2.0911,
2107
+ "step": 287
2108
+ },
2109
+ {
2110
+ "epoch": 0.28346456692913385,
2111
+ "grad_norm": 23.288511276245117,
2112
+ "learning_rate": 2.833001988071571e-07,
2113
+ "loss": 1.9519,
2114
+ "step": 288
2115
+ },
2116
+ {
2117
+ "epoch": 0.2844488188976378,
2118
+ "grad_norm": 19.76810073852539,
2119
+ "learning_rate": 2.842942345924454e-07,
2120
+ "loss": 1.3132,
2121
+ "step": 289
2122
+ },
2123
+ {
2124
+ "epoch": 0.2854330708661417,
2125
+ "grad_norm": 24.65778923034668,
2126
+ "learning_rate": 2.852882703777336e-07,
2127
+ "loss": 2.3292,
2128
+ "step": 290
2129
+ },
2130
+ {
2131
+ "epoch": 0.28641732283464566,
2132
+ "grad_norm": 19.230430603027344,
2133
+ "learning_rate": 2.862823061630219e-07,
2134
+ "loss": 1.3781,
2135
+ "step": 291
2136
+ },
2137
+ {
2138
+ "epoch": 0.2874015748031496,
2139
+ "grad_norm": 20.665143966674805,
2140
+ "learning_rate": 2.8727634194831017e-07,
2141
+ "loss": 1.5753,
2142
+ "step": 292
2143
+ },
2144
+ {
2145
+ "epoch": 0.28838582677165353,
2146
+ "grad_norm": 18.584993362426758,
2147
+ "learning_rate": 2.8827037773359847e-07,
2148
+ "loss": 1.4158,
2149
+ "step": 293
2150
+ },
2151
+ {
2152
+ "epoch": 0.28937007874015747,
2153
+ "grad_norm": 20.731081008911133,
2154
+ "learning_rate": 2.892644135188867e-07,
2155
+ "loss": 2.1661,
2156
+ "step": 294
2157
+ },
2158
+ {
2159
+ "epoch": 0.2903543307086614,
2160
+ "grad_norm": 20.006755828857422,
2161
+ "learning_rate": 2.90258449304175e-07,
2162
+ "loss": 1.4928,
2163
+ "step": 295
2164
+ },
2165
+ {
2166
+ "epoch": 0.29133858267716534,
2167
+ "grad_norm": 20.01453971862793,
2168
+ "learning_rate": 2.912524850894633e-07,
2169
+ "loss": 2.2825,
2170
+ "step": 296
2171
+ },
2172
+ {
2173
+ "epoch": 0.29232283464566927,
2174
+ "grad_norm": 18.137697219848633,
2175
+ "learning_rate": 2.9224652087475153e-07,
2176
+ "loss": 1.7261,
2177
+ "step": 297
2178
+ },
2179
+ {
2180
+ "epoch": 0.2933070866141732,
2181
+ "grad_norm": 20.046411514282227,
2182
+ "learning_rate": 2.9324055666003984e-07,
2183
+ "loss": 1.8635,
2184
+ "step": 298
2185
+ },
2186
+ {
2187
+ "epoch": 0.29429133858267714,
2188
+ "grad_norm": 15.859444618225098,
2189
+ "learning_rate": 2.9423459244532804e-07,
2190
+ "loss": 0.974,
2191
+ "step": 299
2192
+ },
2193
+ {
2194
+ "epoch": 0.2952755905511811,
2195
+ "grad_norm": 17.976015090942383,
2196
+ "learning_rate": 2.9522862823061634e-07,
2197
+ "loss": 1.53,
2198
+ "step": 300
2199
+ },
2200
+ {
2201
+ "epoch": 0.296259842519685,
2202
+ "grad_norm": 20.52704429626465,
2203
+ "learning_rate": 2.9622266401590465e-07,
2204
+ "loss": 1.5985,
2205
+ "step": 301
2206
+ },
2207
+ {
2208
+ "epoch": 0.297244094488189,
2209
+ "grad_norm": 19.45134925842285,
2210
+ "learning_rate": 2.972166998011929e-07,
2211
+ "loss": 1.2169,
2212
+ "step": 302
2213
+ },
2214
+ {
2215
+ "epoch": 0.29822834645669294,
2216
+ "grad_norm": 22.01141357421875,
2217
+ "learning_rate": 2.9821073558648115e-07,
2218
+ "loss": 1.771,
2219
+ "step": 303
2220
+ },
2221
+ {
2222
+ "epoch": 0.2992125984251969,
2223
+ "grad_norm": 17.591617584228516,
2224
+ "learning_rate": 2.992047713717694e-07,
2225
+ "loss": 1.4506,
2226
+ "step": 304
2227
+ },
2228
+ {
2229
+ "epoch": 0.3001968503937008,
2230
+ "grad_norm": 25.382244110107422,
2231
+ "learning_rate": 3.001988071570577e-07,
2232
+ "loss": 1.9496,
2233
+ "step": 305
2234
+ }
2235
+ ],
2236
+ "logging_steps": 1,
2237
+ "max_steps": 3048,
2238
+ "num_input_tokens_seen": 0,
2239
+ "num_train_epochs": 3,
2240
+ "save_steps": 305,
2241
+ "stateful_callbacks": {
2242
+ "TrainerControl": {
2243
+ "args": {
2244
+ "should_epoch_stop": false,
2245
+ "should_evaluate": false,
2246
+ "should_log": false,
2247
+ "should_save": true,
2248
+ "should_training_stop": false
2249
+ },
2250
+ "attributes": {}
2251
+ }
2252
+ },
2253
+ "total_flos": 0.0,
2254
+ "train_batch_size": 32,
2255
+ "trial_name": null,
2256
+ "trial_params": null
2257
+ }
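The `+` lines above are the tail of `checkpoint-305/trainer_state.json`: with `"logging_steps": 1`, the Trainer appends one record per optimizer step (epoch fraction, gradient norm, current learning rate, training loss), and the checkpoint lands at step 305 because `"save_steps": 305`. Across these records the learning rate climbs by a constant ≈9.94e-10 per step, so the run is still inside its linear warmup at this point. A minimal sketch for inspecting the log once the checkpoint is downloaded locally (the local path is an assumption; `log_history` is the key the Trainer uses for this list):

```python
# Minimal sketch: summarize the per-step log in trainer_state.json.
# Assumes the checkpoint directory was downloaded next to this script.
import json

with open("checkpoint-305/trainer_state.json") as f:
    state = json.load(f)

steps = [rec for rec in state["log_history"] if "loss" in rec]  # skip eval-only records, if any
print(f"{len(steps)} train records, last step {steps[-1]['step']}")
for rec in steps[-3:]:  # peek at the most recent entries
    print(f"step {rec['step']:>4}  loss {rec['loss']:.4f}  "
          f"lr {rec['learning_rate']:.3e}  grad_norm {rec['grad_norm']:.1f}")
```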
checkpoint-305/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7ef082ecfeb35653388a04cfe07d1e8fd3de2824ada0763dd5244bf3b856da9a
+ size 5688
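`training_args.bin` is tracked with Git LFS, so the commit records only this pointer (oid and size), not the ~5.7 kB binary itself. The actual file is a torch-pickled `TrainingArguments` object saved by the Trainer; a sketch for inspecting it after pulling it from LFS (assumes `torch` and `transformers` are installed, and the local path is an assumption):

```python
# Sketch: load the pickled TrainingArguments saved alongside the checkpoint.
# weights_only=False is required because this is a full pickle, not a tensor file.
import torch

args = torch.load("checkpoint-305/training_args.bin", weights_only=False)
print(args.per_device_train_batch_size)  # expected to match "train_batch_size": 32 above
print(args.num_train_epochs)             # expected to match "num_train_epochs": 3 above
```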