bobox committed on
Commit 596bcdc · verified · 1 Parent(s): 1d7d22e

Training in progress, step 160, checkpoint

checkpoint-160/1_AdvancedWeightedPooling/config.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "embed_dim": 768,
+ "num_heads": 8,
+ "dropout": 0.0,
+ "bias": true,
+ "gate_min": 0.2,
+ "gate_max": 0.8
+ }
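The pooling config above can be sanity-checked with the standard library alone. This is a minimal sketch, not code from the repository, and it assumes the file is plain JSON with exactly these six keys. The divisibility check follows from the `MultiheadAttention` layer in the printed architecture: PyTorch requires `embed_dim` to be divisible by `num_heads`, which gives 768 / 8 = 96 dimensions per head. What `gate_min` and `gate_max` control inside `AdvancedWeightedPooling` is not documented in this commit; `bounded_gate` is a hypothetical reading in which a sigmoid output is rescaled into that range.

```python
import json
import math

# Contents of checkpoint-160/1_AdvancedWeightedPooling/config.json as shown in the diff.
CONFIG_TEXT = """
{
  "embed_dim": 768,
  "num_heads": 8,
  "dropout": 0.0,
  "bias": true,
  "gate_min": 0.2,
  "gate_max": 0.8
}
"""


def check_pooling_config(cfg: dict) -> int:
    """Validate the config and return the per-head dimension."""
    # torch.nn.MultiheadAttention splits embed_dim evenly across the heads.
    if cfg["embed_dim"] % cfg["num_heads"] != 0:
        raise ValueError("embed_dim must be divisible by num_heads")
    if not 0.0 <= cfg["dropout"] < 1.0:
        raise ValueError("dropout must be in [0, 1)")
    # Assumption: the gate bounds describe a sub-interval of [0, 1].
    if not 0.0 <= cfg["gate_min"] <= cfg["gate_max"] <= 1.0:
        raise ValueError("expected 0 <= gate_min <= gate_max <= 1")
    return cfg["embed_dim"] // cfg["num_heads"]


def bounded_gate(logit: float, gate_min: float, gate_max: float) -> float:
    """HYPOTHETICAL helper: sigmoid of a logit, rescaled into [gate_min, gate_max]."""
    s = 1.0 / (1.0 + math.exp(-logit))
    gate = gate_min + (gate_max - gate_min) * s
    # Clamp to guard against floating-point round-off at the ends of the range.
    return min(gate_max, max(gate_min, gate))


cfg = json.loads(CONFIG_TEXT)
head_dim = check_pooling_config(cfg)
print(head_dim)  # 96
print(bounded_gate(0.0, cfg["gate_min"], cfg["gate_max"]))  # a zero logit lands mid-range
```

Under this reading, the gate could never fully ignore either input it blends, since it stays between 0.2 and 0.8; whether the real module behaves that way would need its source code to confirm.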
checkpoint-160/1_AdvancedWeightedPooling/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a424755aae7f58fa33dfecafe67e488934aa1c52ed81c9370768f1e742544cee
+ size 11828367
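The three lines committed for `pytorch_model.bin` are a Git LFS pointer rather than the weights themselves. They hold a spec `version`, the object's `oid` (a SHA-256 hash), and its `size` in bytes (11,828,367, roughly 11.8 MB). Below is a minimal sketch of reading such a pointer and checking a downloaded blob against it. The function names are illustrative and are not part of any Git LFS or Hugging Face API.

```python
import hashlib

# The three-line Git LFS pointer committed in place of pytorch_model.bin.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:a424755aae7f58fa33dfecafe67e488934aa1c52ed81c9370768f1e742544cee
size 11828367
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(blob: bytes, pointer: dict) -> bool:
    """Check a downloaded object against the pointer's byte size and SHA-256 digest."""
    if pointer["algo"] != "sha256":
        raise ValueError("this sketch only handles sha256 oids")
    return len(blob) == pointer["size"] and hashlib.sha256(blob).hexdigest() == pointer["digest"]


info = parse_lfs_pointer(POINTER)
print(info["algo"], info["size"])  # sha256 11828367
```

Passing the bytes of the downloaded `pytorch_model.bin` to `matches_pointer(..., info)` should return `True` when the download is intact.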
checkpoint-160/README.md ADDED
@@ -0,0 +1,982 @@
+ ---
+ base_model: microsoft/deberta-v3-small
+ datasets:
+ - tals/vitaminc
+ language:
+ - en
+ library_name: sentence-transformers
+ metrics:
+ - pearson_cosine
+ - spearman_cosine
+ - pearson_manhattan
+ - spearman_manhattan
+ - pearson_euclidean
+ - spearman_euclidean
+ - pearson_dot
+ - spearman_dot
+ - pearson_max
+ - spearman_max
+ - cosine_accuracy
+ - cosine_accuracy_threshold
+ - cosine_f1
+ - cosine_f1_threshold
+ - cosine_precision
+ - cosine_recall
+ - cosine_ap
+ - dot_accuracy
+ - dot_accuracy_threshold
+ - dot_f1
+ - dot_f1_threshold
+ - dot_precision
+ - dot_recall
+ - dot_ap
+ - manhattan_accuracy
+ - manhattan_accuracy_threshold
+ - manhattan_f1
+ - manhattan_f1_threshold
+ - manhattan_precision
+ - manhattan_recall
+ - manhattan_ap
+ - euclidean_accuracy
+ - euclidean_accuracy_threshold
+ - euclidean_f1
+ - euclidean_f1_threshold
+ - euclidean_precision
+ - euclidean_recall
+ - euclidean_ap
+ - max_accuracy
+ - max_accuracy_threshold
+ - max_f1
+ - max_f1_threshold
+ - max_precision
+ - max_recall
+ - max_ap
+ pipeline_tag: sentence-similarity
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:225247
+ - loss:CachedGISTEmbedLoss
+ widget:
+ - source_sentence: how long to grill boneless skinless chicken breasts in oven
+   sentences:
+   - "[ syll. a-ka-hi, ak-ahi ] The baby boy name Akahi is also used as a girl name.\
+     \ Its pronunciation is AA K AA HHiy â\x80 . Akahi's origin, as well as its use,\
+     \ is in the Hawaiian language. The name's meaning is never before. Akahi is infrequently\
+     \ used as a baby name for boys."
+   - October consists of 31 days. November has 30 days. When you add both together
+     they have 61 days.
+   - Heat a grill or grill pan. When the grill is hot, place the chicken on the grill
+     and cook for about 4 minutes per side, or until cooked through. You can also bake
+     the thawed chicken in a 375 degree F oven for 15 minutes, or until cooked through.
+ - source_sentence: More than 273 people have died from the 2019-20 coronavirus outside
+     mainland China .
+   sentences:
+   - 'More than 3,700 people have died : around 3,100 in mainland China and around
+     550 in all other countries combined .'
+   - 'More than 3,200 people have died : almost 3,000 in mainland China and around
+     275 in other countries .'
+   - more than 4,900 deaths have been attributed to COVID-19 .
+ - source_sentence: Most red algae species live in oceans.
+   sentences:
+   - Where do most red algae species live?
+   - Which layer of the earth is molten?
+   - As a diver descends, the increase in pressure causes the body’s air pockets in
+     the ears and lungs to do what?
+ - source_sentence: Binary compounds of carbon with less electronegative elements are
+     called carbides.
+   sentences:
+   - What are four children born at one birth called?
+   - Binary compounds of carbon with less electronegative elements are called what?
+   - The water cycle involves movement of water between air and what?
+ - source_sentence: What is the basic monetary unit of Iceland?
+   sentences:
+   - 'Ao dai - Vietnamese traditional dress - YouTube Ao dai - Vietnamese traditional
+     dress Want to watch this again later? Sign in to add this video to a playlist.
+     Need to report the video? Sign in to report inappropriate content. Rating is available
+     when the video has been rented. This feature is not available right now. Please
+     try again later. Uploaded on Jul 8, 2009 Simple, yet charming, graceful and elegant,
+     áo dài was designed to praise the slender beauty of Vietnamese women. The dress
+     is a genius combination of ancient and modern. It shows every curve on the girl''s
+     body, creating sexiness for the wearer, yet it still preserves the traditional
+     feminine grace of Vietnamese women with its charming flowing flaps. The simplicity
+     of áo dài makes it convenient and practical, something that other Asian traditional
+     clothes lack. The waist-length slits of the flaps allow every movement of the
+     legs: walking, running, riding a bicycle, climbing a tree, doing high kicks. The
+     looseness of the pants allows comfortability. As a girl walks in áo dài, the movements
+     of the flaps make it seem like she''s not walking but floating in the air. This
+     breath-taking beautiful image of a Vietnamese girl walking in áo dài has been
+     an inspiration for generations of Vietnamese poets, novelists, artists and has
+     left a deep impression for every foreigner who has visited the country. Category'
+   - 'Icelandic monetary unit - definition of Icelandic monetary unit by The Free Dictionary
+     Icelandic monetary unit - definition of Icelandic monetary unit by The Free Dictionary
+     http://www.thefreedictionary.com/Icelandic+monetary+unit Related to Icelandic
+     monetary unit: Icelandic Old Krona ThesaurusAntonymsRelated WordsSynonymsLegend:
+     monetary unit - a unit of money Icelandic krona , krona - the basic unit of money
+     in Iceland eyrir - 100 aurar equal 1 krona in Iceland Want to thank TFD for its
+     existence? Tell a friend about us , add a link to this page, or visit the webmaster''s
+     page for free fun content . Link to this page: Copyright © 2003-2017 Farlex, Inc
+     Disclaimer All content on this website, including dictionary, thesaurus, literature,
+     geography, and other reference data is for informational purposes only. This information
+     should not be considered complete, up to date, and is not intended to be used
+     in place of a visit, consultation, or advice of a legal, medical, or any other
+     professional.'
+   - 'Food-Info.net : E-numbers : E140: Chlorophyll CI 75810, Natural Green 3, Chlorophyll
+     A, Magnesium chlorophyll Origin: Natural green colour, present in all plants and
+     algae. Commercially extracted from nettles, grass and alfalfa. Function & characteristics:'
+ model-index:
+ - name: SentenceTransformer based on microsoft/deberta-v3-small
+   results:
+   - task:
+       type: semantic-similarity
+       name: Semantic Similarity
+     dataset:
+       name: sts test
+       type: sts-test
+     metrics:
+     - type: pearson_cosine
+       value: 0.2853943019391156
+       name: Pearson Cosine
+     - type: spearman_cosine
+       value: 0.31414239162305135
+       name: Spearman Cosine
+     - type: pearson_manhattan
+       value: 0.3110310476615048
+       name: Pearson Manhattan
+     - type: spearman_manhattan
+       value: 0.3366243060620438
+       name: Spearman Manhattan
+     - type: pearson_euclidean
+       value: 0.29405773952219494
+       name: Pearson Euclidean
+     - type: spearman_euclidean
+       value: 0.3141516551339523
+       name: Spearman Euclidean
+     - type: pearson_dot
+       value: 0.28526334639473966
+       name: Pearson Dot
+     - type: spearman_dot
+       value: 0.31380407209449446
+       name: Spearman Dot
+     - type: pearson_max
+       value: 0.3110310476615048
+       name: Pearson Max
+     - type: spearman_max
+       value: 0.3366243060620438
+       name: Spearman Max
+   - task:
+       type: binary-classification
+       name: Binary Classification
+     dataset:
+       name: allNLI dev
+       type: allNLI-dev
+     metrics:
+     - type: cosine_accuracy
+       value: 0.66796875
+       name: Cosine Accuracy
+     - type: cosine_accuracy_threshold
+       value: 0.9767438173294067
+       name: Cosine Accuracy Threshold
+     - type: cosine_f1
+       value: 0.5100182149362477
+       name: Cosine F1
+     - type: cosine_f1_threshold
+       value: 0.8540960550308228
+       name: Cosine F1 Threshold
+     - type: cosine_precision
+       value: 0.3723404255319149
+       name: Cosine Precision
+     - type: cosine_recall
+       value: 0.8092485549132948
+       name: Cosine Recall
+     - type: cosine_ap
+       value: 0.38624833037583434
+       name: Cosine Ap
+     - type: dot_accuracy
+       value: 0.66796875
+       name: Dot Accuracy
+     - type: dot_accuracy_threshold
+       value: 750.345458984375
+       name: Dot Accuracy Threshold
+     - type: dot_f1
+       value: 0.5100182149362477
+       name: Dot F1
+     - type: dot_f1_threshold
+       value: 656.0940551757812
+       name: Dot F1 Threshold
+     - type: dot_precision
+       value: 0.3723404255319149
+       name: Dot Precision
+     - type: dot_recall
+       value: 0.8092485549132948
+       name: Dot Recall
+     - type: dot_ap
+       value: 0.3862261253421553
+       name: Dot Ap
+     - type: manhattan_accuracy
+       value: 0.6640625
+       name: Manhattan Accuracy
+     - type: manhattan_accuracy_threshold
+       value: 78.52637481689453
+       name: Manhattan Accuracy Threshold
+     - type: manhattan_f1
+       value: 0.5062388591800357
+       name: Manhattan F1
+     - type: manhattan_f1_threshold
+       value: 285.7745361328125
+       name: Manhattan F1 Threshold
+     - type: manhattan_precision
+       value: 0.36597938144329895
+       name: Manhattan Precision
+     - type: manhattan_recall
+       value: 0.8208092485549133
+       name: Manhattan Recall
+     - type: manhattan_ap
+       value: 0.3898187083180651
+       name: Manhattan Ap
+     - type: euclidean_accuracy
+       value: 0.66796875
+       name: Euclidean Accuracy
+     - type: euclidean_accuracy_threshold
+       value: 5.977196216583252
+       name: Euclidean Accuracy Threshold
+     - type: euclidean_f1
+       value: 0.5100182149362477
+       name: Euclidean F1
+     - type: euclidean_f1_threshold
+       value: 14.971920013427734
+       name: Euclidean F1 Threshold
+     - type: euclidean_precision
+       value: 0.3723404255319149
+       name: Euclidean Precision
+     - type: euclidean_recall
+       value: 0.8092485549132948
+       name: Euclidean Recall
+     - type: euclidean_ap
+       value: 0.38624380046547035
+       name: Euclidean Ap
+     - type: max_accuracy
+       value: 0.66796875
+       name: Max Accuracy
+     - type: max_accuracy_threshold
+       value: 750.345458984375
+       name: Max Accuracy Threshold
+     - type: max_f1
+       value: 0.5100182149362477
+       name: Max F1
+     - type: max_f1_threshold
+       value: 656.0940551757812
+       name: Max F1 Threshold
+     - type: max_precision
+       value: 0.3723404255319149
+       name: Max Precision
+     - type: max_recall
+       value: 0.8208092485549133
+       name: Max Recall
+     - type: max_ap
+       value: 0.3898187083180651
+       name: Max Ap
+   - task:
+       type: binary-classification
+       name: Binary Classification
+     dataset:
+       name: Qnli dev
+       type: Qnli-dev
+     metrics:
+     - type: cosine_accuracy
+       value: 0.62890625
+       name: Cosine Accuracy
+     - type: cosine_accuracy_threshold
+       value: 0.9045097827911377
+       name: Cosine Accuracy Threshold
+     - type: cosine_f1
+       value: 0.6397415185783522
+       name: Cosine F1
+     - type: cosine_f1_threshold
+       value: 0.8351442813873291
+       name: Cosine F1 Threshold
+     - type: cosine_precision
+       value: 0.5169712793733682
+       name: Cosine Precision
+     - type: cosine_recall
+       value: 0.8389830508474576
+       name: Cosine Recall
+     - type: cosine_ap
+       value: 0.6193527955003784
+       name: Cosine Ap
+     - type: dot_accuracy
+       value: 0.62890625
+       name: Dot Accuracy
+     - type: dot_accuracy_threshold
+       value: 694.7778930664062
+       name: Dot Accuracy Threshold
+     - type: dot_f1
+       value: 0.6397415185783522
+       name: Dot F1
+     - type: dot_f1_threshold
+       value: 641.4969482421875
+       name: Dot F1 Threshold
+     - type: dot_precision
+       value: 0.5169712793733682
+       name: Dot Precision
+     - type: dot_recall
+       value: 0.8389830508474576
+       name: Dot Recall
+     - type: dot_ap
+       value: 0.6194150916988216
+       name: Dot Ap
+     - type: manhattan_accuracy
+       value: 0.646484375
+       name: Manhattan Accuracy
+     - type: manhattan_accuracy_threshold
+       value: 245.2164306640625
+       name: Manhattan Accuracy Threshold
+     - type: manhattan_f1
+       value: 0.6521060842433698
+       name: Manhattan F1
+     - type: manhattan_f1_threshold
+       value: 303.317626953125
+       name: Manhattan F1 Threshold
+     - type: manhattan_precision
+       value: 0.5160493827160494
+       name: Manhattan Precision
+     - type: manhattan_recall
+       value: 0.885593220338983
+       name: Manhattan Recall
+     - type: manhattan_ap
+       value: 0.6417015148414534
+       name: Manhattan Ap
+     - type: euclidean_accuracy
+       value: 0.62890625
+       name: Euclidean Accuracy
+     - type: euclidean_accuracy_threshold
+       value: 12.111844062805176
+       name: Euclidean Accuracy Threshold
+     - type: euclidean_f1
+       value: 0.6397415185783522
+       name: Euclidean F1
+     - type: euclidean_f1_threshold
+       value: 15.914146423339844
+       name: Euclidean F1 Threshold
+     - type: euclidean_precision
+       value: 0.5169712793733682
+       name: Euclidean Precision
+     - type: euclidean_recall
+       value: 0.8389830508474576
+       name: Euclidean Recall
+     - type: euclidean_ap
+       value: 0.6193576186776235
+       name: Euclidean Ap
+     - type: max_accuracy
+       value: 0.646484375
+       name: Max Accuracy
+     - type: max_accuracy_threshold
+       value: 694.7778930664062
+       name: Max Accuracy Threshold
+     - type: max_f1
+       value: 0.6521060842433698
+       name: Max F1
+     - type: max_f1_threshold
+       value: 641.4969482421875
+       name: Max F1 Threshold
+     - type: max_precision
+       value: 0.5169712793733682
+       name: Max Precision
+     - type: max_recall
+       value: 0.885593220338983
+       name: Max Recall
+     - type: max_ap
+       value: 0.6417015148414534
+       name: Max Ap
+ ---
+
+ # SentenceTransformer based on microsoft/deberta-v3-small
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small) <!-- at revision a36c739020e01763fe789b4b85e2df55d6180012 -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ - **Language:** en
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model
+   (1): AdvancedWeightedPooling(
+     (linear_cls): Linear(in_features=768, out_features=768, bias=True)
+     (mha): MultiheadAttention(
+       (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
+     )
+     (layernorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+     (layernorm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+   )
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("bobox/DeBERTa3-s-CustomPooling-test1-checkpoints-tmp")
+ # Run inference
+ sentences = [
+     'What is the basic monetary unit of Iceland?',
+     "Icelandic monetary unit - definition of Icelandic monetary unit by The Free Dictionary Icelandic monetary unit - definition of Icelandic monetary unit by The Free Dictionary http://www.thefreedictionary.com/Icelandic+monetary+unit Related to Icelandic monetary unit: Icelandic Old Krona ThesaurusAntonymsRelated WordsSynonymsLegend: monetary unit - a unit of money Icelandic krona , krona - the basic unit of money in Iceland eyrir - 100 aurar equal 1 krona in Iceland Want to thank TFD for its existence? Tell a friend about us , add a link to this page, or visit the webmaster's page for free fun content . Link to this page: Copyright © 2003-2017 Farlex, Inc Disclaimer All content on this website, including dictionary, thesaurus, literature, geography, and other reference data is for informational purposes only. This information should not be considered complete, up to date, and is not intended to be used in place of a visit, consultation, or advice of a legal, medical, or any other professional.",
+     'Food-Info.net : E-numbers : E140: Chlorophyll CI 75810, Natural Green 3, Chlorophyll A, Magnesium chlorophyll Origin: Natural green colour, present in all plants and algae. Commercially extracted from nettles, grass and alfalfa. Function & characteristics:',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
+
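Because the similarity function is cosine similarity, `model.similarity(embeddings, embeddings)` yields the 3 × 3 matrix of pairwise cosine scores. The sketch below recomputes that kind of matrix in plain Python and ranks candidates for a single query, mirroring the Iceland widget example. It uses tiny hand-made 3-dimensional vectors in place of real 768-dimensional model outputs, and it is not the library's tensor implementation.

```python
import math


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity: entry [i][j] compares row i of a with row j of b."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    return [
        [sum(x * y for x, y in zip(u, v)) / (norm(u) * norm(v)) for v in b]
        for u in a
    ]


# Toy stand-ins for embeddings (made-up values, not model outputs).
query = [[0.9, 0.1, 0.0]]
docs = [[0.0, 0.2, 0.9], [0.8, 0.2, 0.1], [0.1, 0.9, 0.3]]

scores = cosine_similarity_matrix(query, docs)[0]
# Highest cosine first: the semantic-search ordering of the candidates.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [1, 2, 0]
```

Cosine scores ignore vector length, which is why the card's dot-product thresholds (hundreds) sit on a very different scale from its cosine thresholds (below 1).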
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Semantic Similarity
+ * Dataset: `sts-test`
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
+
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | pearson_cosine | 0.2854 |
+ | **spearman_cosine** | **0.3141** |
+ | pearson_manhattan | 0.311 |
+ | spearman_manhattan | 0.3366 |
+ | pearson_euclidean | 0.2941 |
+ | spearman_euclidean | 0.3142 |
+ | pearson_dot | 0.2853 |
+ | spearman_dot | 0.3138 |
+ | pearson_max | 0.311 |
+ | spearman_max | 0.3366 |
+
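`EmbeddingSimilarityEvaluator` scores each `sts-test` pair with a similarity function and correlates those scores with the gold labels. Pearson measures linear agreement, Spearman only rank agreement, and each `*_max` row is the best of the four similarity functions (here the Manhattan variant). Below is a dependency-free sketch of both correlations on made-up scores; it is a stand-in for illustration, not the evaluator's own code.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up gold similarity labels and model cosine scores for five sentence pairs.
gold = [0.0, 1.2, 2.5, 3.8, 5.0]
cosine = [0.61, 0.58, 0.74, 0.80, 0.93]
print(round(spearman(gold, cosine), 2))  # 0.9: one adjacent pair is out of order
```

Ranking is why the bolded `spearman_cosine` is the headline number: it rewards putting pairs in the right order without requiring cosine scores to be linear in the gold labels.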
+ #### Binary Classification
+ * Dataset: `allNLI-dev`
+ * Evaluated with [<code>BinaryClassificationEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.BinaryClassificationEvaluator)
+
+ | Metric | Value |
+ |:-----------------------------|:-----------|
+ | cosine_accuracy | 0.668 |
+ | cosine_accuracy_threshold | 0.9767 |
+ | cosine_f1 | 0.51 |
+ | cosine_f1_threshold | 0.8541 |
+ | cosine_precision | 0.3723 |
+ | cosine_recall | 0.8092 |
+ | cosine_ap | 0.3862 |
+ | dot_accuracy | 0.668 |
+ | dot_accuracy_threshold | 750.3455 |
+ | dot_f1 | 0.51 |
+ | dot_f1_threshold | 656.0941 |
+ | dot_precision | 0.3723 |
+ | dot_recall | 0.8092 |
+ | dot_ap | 0.3862 |
+ | manhattan_accuracy | 0.6641 |
+ | manhattan_accuracy_threshold | 78.5264 |
+ | manhattan_f1 | 0.5062 |
+ | manhattan_f1_threshold | 285.7745 |
+ | manhattan_precision | 0.366 |
+ | manhattan_recall | 0.8208 |
+ | manhattan_ap | 0.3898 |
+ | euclidean_accuracy | 0.668 |
+ | euclidean_accuracy_threshold | 5.9772 |
+ | euclidean_f1 | 0.51 |
+ | euclidean_f1_threshold | 14.9719 |
+ | euclidean_precision | 0.3723 |
+ | euclidean_recall | 0.8092 |
+ | euclidean_ap | 0.3862 |
+ | max_accuracy | 0.668 |
+ | max_accuracy_threshold | 750.3455 |
+ | max_f1 | 0.51 |
+ | max_f1_threshold | 656.0941 |
+ | max_precision | 0.3723 |
+ | max_recall | 0.8208 |
+ | **max_ap** | **0.3898** |
+
+ #### Binary Classification
+ * Dataset: `Qnli-dev`
+ * Evaluated with [<code>BinaryClassificationEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.BinaryClassificationEvaluator)
+
+ | Metric | Value |
+ |:-----------------------------|:-----------|
+ | cosine_accuracy | 0.6289 |
+ | cosine_accuracy_threshold | 0.9045 |
+ | cosine_f1 | 0.6397 |
+ | cosine_f1_threshold | 0.8351 |
+ | cosine_precision | 0.517 |
+ | cosine_recall | 0.839 |
+ | cosine_ap | 0.6194 |
+ | dot_accuracy | 0.6289 |
+ | dot_accuracy_threshold | 694.7779 |
+ | dot_f1 | 0.6397 |
+ | dot_f1_threshold | 641.4969 |
+ | dot_precision | 0.517 |
+ | dot_recall | 0.839 |
+ | dot_ap | 0.6194 |
+ | manhattan_accuracy | 0.6465 |
+ | manhattan_accuracy_threshold | 245.2164 |
+ | manhattan_f1 | 0.6521 |
+ | manhattan_f1_threshold | 303.3176 |
+ | manhattan_precision | 0.516 |
+ | manhattan_recall | 0.8856 |
+ | manhattan_ap | 0.6417 |
+ | euclidean_accuracy | 0.6289 |
+ | euclidean_accuracy_threshold | 12.1118 |
+ | euclidean_f1 | 0.6397 |
+ | euclidean_f1_threshold | 15.9141 |
+ | euclidean_precision | 0.517 |
+ | euclidean_recall | 0.839 |
+ | euclidean_ap | 0.6194 |
+ | max_accuracy | 0.6465 |
+ | max_accuracy_threshold | 694.7779 |
+ | max_f1 | 0.6521 |
+ | max_f1_threshold | 641.4969 |
+ | max_precision | 0.517 |
+ | max_recall | 0.8856 |
+ | **max_ap** | **0.6417** |
+
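Each binary-classification accuracy and F1 comes with its own `*_threshold` because `BinaryClassificationEvaluator` tunes the cut-off on the dev pairs themselves. The thresholds live on each function's own scale: about 0.9 for cosine, several hundred for the unnormalized dot product, and distances for Manhattan and Euclidean, where smaller means more similar. The brute-force search below is a simplified sketch of that idea. It assumes that midpoints between neighbouring scores serve as candidate cut-offs, and the library's exact search may differ. Average precision (`*_ap`) needs no threshold at all.

```python
def accuracy(preds, labels):
    """Fraction of pairs whose predicted label matches the gold label."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def f1(preds, labels):
    """F1 = 2TP / (2TP + FP + FN), with 1 meaning 'similar / entailed'."""
    tp = sum(1 for p, y in zip(preds, labels) if p and y)
    fp = sum(1 for p, y in zip(preds, labels) if p and not y)
    fn = sum(1 for p, y in zip(preds, labels) if not p and y)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def best_threshold(scores, labels, metric, higher_is_similar=True):
    """Try every midpoint between neighbouring distinct scores; keep the best.

    For distances (Manhattan, Euclidean) pass higher_is_similar=False so that
    a pair is predicted positive when its score falls *below* the threshold.
    """
    distinct = sorted(set(scores))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best_value, best_thr = -1.0, None
    for thr in candidates:
        preds = [s > thr if higher_is_similar else s < thr for s in scores]
        value = metric(preds, labels)
        if value > best_value:
            best_value, best_thr = value, thr
    return best_value, best_thr


# Made-up cosine scores and gold labels for six pairs.
cos_scores = [0.95, 0.91, 0.88, 0.60, 0.55, 0.40]
labels = [1, 1, 0, 1, 0, 0]
print(best_threshold(cos_scores, labels, accuracy))
print(best_threshold(cos_scores, labels, f1))
```

Accuracy and F1 can pick different cut-offs because F1 ignores true negatives; in the card, every F1 threshold admits more pairs than the matching accuracy threshold, which fits the high recall and low precision reported for `allNLI-dev`.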
594
+ <!--
595
+ ## Bias, Risks and Limitations
596
+
597
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
598
+ -->
599
+
600
+ <!--
601
+ ### Recommendations
602
+
603
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
604
+ -->
605
+
606
+ ## Training Details
607
+
608
+ ### Evaluation Dataset
609
+
610
+ #### vitaminc-pairs
611
+
612
+ * Dataset: [vitaminc-pairs](https://huggingface.co/datasets/tals/vitaminc) at [be6febb](https://huggingface.co/datasets/tals/vitaminc/tree/be6febb761b0b2807687e61e0b5282e459df2fa0)
613
+ * Size: 128 evaluation samples
614
+ * Columns: <code>claim</code> and <code>evidence</code>
615
+ * Approximate statistics based on the first 128 samples:
616
+ | | claim | evidence |
617
+ |:--------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
618
+ | type | string | string |
619
+ | details | <ul><li>min: 9 tokens</li><li>mean: 21.42 tokens</li><li>max: 41 tokens</li></ul> | <ul><li>min: 11 tokens</li><li>mean: 35.55 tokens</li><li>max: 79 tokens</li></ul> |
620
+ * Samples:
621
+ | claim | evidence |
622
+ |:------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
623
+ | <code>Dragon Con had over 5000 guests .</code> | <code>Among the more than 6000 guests and musical performers at the 2009 convention were such notables as Patrick Stewart , William Shatner , Leonard Nimoy , Terry Gilliam , Bruce Boxleitner , James Marsters , and Mary McDonnell .</code> |
624
+ | <code>COVID-19 has reached more than 185 countries .</code> | <code>As of , more than cases of COVID-19 have been reported in more than 190 countries and 200 territories , resulting in more than deaths .</code> |
625
+ | <code>In March , Italy had 3.6x times more cases of coronavirus than China .</code> | <code>As of 12 March , among nations with at least one million citizens , Italy has the world 's highest per capita rate of positive coronavirus cases at 206.1 cases per million people ( 3.6x times the rate of China ) and is the country with the second-highest number of positive cases as well as of deaths in the world , after China .</code> |
626
+ * Loss: [<code>CachedGISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedgistembedloss) with these parameters:
627
+ ```json
628
+ {'guide': SentenceTransformer(
629
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
630
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
631
+ (2): Normalize()
632
+ ), 'temperature': 0.025}
633
+ ```
634
+
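`CachedGISTEmbedLoss` pairs the GISTEmbed idea with gradient caching. The guide model shown above (a BERT encoder with CLS pooling and `Normalize()`) scores the same pairs. Any in-batch candidate that the guide rates as closer to an anchor than the anchor's own positive is treated as a likely false negative and dropped from the softmax. The caching only lets large batches fit in memory; it does not change the objective. The sketch below simplifies the loss to anchor-positive similarities in plain Python (the library also compares anchors with anchors and positives with positives). It uses toy 2-D vectors and reuses the card's `temperature` of 0.025.

```python
import math


def cos(u, v):
    """Cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def gist_loss(anchors, positives, guide_anchors, guide_positives, temperature=0.025):
    """Simplified GISTEmbed-style in-batch contrastive loss (anchor-positive terms only).

    anchors/positives: student embeddings; guide_*: the guide model's embeddings
    of the same texts. Candidate j != i is masked for anchor i when the guide
    scores it above the guide's score for anchor i's own positive.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        guide_pos_sim = cos(guide_anchors[i], guide_positives[i])
        logits = []
        for j in range(n):
            if j != i and cos(guide_anchors[i], guide_positives[j]) > guide_pos_sim:
                logits.append(float("-inf"))  # likely false negative: excluded
            else:
                logits.append(cos(anchors[i], positives[j]) / temperature)
        # Cross-entropy with the true positive (index i) as the target class.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[i]
    return total / n


# Student cannot tell the two pairs apart (all cosines are 1.0).
anchors = [[1.0, 0.0], [1.0, 0.0]]
positives = [[1.0, 0.0], [1.0, 0.0]]
# Guide: for anchor 0, positive 1 looks closer than its own positive, so it is masked.
masked = gist_loss(anchors, positives, [[1.0, 0.0], [1.0, 0.0]], [[0.6, 0.8], [1.0, 0.0]])
# Guide that never masks anything, for comparison.
unmasked = gist_loss(anchors, positives, [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
print(masked < unmasked)  # True
```

The low temperature (0.025) multiplies cosines by 40 before the softmax, so small similarity gaps become sharp preferences.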
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 42
+ - `per_device_eval_batch_size`: 128
+ - `gradient_accumulation_steps`: 2
+ - `learning_rate`: 3e-05
+ - `weight_decay`: 0.001
+ - `lr_scheduler_type`: cosine_with_min_lr
+ - `lr_scheduler_kwargs`: {'num_cycles': 0.5, 'min_lr': 1e-05}
+ - `warmup_ratio`: 0.25
+ - `save_safetensors`: False
+ - `fp16`: True
+ - `push_to_hub`: True
+ - `hub_model_id`: bobox/DeBERTa3-s-CustomPooling-test1-checkpoints-tmp
+ - `hub_strategy`: all_checkpoints
+ - `batch_sampler`: no_duplicates
+
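Two of the non-default settings are easier to read as arithmetic. With `per_device_train_batch_size` 42 and `gradient_accumulation_steps` 2, each optimizer step sees 42 × 2 = 84 training examples per device. `cosine_with_min_lr` with `warmup_ratio` 0.25 ramps the learning rate linearly from 0 to 3e-05 over the first quarter of training, then follows a half-cosine (`num_cycles` 0.5) down to `min_lr` 1e-05. The function below is a sketch written to follow the Transformers scheduler of that name, assuming warmup steps = ceil(ratio × total steps). The total step count is not stated on the card, so 1,000 is a placeholder.

```python
import math

# Examples per optimizer step on one device: batch size x accumulation steps.
effective_batch_per_device = 42 * 2  # 84


def cosine_with_min_lr(step, total_steps, base_lr=3e-5, min_lr=1e-5,
                       warmup_ratio=0.25, num_cycles=0.5):
    """Learning rate at `step`: linear warmup, then cosine decay that bottoms out at min_lr."""
    warmup_steps = math.ceil(warmup_ratio * total_steps)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # With num_cycles=0.5 this is one half-cosine: 1.0 after warmup, 0.0 at the last step.
    cosine = 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress))
    min_rate = min_lr / base_lr
    return base_lr * (cosine * (1.0 - min_rate) + min_rate)


TOTAL_STEPS = 1000  # placeholder: the real total is not given on this card
for s in (0, 125, 250, 625, 1000):
    print(s, cosine_with_min_lr(s, TOTAL_STEPS))
```

Under these assumptions the learning rate is still rising at an early checkpoint such as step 160 out of 1,000 (160 < 250 warmup steps).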
654
+ #### All Hyperparameters
655
+ <details><summary>Click to expand</summary>
656
+
657
+ - `overwrite_output_dir`: False
658
+ - `do_predict`: False
659
+ - `eval_strategy`: steps
660
+ - `prediction_loss_only`: True
661
+ - `per_device_train_batch_size`: 42
662
+ - `per_device_eval_batch_size`: 128
663
+ - `per_gpu_train_batch_size`: None
664
+ - `per_gpu_eval_batch_size`: None
665
+ - `gradient_accumulation_steps`: 2
666
+ - `eval_accumulation_steps`: None
667
+ - `torch_empty_cache_steps`: None
668
+ - `learning_rate`: 3e-05
669
+ - `weight_decay`: 0.001
670
+ - `adam_beta1`: 0.9
671
+ - `adam_beta2`: 0.999
672
+ - `adam_epsilon`: 1e-08
673
+ - `max_grad_norm`: 1.0
674
+ - `num_train_epochs`: 3
675
+ - `max_steps`: -1
676
+ - `lr_scheduler_type`: cosine_with_min_lr
677
+ - `lr_scheduler_kwargs`: {'num_cycles': 0.5, 'min_lr': 1e-05}
+ - `warmup_ratio`: 0.25
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: False
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: True
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: True
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: bobox/DeBERTa3-s-CustomPooling-test1-checkpoints-tmp
+ - `hub_strategy`: all_checkpoints
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`: 
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
+ ### Training Logs
+ <details><summary>Click to expand</summary>
+
+ | Epoch | Step | Training Loss | vitaminc-pairs loss | negation-triplets loss | scitail-pairs-pos loss | scitail-pairs-qa loss | xsum-pairs loss | sciq pairs loss | qasc pairs loss | openbookqa pairs loss | msmarco pairs loss | nq pairs loss | trivia pairs loss | gooaq pairs loss | paws-pos loss | global dataset loss | sts-test_spearman_cosine | allNLI-dev_max_ap | Qnli-dev_max_ap |
+ |:------:|:----:|:-------------:|:-------------------:|:----------------------:|:----------------------:|:---------------------:|:---------------:|:---------------:|:---------------:|:---------------------:|:------------------:|:-------------:|:-----------------:|:----------------:|:-------------:|:-------------------:|:------------------------:|:-----------------:|:---------------:|
+ | 0.0009 | 1 | 5.8564 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0018 | 2 | 7.1716 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0027 | 3 | 5.9095 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0035 | 4 | 5.0841 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0044 | 5 | 4.0184 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0053 | 6 | 6.2191 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0062 | 7 | 5.6124 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0071 | 8 | 3.9544 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0080 | 9 | 4.7149 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0088 | 10 | 4.9616 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0097 | 11 | 5.2794 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0106 | 12 | 8.8704 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0115 | 13 | 6.0707 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0124 | 14 | 5.4071 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0133 | 15 | 6.9104 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0142 | 16 | 6.0276 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0150 | 17 | 6.737 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0159 | 18 | 6.5354 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0168 | 19 | 5.206 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0177 | 20 | 5.2469 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0186 | 21 | 5.3771 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0195 | 22 | 4.979 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0204 | 23 | 4.7909 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0212 | 24 | 4.9086 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0221 | 25 | 4.8826 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0230 | 26 | 8.2266 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0239 | 27 | 8.3024 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0248 | 28 | 5.8745 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0257 | 29 | 4.7298 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0265 | 30 | 5.4614 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0274 | 31 | 5.8594 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0283 | 32 | 5.2401 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0292 | 33 | 5.1579 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0301 | 34 | 5.2181 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0310 | 35 | 4.6328 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0319 | 36 | 2.121 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0327 | 37 | 5.9026 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0336 | 38 | 7.3796 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0345 | 39 | 5.5361 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0354 | 40 | 4.0243 | 2.9018 | 5.6903 | 2.1136 | 2.8052 | 6.5831 | 0.8882 | 4.1148 | 5.0966 | 10.3911 | 10.9032 | 7.1904 | 8.1935 | 1.3943 | 5.6716 | 0.1879 | 0.3385 | 0.5781 |
+ | 0.0363 | 41 | 4.9072 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0372 | 42 | 3.4439 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0381 | 43 | 4.9787 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0389 | 44 | 5.8318 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0398 | 45 | 5.3226 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0407 | 46 | 5.1181 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0416 | 47 | 4.7834 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0425 | 48 | 6.6303 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0434 | 49 | 5.8171 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0442 | 50 | 5.1962 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0451 | 51 | 5.2096 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0460 | 52 | 5.0943 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0469 | 53 | 4.9038 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0478 | 54 | 4.6479 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0487 | 55 | 5.5098 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0496 | 56 | 4.6979 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0504 | 57 | 3.1969 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0513 | 58 | 4.4127 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0522 | 59 | 3.7746 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0531 | 60 | 4.5378 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0540 | 61 | 5.0209 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0549 | 62 | 6.5936 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0558 | 63 | 4.2315 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0566 | 64 | 6.4269 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0575 | 65 | 4.2644 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0584 | 66 | 5.1388 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0593 | 67 | 5.1852 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0602 | 68 | 4.8057 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0611 | 69 | 3.1725 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0619 | 70 | 3.3322 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0628 | 71 | 5.139 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0637 | 72 | 4.307 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0646 | 73 | 5.0133 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0655 | 74 | 4.0507 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0664 | 75 | 3.3895 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0673 | 76 | 5.6736 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0681 | 77 | 4.2572 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0690 | 78 | 3.0796 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0699 | 79 | 5.0199 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0708 | 80 | 4.1414 | 2.7794 | 4.8890 | 1.8997 | 2.6761 | 6.2096 | 0.7622 | 3.3129 | 4.5498 | 7.2056 | 7.6809 | 6.3792 | 6.6567 | 1.3848 | 5.0030 | 0.2480 | 0.3513 | 0.5898 |
+ | 0.0717 | 81 | 5.8604 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0726 | 82 | 4.3003 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0735 | 83 | 4.4568 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0743 | 84 | 4.2747 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0752 | 85 | 5.52 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0761 | 86 | 2.7767 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0770 | 87 | 4.397 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0779 | 88 | 5.4449 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0788 | 89 | 4.2706 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0796 | 90 | 6.4759 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0805 | 91 | 4.1951 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0814 | 92 | 4.6328 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0823 | 93 | 4.1278 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0832 | 94 | 4.1787 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0841 | 95 | 5.2156 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0850 | 96 | 3.1403 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0858 | 97 | 4.0273 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0867 | 98 | 3.0624 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0876 | 99 | 4.6786 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0885 | 100 | 4.1505 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0894 | 101 | 2.9529 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0903 | 102 | 4.7048 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0912 | 103 | 4.7388 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0920 | 104 | 3.7879 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0929 | 105 | 4.0311 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0938 | 106 | 4.1314 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0947 | 107 | 4.9411 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0956 | 108 | 4.1118 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0965 | 109 | 3.6971 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0973 | 110 | 5.605 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0982 | 111 | 3.4563 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.0991 | 112 | 3.7422 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1 | 113 | 3.8055 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1009 | 114 | 5.2369 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1018 | 115 | 5.6518 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1027 | 116 | 3.2906 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1035 | 117 | 3.4996 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1044 | 118 | 3.6283 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1053 | 119 | 4.1487 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1062 | 120 | 4.3996 | 2.7279 | 4.3946 | 1.4130 | 2.1150 | 6.0486 | 0.7172 | 2.9669 | 4.4180 | 6.3022 | 6.8412 | 6.2013 | 6.0982 | 0.9474 | 4.3852 | 0.3149 | 0.3693 | 0.5975 |
+ | 0.1071 | 121 | 3.5291 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1080 | 122 | 3.8232 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1088 | 123 | 4.6035 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1097 | 124 | 3.7607 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1106 | 125 | 3.8461 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1115 | 126 | 3.3413 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1124 | 127 | 4.2777 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1133 | 128 | 4.3597 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1142 | 129 | 3.9046 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1150 | 130 | 4.0527 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1159 | 131 | 5.0883 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1168 | 132 | 3.8308 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1177 | 133 | 3.572 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1186 | 134 | 3.4299 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1195 | 135 | 4.1541 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1204 | 136 | 3.584 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1212 | 137 | 5.0977 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1221 | 138 | 4.6769 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1230 | 139 | 3.8396 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1239 | 140 | 3.2875 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1248 | 141 | 4.1946 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1257 | 142 | 4.9602 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1265 | 143 | 4.1531 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1274 | 144 | 3.8351 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1283 | 145 | 3.112 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1292 | 146 | 2.3145 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1301 | 147 | 4.0989 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1310 | 148 | 3.2173 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1319 | 149 | 2.7913 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1327 | 150 | 3.7627 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1336 | 151 | 3.3669 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1345 | 152 | 2.6775 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1354 | 153 | 3.2804 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1363 | 154 | 3.0676 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1372 | 155 | 3.1559 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1381 | 156 | 2.6638 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1389 | 157 | 2.8045 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1398 | 158 | 4.0568 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1407 | 159 | 2.7554 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
+ | 0.1416 | 160 | 3.7407 | 2.7439 | 4.6364 | 1.0089 | 1.1229 | 5.4870 | 0.6284 | 2.5933 | 4.3943 | 5.6565 | 5.9870 | 5.6944 | 5.3857 | 0.3622 | 3.4011 | 0.3141 | 0.3898 | 0.6417 |
+
+ </details>
+
+ ### Framework Versions
+ - Python: 3.10.14
+ - Sentence Transformers: 3.2.0
+ - Transformers: 4.45.1
+ - PyTorch: 2.4.0
+ - Accelerate: 0.34.2
+ - Datasets: 3.0.1
+ - Tokenizers: 0.20.0
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+ author = "Reimers, Nils and Gurevych, Iryna",
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+ month = "11",
+ year = "2019",
+ publisher = "Association for Computational Linguistics",
+ url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
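The hyperparameters above pair `warmup_ratio` 0.25 with `fp16` True, and the `learning_rate` values logged in `checkpoint-160/trainer_state.json` stay at 0.0 for steps 1 to 3 (`grad_norm` NaN) and repeat at step 12 (`grad_norm` Infinity). A minimal sketch, assuming the warmup is linear (the evenly spaced values suggest it) and that a step with a non-finite gradient norm was skipped by the fp16 loss scaler without advancing the scheduler, replays the first log entries and checks that each logged rate equals a fixed increment times the number of non-skipped steps so far. The increment is read from the first non-skipped entry, so no base learning rate or warmup length is assumed; the helper names are hypothetical.

```python
import math

# (step, grad_norm, learning_rate) for steps 1-14, copied from log_history in
# checkpoint-160/trainer_state.json. NaN/Infinity grad norms are kept as-is.
LOG = [
    (1, float("nan"), 0.0),
    (2, float("nan"), 0.0),
    (3, float("nan"), 0.0),
    (4, 21.95326805114746, 3.5377358490566036e-09),
    (5, 16.607179641723633, 7.075471698113207e-09),
    (6, 33.789615631103516, 1.0613207547169811e-08),
    (7, 28.073551177978516, 1.4150943396226414e-08),
    (8, 17.365602493286133, 1.768867924528302e-08),
    (9, 19.384475708007812, 2.1226415094339622e-08),
    (10, 19.67770004272461, 2.4764150943396227e-08),
    (11, 24.233421325683594, 2.830188679245283e-08),
    (12, float("inf"), 2.830188679245283e-08),
    (13, 34.37785720825195, 3.183962264150943e-08),
    (14, 25.11741065979004, 3.537735849056604e-08),
]


def skipped_steps(log):
    """Steps whose gradient norm was not finite (assumed skipped by the fp16 scaler)."""
    return [step for step, grad_norm, _ in log if not math.isfinite(grad_norm)]


def matches_linear_warmup(log, rel_tol=1e-9):
    """Check that lr == increment * (number of non-skipped steps so far).

    The increment comes from the first non-skipped entry, so neither the
    base learning rate nor the warmup length has to be known.
    """
    increment = None
    applied = 0  # optimizer steps that actually ran
    for _, grad_norm, lr in log:
        if math.isfinite(grad_norm):
            applied += 1
            if increment is None:
                increment = lr / applied
        expected = 0.0 if increment is None else increment * applied
        if not math.isclose(lr, expected, rel_tol=rel_tol):
            return False
    return True


print(skipped_steps(LOG))          # [1, 2, 3, 12]
print(matches_linear_warmup(LOG))  # True
```

Counting only non-skipped steps is what makes step 12 line up: its rate equals step 11's because, under the stated assumption, no scheduler step happened in between.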
checkpoint-160/added_tokens.json ADDED
@@ -0,0 +1,3 @@
+ {
+ "[MASK]": 128000
+ }
checkpoint-160/config.json ADDED
@@ -0,0 +1,35 @@
+ {
+ "_name_or_path": "microsoft/deberta-v3-small",
+ "architectures": [
+ "DebertaV2Model"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "layer_norm_eps": 1e-07,
+ "max_position_embeddings": 512,
+ "max_relative_positions": -1,
+ "model_type": "deberta-v2",
+ "norm_rel_ebd": "layer_norm",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 6,
+ "pad_token_id": 0,
+ "pooler_dropout": 0,
+ "pooler_hidden_act": "gelu",
+ "pooler_hidden_size": 768,
+ "pos_att_type": [
+ "p2c",
+ "c2p"
+ ],
+ "position_biased_input": false,
+ "position_buckets": 256,
+ "relative_attention": true,
+ "share_att_key": true,
+ "torch_dtype": "float32",
+ "transformers_version": "4.45.1",
+ "type_vocab_size": 0,
+ "vocab_size": 128100
+ }
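The backbone config above sets `hidden_size` 768, and the `1_AdvancedWeightedPooling/config.json` added in this commit sets `embed_dim` 768 and `num_heads` 8. The pooling module's source is not part of this commit; its keys (`embed_dim`, `num_heads`, `dropout`, `bias`) mirror the constructor arguments of `torch.nn.MultiheadAttention`, which requires `embed_dim` to be divisible by `num_heads`. A sketch, assuming the pooling attends over the backbone's token embeddings with multi-head attention, that checks the two configs agree; the helper name is hypothetical.

```python
import json

# Excerpts of the two configs added in this commit (values copied verbatim).
BACKBONE_CONFIG = json.loads("""{
  "hidden_size": 768,
  "num_attention_heads": 12,
  "model_type": "deberta-v2"
}""")
POOLING_CONFIG = json.loads("""{
  "embed_dim": 768, "num_heads": 8, "dropout": 0.0,
  "bias": true, "gate_min": 0.2, "gate_max": 0.8
}""")


def check_pooling_config(backbone, pooling):
    """Return a list of problems; an empty list means the configs look compatible."""
    problems = []
    # The pooling layer consumes token embeddings produced by the backbone.
    if pooling["embed_dim"] != backbone["hidden_size"]:
        problems.append("embed_dim does not match the backbone hidden_size")
    # Multi-head attention splits embed_dim evenly across heads.
    if pooling["embed_dim"] % pooling["num_heads"] != 0:
        problems.append("embed_dim is not divisible by num_heads")
    return problems


print(check_pooling_config(BACKBONE_CONFIG, POOLING_CONFIG))  # []
print(POOLING_CONFIG["embed_dim"] // POOLING_CONFIG["num_heads"])  # 96
```

Note that the pooling uses 8 heads of size 96, while the backbone's 12 attention heads are 64 wide; the head counts do not need to match, only the embedding width does.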
checkpoint-160/config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "__version__": {
+ "sentence_transformers": "3.2.0",
+ "transformers": "4.45.1",
+ "pytorch": "2.4.0"
+ },
+ "prompts": {},
+ "default_prompt_name": null,
+ "similarity_fn_name": null
+ }
checkpoint-160/modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Transformer"
+ },
+ {
+ "idx": 1,
+ "name": "1",
+ "path": "1_AdvancedWeightedPooling",
+ "type": "__main__.AdvancedWeightedPooling"
+ }
+ ]
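Module 1 is registered as `__main__.AdvancedWeightedPooling`, so the class was defined in the top-level training script rather than in an importable package. Assuming the loader resolves each `type` entry as a dotted import path (as Sentence Transformers 3.x does), that class must exist in the `__main__` module of whatever script loads this checkpoint. A small sketch, using only `json`, that lists which modules need such a locally defined class; the helper name is hypothetical.

```python
import json

# modules.json from this checkpoint, values copied verbatim.
MODULES_JSON = """[
  {"idx": 0, "name": "0", "path": "", "type": "sentence_transformers.models.Transformer"},
  {"idx": 1, "name": "1", "path": "1_AdvancedWeightedPooling", "type": "__main__.AdvancedWeightedPooling"}
]"""


def modules_needing_local_class(modules_json):
    """Return (path, class_name) for modules whose class must be defined in __main__."""
    needed = []
    for entry in sorted(json.loads(modules_json), key=lambda m: m["idx"]):
        # "pkg.sub.Class" -> ("pkg.sub", ".", "Class")
        module_path, _, class_name = entry["type"].rpartition(".")
        if module_path == "__main__":
            needed.append((entry["path"], class_name))
    return needed


print(modules_needing_local_class(MODULES_JSON))
# [('1_AdvancedWeightedPooling', 'AdvancedWeightedPooling')]
```

In practice this means defining (or importing) `AdvancedWeightedPooling` in the calling script before loading the checkpoint; the class itself is not included in this commit.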
checkpoint-160/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6de7b746bd37c44759028006cdb5d4344a78cf549e02e4acea8ca52a11481e78
+ size 245742074
checkpoint-160/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ed859980e5cced35b2234dcf194946504c27cfaeb47e0cbf5f81c0662360314a
+ size 565251810
checkpoint-160/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9448eb96e5a9646b615ca3360c7d4779b2a35a153e319a11ccf79765aec03eae
+ size 14244
checkpoint-160/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b9ac89dcef9273abeb7c5103e1582de275a0f4be15a81186dbf4b5ad33f4c2a4
+ size 1192
checkpoint-160/sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 512,
+ "do_lower_case": false
+ }
checkpoint-160/special_tokens_map.json ADDED
@@ -0,0 +1,15 @@
+ {
+ "bos_token": "[CLS]",
+ "cls_token": "[CLS]",
+ "eos_token": "[SEP]",
+ "mask_token": "[MASK]",
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "unk_token": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
checkpoint-160/spm.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c679fbf93643d19aab7ee10c0b99e460bdbc02fedf34b92b05af343b4af586fd
+ size 2464616
checkpoint-160/tokenizer.json ADDED
The diff for this file is too large to render.
checkpoint-160/tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128000": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "[CLS]",
+ "clean_up_tokenization_spaces": false,
+ "cls_token": "[CLS]",
+ "do_lower_case": false,
+ "eos_token": "[SEP]",
+ "mask_token": "[MASK]",
+ "model_max_length": 1000000000000000019884624838656,
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "sp_model_kwargs": {},
+ "split_by_punct": false,
+ "tokenizer_class": "DebertaV2Tokenizer",
+ "unk_token": "[UNK]",
+ "vocab_type": "spm"
+ }
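The `model_max_length` of 1000000000000000019884624838656 is exactly `int(1e30)`, the placeholder transformers stores (as `VERY_LARGE_INTEGER` in its tokenization utilities) when no real length limit is recorded for a tokenizer. The working cap for this checkpoint is the `max_seq_length` of 512 in `sentence_bert_config.json`, which Sentence Transformers uses when truncating inputs. A minimal sketch of picking a truncation length that ignores the placeholder; the helper is hypothetical and the rule "take the smaller of the two, unless the tokenizer value is the placeholder" is an assumption.

```python
# transformers uses int(1e30) as model_max_length when no real limit is
# recorded for a tokenizer; treat it as "unset", not as a usable length.
VERY_LARGE_INTEGER = int(1e30)

TOKENIZER_MODEL_MAX_LENGTH = 1000000000000000019884624838656  # tokenizer_config.json
SBERT_MAX_SEQ_LENGTH = 512  # sentence_bert_config.json


def effective_max_length(tokenizer_max, sbert_max):
    """Hypothetical helper: truncation length, ignoring the int(1e30) placeholder."""
    if tokenizer_max >= VERY_LARGE_INTEGER:
        return sbert_max
    return min(tokenizer_max, sbert_max)


print(TOKENIZER_MODEL_MAX_LENGTH == VERY_LARGE_INTEGER)  # True
print(effective_max_length(TOKENIZER_MODEL_MAX_LENGTH, SBERT_MAX_SEQ_LENGTH))  # 512
```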
checkpoint-160/trainer_state.json ADDED
@@ -0,0 +1,1925 @@
+ {
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 0.1415929203539823,
+ "eval_steps": 40,
+ "global_step": 160,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.0008849557522123894,
+ "grad_norm": NaN,
+ "learning_rate": 0.0,
+ "loss": 5.8564,
+ "step": 1
+ },
+ {
+ "epoch": 0.0017699115044247787,
+ "grad_norm": NaN,
+ "learning_rate": 0.0,
+ "loss": 7.1716,
+ "step": 2
+ },
+ {
+ "epoch": 0.002654867256637168,
+ "grad_norm": NaN,
+ "learning_rate": 0.0,
+ "loss": 5.9095,
+ "step": 3
+ },
+ {
+ "epoch": 0.0035398230088495575,
+ "grad_norm": 21.95326805114746,
+ "learning_rate": 3.5377358490566036e-09,
+ "loss": 5.0841,
+ "step": 4
+ },
+ {
+ "epoch": 0.004424778761061947,
+ "grad_norm": 16.607179641723633,
+ "learning_rate": 7.075471698113207e-09,
+ "loss": 4.0184,
+ "step": 5
+ },
+ {
+ "epoch": 0.005309734513274336,
+ "grad_norm": 33.789615631103516,
+ "learning_rate": 1.0613207547169811e-08,
+ "loss": 6.2191,
+ "step": 6
+ },
+ {
+ "epoch": 0.006194690265486726,
+ "grad_norm": 28.073551177978516,
+ "learning_rate": 1.4150943396226414e-08,
+ "loss": 5.6124,
+ "step": 7
+ },
+ {
+ "epoch": 0.007079646017699115,
+ "grad_norm": 17.365602493286133,
+ "learning_rate": 1.768867924528302e-08,
+ "loss": 3.9544,
+ "step": 8
+ },
+ {
+ "epoch": 0.007964601769911504,
+ "grad_norm": 19.384475708007812,
+ "learning_rate": 2.1226415094339622e-08,
+ "loss": 4.7149,
+ "step": 9
+ },
+ {
+ "epoch": 0.008849557522123894,
+ "grad_norm": 19.67770004272461,
+ "learning_rate": 2.4764150943396227e-08,
+ "loss": 4.9616,
+ "step": 10
+ },
+ {
+ "epoch": 0.009734513274336283,
+ "grad_norm": 24.233421325683594,
+ "learning_rate": 2.830188679245283e-08,
+ "loss": 5.2794,
+ "step": 11
+ },
+ {
+ "epoch": 0.010619469026548672,
+ "grad_norm": Infinity,
+ "learning_rate": 2.830188679245283e-08,
+ "loss": 8.8704,
+ "step": 12
+ },
+ {
+ "epoch": 0.011504424778761062,
+ "grad_norm": 34.37785720825195,
+ "learning_rate": 3.183962264150943e-08,
+ "loss": 6.0707,
+ "step": 13
+ },
+ {
+ "epoch": 0.012389380530973451,
+ "grad_norm": 25.11741065979004,
+ "learning_rate": 3.537735849056604e-08,
+ "loss": 5.4071,
+ "step": 14
+ },
+ {
+ "epoch": 0.01327433628318584,
+ "grad_norm": 53.84364700317383,
+ "learning_rate": 3.891509433962264e-08,
+ "loss": 6.9104,
+ "step": 15
+ },
+ {
+ "epoch": 0.01415929203539823,
+ "grad_norm": 32.0903434753418,
+ "learning_rate": 4.2452830188679244e-08,
+ "loss": 6.0276,
+ "step": 16
+ },
+ {
+ "epoch": 0.01504424778761062,
+ "grad_norm": 39.742130279541016,
+ "learning_rate": 4.599056603773585e-08,
+ "loss": 6.737,
+ "step": 17
+ },
+ {
+ "epoch": 0.01592920353982301,
+ "grad_norm": 45.267417907714844,
+ "learning_rate": 4.9528301886792454e-08,
+ "loss": 6.5354,
+ "step": 18
+ },
+ {
+ "epoch": 0.016814159292035398,
+ "grad_norm": 22.39731788635254,
+ "learning_rate": 5.3066037735849055e-08,
+ "loss": 5.206,
+ "step": 19
+ },
+ {
+ "epoch": 0.017699115044247787,
+ "grad_norm": 20.858232498168945,
+ "learning_rate": 5.660377358490566e-08,
+ "loss": 5.2469,
+ "step": 20
+ },
+ {
+ "epoch": 0.018584070796460177,
+ "grad_norm": 23.96446990966797,
+ "learning_rate": 6.014150943396226e-08,
+ "loss": 5.3771,
+ "step": 21
+ },
+ {
+ "epoch": 0.019469026548672566,
+ "grad_norm": 22.945741653442383,
+ "learning_rate": 6.367924528301887e-08,
+ "loss": 4.979,
+ "step": 22
+ },
+ {
+ "epoch": 0.020353982300884955,
+ "grad_norm": 15.497300148010254,
+ "learning_rate": 6.721698113207547e-08,
+ "loss": 4.7909,
+ "step": 23
+ },
+ {
+ "epoch": 0.021238938053097345,
+ "grad_norm": 20.039024353027344,
+ "learning_rate": 7.075471698113208e-08,
+ "loss": 4.9086,
+ "step": 24
+ },
+ {
+ "epoch": 0.022123893805309734,
+ "grad_norm": 21.30576515197754,
+ "learning_rate": 7.429245283018869e-08,
+ "loss": 4.8826,
+ "step": 25
+ },
+ {
+ "epoch": 0.023008849557522124,
+ "grad_norm": 64.5285873413086,
+ "learning_rate": 7.783018867924529e-08,
+ "loss": 8.2266,
+ "step": 26
+ },
+ {
+ "epoch": 0.023893805309734513,
+ "grad_norm": 59.894893646240234,
+ "learning_rate": 8.13679245283019e-08,
+ "loss": 8.3024,
+ "step": 27
+ },
+ {
+ "epoch": 0.024778761061946902,
+ "grad_norm": 25.504356384277344,
+ "learning_rate": 8.490566037735849e-08,
+ "loss": 5.8745,
+ "step": 28
+ },
+ {
+ "epoch": 0.02566371681415929,
+ "grad_norm": 15.169568061828613,
+ "learning_rate": 8.84433962264151e-08,
+ "loss": 4.7298,
+ "step": 29
+ },
+ {
+ "epoch": 0.02654867256637168,
+ "grad_norm": 24.09995460510254,
217
+ "learning_rate": 9.19811320754717e-08,
218
+ "loss": 5.4614,
219
+ "step": 30
220
+ },
221
+ {
222
+ "epoch": 0.02743362831858407,
223
+ "grad_norm": 28.669275283813477,
224
+ "learning_rate": 9.55188679245283e-08,
225
+ "loss": 5.8594,
226
+ "step": 31
227
+ },
228
+ {
229
+ "epoch": 0.02831858407079646,
230
+ "grad_norm": 23.37987518310547,
231
+ "learning_rate": 9.905660377358491e-08,
232
+ "loss": 5.2401,
233
+ "step": 32
234
+ },
235
+ {
236
+ "epoch": 0.02920353982300885,
237
+ "grad_norm": 22.815292358398438,
238
+ "learning_rate": 1.0259433962264152e-07,
239
+ "loss": 5.1579,
240
+ "step": 33
241
+ },
242
+ {
243
+ "epoch": 0.03008849557522124,
244
+ "grad_norm": 13.775344848632812,
245
+ "learning_rate": 1.0613207547169811e-07,
246
+ "loss": 5.2181,
247
+ "step": 34
248
+ },
249
+ {
250
+ "epoch": 0.030973451327433628,
251
+ "grad_norm": 18.642087936401367,
252
+ "learning_rate": 1.0966981132075472e-07,
253
+ "loss": 4.6328,
254
+ "step": 35
255
+ },
256
+ {
257
+ "epoch": 0.03185840707964602,
258
+ "grad_norm": 18.041406631469727,
259
+ "learning_rate": 1.1320754716981131e-07,
260
+ "loss": 2.121,
261
+ "step": 36
262
+ },
263
+ {
264
+ "epoch": 0.03274336283185841,
265
+ "grad_norm": 23.423933029174805,
266
+ "learning_rate": 1.1674528301886792e-07,
267
+ "loss": 5.9026,
268
+ "step": 37
269
+ },
270
+ {
271
+ "epoch": 0.033628318584070796,
272
+ "grad_norm": 46.25591278076172,
273
+ "learning_rate": 1.2028301886792452e-07,
274
+ "loss": 7.3796,
275
+ "step": 38
276
+ },
277
+ {
278
+ "epoch": 0.034513274336283185,
279
+ "grad_norm": 20.376422882080078,
280
+ "learning_rate": 1.2382075471698114e-07,
281
+ "loss": 5.5361,
282
+ "step": 39
283
+ },
284
+ {
285
+ "epoch": 0.035398230088495575,
286
+ "grad_norm": 12.82562255859375,
287
+ "learning_rate": 1.2735849056603773e-07,
288
+ "loss": 4.0243,
289
+ "step": 40
290
+ },
291
+ {
292
+ "epoch": 0.035398230088495575,
+ "eval_Qnli-dev_cosine_accuracy": 0.5859375,
+ "eval_Qnli-dev_cosine_accuracy_threshold": 0.9302856922149658,
+ "eval_Qnli-dev_cosine_ap": 0.5480269179285036,
+ "eval_Qnli-dev_cosine_f1": 0.6315789473684211,
+ "eval_Qnli-dev_cosine_f1_threshold": 0.7634451389312744,
+ "eval_Qnli-dev_cosine_precision": 0.4633663366336634,
+ "eval_Qnli-dev_cosine_recall": 0.9915254237288136,
+ "eval_Qnli-dev_dot_accuracy": 0.5859375,
+ "eval_Qnli-dev_dot_accuracy_threshold": 714.4895629882812,
+ "eval_Qnli-dev_dot_ap": 0.548060663242546,
+ "eval_Qnli-dev_dot_f1": 0.6315789473684211,
+ "eval_Qnli-dev_dot_f1_threshold": 586.342529296875,
+ "eval_Qnli-dev_dot_precision": 0.4633663366336634,
+ "eval_Qnli-dev_dot_recall": 0.9915254237288136,
+ "eval_Qnli-dev_euclidean_accuracy": 0.5859375,
+ "eval_Qnli-dev_euclidean_accuracy_threshold": 10.348224639892578,
+ "eval_Qnli-dev_euclidean_ap": 0.5480269179285036,
+ "eval_Qnli-dev_euclidean_f1": 0.6315789473684211,
+ "eval_Qnli-dev_euclidean_f1_threshold": 19.05518341064453,
+ "eval_Qnli-dev_euclidean_precision": 0.4633663366336634,
+ "eval_Qnli-dev_euclidean_recall": 0.9915254237288136,
+ "eval_Qnli-dev_manhattan_accuracy": 0.59765625,
+ "eval_Qnli-dev_manhattan_accuracy_threshold": 175.22628784179688,
+ "eval_Qnli-dev_manhattan_ap": 0.5780924813828909,
+ "eval_Qnli-dev_manhattan_f1": 0.6291834002677376,
+ "eval_Qnli-dev_manhattan_f1_threshold": 334.39178466796875,
+ "eval_Qnli-dev_manhattan_precision": 0.4598825831702544,
+ "eval_Qnli-dev_manhattan_recall": 0.9957627118644068,
+ "eval_Qnli-dev_max_accuracy": 0.59765625,
+ "eval_Qnli-dev_max_accuracy_threshold": 714.4895629882812,
+ "eval_Qnli-dev_max_ap": 0.5780924813828909,
+ "eval_Qnli-dev_max_f1": 0.6315789473684211,
+ "eval_Qnli-dev_max_f1_threshold": 586.342529296875,
+ "eval_Qnli-dev_max_precision": 0.4633663366336634,
+ "eval_Qnli-dev_max_recall": 0.9957627118644068,
+ "eval_allNLI-dev_cosine_accuracy": 0.6640625,
+ "eval_allNLI-dev_cosine_accuracy_threshold": 0.9888672828674316,
+ "eval_allNLI-dev_cosine_ap": 0.32886365768247516,
+ "eval_allNLI-dev_cosine_f1": 0.5095729013254787,
+ "eval_allNLI-dev_cosine_f1_threshold": 0.7477295398712158,
+ "eval_allNLI-dev_cosine_precision": 0.34189723320158105,
+ "eval_allNLI-dev_cosine_recall": 1.0,
+ "eval_allNLI-dev_dot_accuracy": 0.6640625,
+ "eval_allNLI-dev_dot_accuracy_threshold": 759.483154296875,
+ "eval_allNLI-dev_dot_ap": 0.3288581611938815,
+ "eval_allNLI-dev_dot_f1": 0.5095729013254787,
+ "eval_allNLI-dev_dot_f1_threshold": 574.2760620117188,
+ "eval_allNLI-dev_dot_precision": 0.34189723320158105,
+ "eval_allNLI-dev_dot_recall": 1.0,
+ "eval_allNLI-dev_euclidean_accuracy": 0.6640625,
+ "eval_allNLI-dev_euclidean_accuracy_threshold": 3.8085508346557617,
+ "eval_allNLI-dev_euclidean_ap": 0.32886365768247516,
+ "eval_allNLI-dev_euclidean_f1": 0.5095729013254787,
+ "eval_allNLI-dev_euclidean_f1_threshold": 19.684810638427734,
+ "eval_allNLI-dev_euclidean_precision": 0.34189723320158105,
+ "eval_allNLI-dev_euclidean_recall": 1.0,
+ "eval_allNLI-dev_manhattan_accuracy": 0.6640625,
+ "eval_allNLI-dev_manhattan_accuracy_threshold": 65.93238830566406,
+ "eval_allNLI-dev_manhattan_ap": 0.33852594919898543,
+ "eval_allNLI-dev_manhattan_f1": 0.5058479532163743,
+ "eval_allNLI-dev_manhattan_f1_threshold": 335.4263916015625,
+ "eval_allNLI-dev_manhattan_precision": 0.3385518590998043,
+ "eval_allNLI-dev_manhattan_recall": 1.0,
+ "eval_allNLI-dev_max_accuracy": 0.6640625,
+ "eval_allNLI-dev_max_accuracy_threshold": 759.483154296875,
+ "eval_allNLI-dev_max_ap": 0.33852594919898543,
+ "eval_allNLI-dev_max_f1": 0.5095729013254787,
+ "eval_allNLI-dev_max_f1_threshold": 574.2760620117188,
+ "eval_allNLI-dev_max_precision": 0.34189723320158105,
+ "eval_allNLI-dev_max_recall": 1.0,
+ "eval_sequential_score": 0.5780924813828909,
+ "eval_sts-test_pearson_cosine": 0.1533465318414369,
+ "eval_sts-test_pearson_dot": 0.15333057450060855,
+ "eval_sts-test_pearson_euclidean": 0.1664717893342273,
+ "eval_sts-test_pearson_manhattan": 0.20717970064899288,
+ "eval_sts-test_pearson_max": 0.20717970064899288,
+ "eval_sts-test_spearman_cosine": 0.18786210334203038,
+ "eval_sts-test_spearman_dot": 0.1878347337472397,
+ "eval_sts-test_spearman_euclidean": 0.18786046572196458,
+ "eval_sts-test_spearman_manhattan": 0.22429466463153608,
+ "eval_sts-test_spearman_max": 0.22429466463153608,
+ "eval_vitaminc-pairs_loss": 2.901831865310669,
+ "eval_vitaminc-pairs_runtime": 4.078,
+ "eval_vitaminc-pairs_samples_per_second": 31.388,
+ "eval_vitaminc-pairs_steps_per_second": 0.245,
+ "step": 40
+ },
+ {
+ "epoch": 0.035398230088495575,
+ "eval_negation-triplets_loss": 5.690315246582031,
+ "eval_negation-triplets_runtime": 0.7141,
+ "eval_negation-triplets_samples_per_second": 179.254,
+ "eval_negation-triplets_steps_per_second": 1.4,
+ "step": 40
+ },
+ {
+ "epoch": 0.035398230088495575,
+ "eval_scitail-pairs-pos_loss": 2.1135852336883545,
+ "eval_scitail-pairs-pos_runtime": 0.8282,
+ "eval_scitail-pairs-pos_samples_per_second": 154.543,
+ "eval_scitail-pairs-pos_steps_per_second": 1.207,
+ "step": 40
+ },
+ {
+ "epoch": 0.035398230088495575,
+ "eval_scitail-pairs-qa_loss": 2.8052029609680176,
+ "eval_scitail-pairs-qa_runtime": 0.5471,
+ "eval_scitail-pairs-qa_samples_per_second": 233.943,
+ "eval_scitail-pairs-qa_steps_per_second": 1.828,
+ "step": 40
+ },
+ {
+ "epoch": 0.035398230088495575,
+ "eval_xsum-pairs_loss": 6.583061695098877,
+ "eval_xsum-pairs_runtime": 2.8921,
+ "eval_xsum-pairs_samples_per_second": 44.259,
+ "eval_xsum-pairs_steps_per_second": 0.346,
+ "step": 40
+ },
+ {
+ "epoch": 0.035398230088495575,
+ "eval_sciq_pairs_loss": 0.8882207870483398,
+ "eval_sciq_pairs_runtime": 3.7993,
+ "eval_sciq_pairs_samples_per_second": 33.69,
+ "eval_sciq_pairs_steps_per_second": 0.263,
+ "step": 40
+ },
+ {
+ "epoch": 0.035398230088495575,
+ "eval_qasc_pairs_loss": 4.1147541999816895,
+ "eval_qasc_pairs_runtime": 0.6768,
+ "eval_qasc_pairs_samples_per_second": 189.125,
+ "eval_qasc_pairs_steps_per_second": 1.478,
+ "step": 40
+ },
+ {
+ "epoch": 0.035398230088495575,
+ "eval_openbookqa_pairs_loss": 5.096628665924072,
+ "eval_openbookqa_pairs_runtime": 0.5776,
+ "eval_openbookqa_pairs_samples_per_second": 221.615,
+ "eval_openbookqa_pairs_steps_per_second": 1.731,
+ "step": 40
+ },
+ {
+ "epoch": 0.035398230088495575,
+ "eval_msmarco_pairs_loss": 10.391141891479492,
+ "eval_msmarco_pairs_runtime": 1.2577,
+ "eval_msmarco_pairs_samples_per_second": 101.77,
+ "eval_msmarco_pairs_steps_per_second": 0.795,
+ "step": 40
+ },
+ {
+ "epoch": 0.035398230088495575,
+ "eval_nq_pairs_loss": 10.903197288513184,
+ "eval_nq_pairs_runtime": 2.5051,
+ "eval_nq_pairs_samples_per_second": 51.095,
+ "eval_nq_pairs_steps_per_second": 0.399,
+ "step": 40
+ },
+ {
+ "epoch": 0.035398230088495575,
+ "eval_trivia_pairs_loss": 7.190384387969971,
+ "eval_trivia_pairs_runtime": 3.6482,
+ "eval_trivia_pairs_samples_per_second": 35.085,
+ "eval_trivia_pairs_steps_per_second": 0.274,
+ "step": 40
+ },
+ {
+ "epoch": 0.035398230088495575,
+ "eval_gooaq_pairs_loss": 8.193528175354004,
+ "eval_gooaq_pairs_runtime": 0.9648,
+ "eval_gooaq_pairs_samples_per_second": 132.67,
+ "eval_gooaq_pairs_steps_per_second": 1.036,
+ "step": 40
+ },
+ {
+ "epoch": 0.035398230088495575,
+ "eval_paws-pos_loss": 1.3942564725875854,
+ "eval_paws-pos_runtime": 0.6718,
+ "eval_paws-pos_samples_per_second": 190.538,
+ "eval_paws-pos_steps_per_second": 1.489,
+ "step": 40
+ },
+ {
+ "epoch": 0.035398230088495575,
+ "eval_global_dataset_loss": 5.671571731567383,
+ "eval_global_dataset_runtime": 23.0452,
+ "eval_global_dataset_samples_per_second": 28.77,
+ "eval_global_dataset_steps_per_second": 0.26,
+ "step": 40
+ },
+ {
+ "epoch": 0.036283185840707964,
+ "grad_norm": 18.026830673217773,
+ "learning_rate": 1.3089622641509433e-07,
+ "loss": 4.9072,
+ "step": 41
+ },
+ {
+ "epoch": 0.03716814159292035,
+ "grad_norm": 15.423810958862305,
+ "learning_rate": 1.3443396226415095e-07,
+ "loss": 3.4439,
+ "step": 42
+ },
+ {
+ "epoch": 0.03805309734513274,
+ "grad_norm": 16.31403160095215,
+ "learning_rate": 1.3797169811320754e-07,
+ "loss": 4.9787,
+ "step": 43
+ },
+ {
+ "epoch": 0.03893805309734513,
+ "grad_norm": 21.37955093383789,
+ "learning_rate": 1.4150943396226417e-07,
+ "loss": 5.8318,
+ "step": 44
+ },
+ {
+ "epoch": 0.03982300884955752,
+ "grad_norm": 18.23583984375,
+ "learning_rate": 1.4504716981132076e-07,
+ "loss": 5.3226,
+ "step": 45
+ },
+ {
+ "epoch": 0.04070796460176991,
+ "grad_norm": 20.878713607788086,
+ "learning_rate": 1.4858490566037738e-07,
+ "loss": 5.1181,
+ "step": 46
+ },
+ {
+ "epoch": 0.0415929203539823,
+ "grad_norm": 18.71149444580078,
+ "learning_rate": 1.5212264150943398e-07,
+ "loss": 4.7834,
+ "step": 47
+ },
+ {
+ "epoch": 0.04247787610619469,
+ "grad_norm": 38.85902786254883,
+ "learning_rate": 1.5566037735849057e-07,
+ "loss": 6.6303,
+ "step": 48
+ },
+ {
+ "epoch": 0.04336283185840708,
+ "grad_norm": 37.41562271118164,
+ "learning_rate": 1.591981132075472e-07,
+ "loss": 5.8171,
+ "step": 49
+ },
+ {
+ "epoch": 0.04424778761061947,
+ "grad_norm": 17.541080474853516,
+ "learning_rate": 1.627358490566038e-07,
+ "loss": 5.1962,
+ "step": 50
+ },
+ {
+ "epoch": 0.04513274336283186,
+ "grad_norm": 16.145116806030273,
+ "learning_rate": 1.6627358490566038e-07,
+ "loss": 5.2096,
+ "step": 51
+ },
+ {
+ "epoch": 0.04601769911504425,
+ "grad_norm": 20.175189971923828,
+ "learning_rate": 1.6981132075471698e-07,
+ "loss": 5.0943,
+ "step": 52
+ },
+ {
+ "epoch": 0.046902654867256637,
+ "grad_norm": 13.441214561462402,
+ "learning_rate": 1.733490566037736e-07,
+ "loss": 4.9038,
+ "step": 53
+ },
+ {
+ "epoch": 0.047787610619469026,
+ "grad_norm": 13.396607398986816,
+ "learning_rate": 1.768867924528302e-07,
+ "loss": 4.6479,
+ "step": 54
+ },
+ {
+ "epoch": 0.048672566371681415,
+ "grad_norm": 13.68046760559082,
+ "learning_rate": 1.804245283018868e-07,
+ "loss": 5.5098,
+ "step": 55
+ },
+ {
+ "epoch": 0.049557522123893805,
+ "grad_norm": 13.278443336486816,
+ "learning_rate": 1.839622641509434e-07,
+ "loss": 4.6979,
+ "step": 56
+ },
+ {
+ "epoch": 0.050442477876106194,
+ "grad_norm": 15.295453071594238,
+ "learning_rate": 1.875e-07,
+ "loss": 3.1969,
+ "step": 57
+ },
+ {
+ "epoch": 0.05132743362831858,
+ "grad_norm": 12.185781478881836,
+ "learning_rate": 1.910377358490566e-07,
+ "loss": 4.4127,
+ "step": 58
+ },
+ {
+ "epoch": 0.05221238938053097,
+ "grad_norm": 10.874494552612305,
+ "learning_rate": 1.9457547169811322e-07,
+ "loss": 3.7746,
+ "step": 59
+ },
+ {
+ "epoch": 0.05309734513274336,
+ "grad_norm": 9.654823303222656,
+ "learning_rate": 1.9811320754716982e-07,
+ "loss": 4.5378,
+ "step": 60
+ },
+ {
+ "epoch": 0.05398230088495575,
+ "grad_norm": 21.123645782470703,
+ "learning_rate": 2.016509433962264e-07,
+ "loss": 5.0209,
+ "step": 61
+ },
+ {
+ "epoch": 0.05486725663716814,
+ "grad_norm": 33.47934341430664,
+ "learning_rate": 2.0518867924528303e-07,
+ "loss": 6.5936,
+ "step": 62
+ },
+ {
+ "epoch": 0.05575221238938053,
+ "grad_norm": 10.2566556930542,
+ "learning_rate": 2.0872641509433963e-07,
+ "loss": 4.2315,
+ "step": 63
+ },
+ {
+ "epoch": 0.05663716814159292,
+ "grad_norm": 28.198625564575195,
+ "learning_rate": 2.1226415094339622e-07,
+ "loss": 6.4269,
+ "step": 64
+ },
+ {
+ "epoch": 0.05752212389380531,
+ "grad_norm": 9.386558532714844,
+ "learning_rate": 2.1580188679245282e-07,
+ "loss": 4.2644,
+ "step": 65
+ },
+ {
+ "epoch": 0.0584070796460177,
+ "grad_norm": 12.687555313110352,
+ "learning_rate": 2.1933962264150944e-07,
+ "loss": 5.1388,
+ "step": 66
+ },
+ {
+ "epoch": 0.05929203539823009,
+ "grad_norm": 14.834878921508789,
+ "learning_rate": 2.2287735849056603e-07,
+ "loss": 5.1852,
+ "step": 67
+ },
+ {
+ "epoch": 0.06017699115044248,
+ "grad_norm": 10.888677597045898,
+ "learning_rate": 2.2641509433962263e-07,
+ "loss": 4.8057,
+ "step": 68
+ },
+ {
+ "epoch": 0.061061946902654866,
+ "grad_norm": 13.97256851196289,
+ "learning_rate": 2.2995283018867925e-07,
+ "loss": 3.1725,
+ "step": 69
+ },
+ {
+ "epoch": 0.061946902654867256,
+ "grad_norm": 11.82534122467041,
+ "learning_rate": 2.3349056603773584e-07,
+ "loss": 3.3322,
+ "step": 70
+ },
+ {
+ "epoch": 0.06283185840707965,
+ "grad_norm": 16.99266242980957,
+ "learning_rate": 2.3702830188679244e-07,
+ "loss": 5.139,
+ "step": 71
+ },
+ {
+ "epoch": 0.06371681415929203,
+ "grad_norm": 8.74513053894043,
+ "learning_rate": 2.4056603773584903e-07,
+ "loss": 4.307,
+ "step": 72
+ },
+ {
+ "epoch": 0.06460176991150443,
+ "grad_norm": 11.715869903564453,
+ "learning_rate": 2.4410377358490563e-07,
+ "loss": 5.0133,
+ "step": 73
+ },
+ {
+ "epoch": 0.06548672566371681,
+ "grad_norm": 9.844196319580078,
+ "learning_rate": 2.476415094339623e-07,
+ "loss": 4.0507,
+ "step": 74
+ },
+ {
+ "epoch": 0.06637168141592921,
+ "grad_norm": 12.447444915771484,
+ "learning_rate": 2.5117924528301887e-07,
+ "loss": 3.3895,
+ "step": 75
+ },
+ {
+ "epoch": 0.06725663716814159,
+ "grad_norm": 23.91596794128418,
+ "learning_rate": 2.5471698113207547e-07,
+ "loss": 5.6736,
+ "step": 76
+ },
+ {
+ "epoch": 0.06814159292035399,
+ "grad_norm": 9.635603904724121,
+ "learning_rate": 2.5825471698113206e-07,
+ "loss": 4.2572,
+ "step": 77
+ },
+ {
+ "epoch": 0.06902654867256637,
+ "grad_norm": 14.971665382385254,
+ "learning_rate": 2.6179245283018866e-07,
+ "loss": 3.0796,
+ "step": 78
+ },
+ {
+ "epoch": 0.06991150442477877,
+ "grad_norm": 11.226128578186035,
+ "learning_rate": 2.6533018867924525e-07,
+ "loss": 5.0199,
+ "step": 79
+ },
+ {
+ "epoch": 0.07079646017699115,
+ "grad_norm": 11.01388931274414,
+ "learning_rate": 2.688679245283019e-07,
+ "loss": 4.1414,
+ "step": 80
+ },
+ {
+ "epoch": 0.07079646017699115,
+ "eval_Qnli-dev_cosine_accuracy": 0.591796875,
+ "eval_Qnli-dev_cosine_accuracy_threshold": 0.9258557558059692,
+ "eval_Qnli-dev_cosine_ap": 0.5585355274462735,
+ "eval_Qnli-dev_cosine_f1": 0.6291834002677376,
+ "eval_Qnli-dev_cosine_f1_threshold": 0.750666618347168,
+ "eval_Qnli-dev_cosine_precision": 0.4598825831702544,
+ "eval_Qnli-dev_cosine_recall": 0.9957627118644068,
+ "eval_Qnli-dev_dot_accuracy": 0.591796875,
+ "eval_Qnli-dev_dot_accuracy_threshold": 711.18359375,
+ "eval_Qnli-dev_dot_ap": 0.5585297234749824,
+ "eval_Qnli-dev_dot_f1": 0.6291834002677376,
+ "eval_Qnli-dev_dot_f1_threshold": 576.5970458984375,
+ "eval_Qnli-dev_dot_precision": 0.4598825831702544,
+ "eval_Qnli-dev_dot_recall": 0.9957627118644068,
+ "eval_Qnli-dev_euclidean_accuracy": 0.591796875,
+ "eval_Qnli-dev_euclidean_accuracy_threshold": 10.672666549682617,
+ "eval_Qnli-dev_euclidean_ap": 0.5585355274462735,
+ "eval_Qnli-dev_euclidean_f1": 0.6291834002677376,
+ "eval_Qnli-dev_euclidean_f1_threshold": 19.553747177124023,
+ "eval_Qnli-dev_euclidean_precision": 0.4598825831702544,
+ "eval_Qnli-dev_euclidean_recall": 0.9957627118644068,
+ "eval_Qnli-dev_manhattan_accuracy": 0.619140625,
+ "eval_Qnli-dev_manhattan_accuracy_threshold": 188.09068298339844,
+ "eval_Qnli-dev_manhattan_ap": 0.5898283705050701,
+ "eval_Qnli-dev_manhattan_f1": 0.6301775147928994,
+ "eval_Qnli-dev_manhattan_f1_threshold": 237.80462646484375,
+ "eval_Qnli-dev_manhattan_precision": 0.48409090909090907,
+ "eval_Qnli-dev_manhattan_recall": 0.902542372881356,
+ "eval_Qnli-dev_max_accuracy": 0.619140625,
+ "eval_Qnli-dev_max_accuracy_threshold": 711.18359375,
+ "eval_Qnli-dev_max_ap": 0.5898283705050701,
+ "eval_Qnli-dev_max_f1": 0.6301775147928994,
+ "eval_Qnli-dev_max_f1_threshold": 576.5970458984375,
+ "eval_Qnli-dev_max_precision": 0.48409090909090907,
+ "eval_Qnli-dev_max_recall": 0.9957627118644068,
+ "eval_allNLI-dev_cosine_accuracy": 0.666015625,
+ "eval_allNLI-dev_cosine_accuracy_threshold": 0.983686089515686,
+ "eval_allNLI-dev_cosine_ap": 0.34411819659341086,
+ "eval_allNLI-dev_cosine_f1": 0.5065885797950219,
+ "eval_allNLI-dev_cosine_f1_threshold": 0.7642872333526611,
+ "eval_allNLI-dev_cosine_precision": 0.3392156862745098,
+ "eval_allNLI-dev_cosine_recall": 1.0,
+ "eval_allNLI-dev_dot_accuracy": 0.666015625,
+ "eval_allNLI-dev_dot_accuracy_threshold": 755.60302734375,
+ "eval_allNLI-dev_dot_ap": 0.344109544232086,
+ "eval_allNLI-dev_dot_f1": 0.5065885797950219,
+ "eval_allNLI-dev_dot_f1_threshold": 587.0625,
+ "eval_allNLI-dev_dot_precision": 0.3392156862745098,
+ "eval_allNLI-dev_dot_recall": 1.0,
+ "eval_allNLI-dev_euclidean_accuracy": 0.666015625,
+ "eval_allNLI-dev_euclidean_accuracy_threshold": 5.00581693649292,
+ "eval_allNLI-dev_euclidean_ap": 0.3441246898925644,
+ "eval_allNLI-dev_euclidean_f1": 0.5065885797950219,
+ "eval_allNLI-dev_euclidean_f1_threshold": 19.022436141967773,
+ "eval_allNLI-dev_euclidean_precision": 0.3392156862745098,
+ "eval_allNLI-dev_euclidean_recall": 1.0,
+ "eval_allNLI-dev_manhattan_accuracy": 0.6640625,
+ "eval_allNLI-dev_manhattan_accuracy_threshold": 62.69102096557617,
+ "eval_allNLI-dev_manhattan_ap": 0.35131239981425566,
+ "eval_allNLI-dev_manhattan_f1": 0.5058479532163743,
+ "eval_allNLI-dev_manhattan_f1_threshold": 337.6861877441406,
+ "eval_allNLI-dev_manhattan_precision": 0.3385518590998043,
+ "eval_allNLI-dev_manhattan_recall": 1.0,
+ "eval_allNLI-dev_max_accuracy": 0.666015625,
+ "eval_allNLI-dev_max_accuracy_threshold": 755.60302734375,
+ "eval_allNLI-dev_max_ap": 0.35131239981425566,
+ "eval_allNLI-dev_max_f1": 0.5065885797950219,
+ "eval_allNLI-dev_max_f1_threshold": 587.0625,
+ "eval_allNLI-dev_max_precision": 0.3392156862745098,
+ "eval_allNLI-dev_max_recall": 1.0,
+ "eval_sequential_score": 0.5898283705050701,
+ "eval_sts-test_pearson_cosine": 0.22248205020578934,
+ "eval_sts-test_pearson_dot": 0.22239084967931927,
+ "eval_sts-test_pearson_euclidean": 0.2323160413842197,
+ "eval_sts-test_pearson_manhattan": 0.26632593273308647,
+ "eval_sts-test_pearson_max": 0.26632593273308647,
+ "eval_sts-test_spearman_cosine": 0.24802235964390085,
+ "eval_sts-test_spearman_dot": 0.24791612015173234,
+ "eval_sts-test_spearman_euclidean": 0.24799036249272113,
+ "eval_sts-test_spearman_manhattan": 0.2843623073856928,
+ "eval_sts-test_spearman_max": 0.2843623073856928,
+ "eval_vitaminc-pairs_loss": 2.7793872356414795,
+ "eval_vitaminc-pairs_runtime": 3.7649,
+ "eval_vitaminc-pairs_samples_per_second": 33.998,
+ "eval_vitaminc-pairs_steps_per_second": 0.266,
+ "step": 80
+ },
+ {
+ "epoch": 0.07079646017699115,
+ "eval_negation-triplets_loss": 4.888970851898193,
+ "eval_negation-triplets_runtime": 0.7134,
+ "eval_negation-triplets_samples_per_second": 179.432,
+ "eval_negation-triplets_steps_per_second": 1.402,
+ "step": 80
+ },
+ {
+ "epoch": 0.07079646017699115,
+ "eval_scitail-pairs-pos_loss": 1.8996644020080566,
+ "eval_scitail-pairs-pos_runtime": 0.8506,
+ "eval_scitail-pairs-pos_samples_per_second": 150.477,
+ "eval_scitail-pairs-pos_steps_per_second": 1.176,
+ "step": 80
+ },
+ {
+ "epoch": 0.07079646017699115,
+ "eval_scitail-pairs-qa_loss": 2.6760551929473877,
+ "eval_scitail-pairs-qa_runtime": 0.5685,
+ "eval_scitail-pairs-qa_samples_per_second": 225.171,
+ "eval_scitail-pairs-qa_steps_per_second": 1.759,
+ "step": 80
+ },
+ {
+ "epoch": 0.07079646017699115,
+ "eval_xsum-pairs_loss": 6.209648609161377,
+ "eval_xsum-pairs_runtime": 2.9221,
+ "eval_xsum-pairs_samples_per_second": 43.804,
+ "eval_xsum-pairs_steps_per_second": 0.342,
+ "step": 80
+ },
+ {
+ "epoch": 0.07079646017699115,
+ "eval_sciq_pairs_loss": 0.7622462511062622,
+ "eval_sciq_pairs_runtime": 3.7816,
+ "eval_sciq_pairs_samples_per_second": 33.848,
+ "eval_sciq_pairs_steps_per_second": 0.264,
+ "step": 80
+ },
+ {
+ "epoch": 0.07079646017699115,
+ "eval_qasc_pairs_loss": 3.3129472732543945,
+ "eval_qasc_pairs_runtime": 0.6761,
+ "eval_qasc_pairs_samples_per_second": 189.334,
+ "eval_qasc_pairs_steps_per_second": 1.479,
+ "step": 80
+ },
+ {
+ "epoch": 0.07079646017699115,
+ "eval_openbookqa_pairs_loss": 4.549765586853027,
+ "eval_openbookqa_pairs_runtime": 0.5767,
+ "eval_openbookqa_pairs_samples_per_second": 221.954,
+ "eval_openbookqa_pairs_steps_per_second": 1.734,
+ "step": 80
+ },
+ {
+ "epoch": 0.07079646017699115,
+ "eval_msmarco_pairs_loss": 7.205582141876221,
+ "eval_msmarco_pairs_runtime": 1.2621,
+ "eval_msmarco_pairs_samples_per_second": 101.416,
+ "eval_msmarco_pairs_steps_per_second": 0.792,
+ "step": 80
+ },
+ {
+ "epoch": 0.07079646017699115,
+ "eval_nq_pairs_loss": 7.680945873260498,
+ "eval_nq_pairs_runtime": 2.5052,
+ "eval_nq_pairs_samples_per_second": 51.095,
+ "eval_nq_pairs_steps_per_second": 0.399,
+ "step": 80
+ },
+ {
+ "epoch": 0.07079646017699115,
+ "eval_trivia_pairs_loss": 6.37924861907959,
+ "eval_trivia_pairs_runtime": 3.6293,
+ "eval_trivia_pairs_samples_per_second": 35.268,
+ "eval_trivia_pairs_steps_per_second": 0.276,
+ "step": 80
+ },
+ {
+ "epoch": 0.07079646017699115,
+ "eval_gooaq_pairs_loss": 6.656675338745117,
+ "eval_gooaq_pairs_runtime": 0.9698,
+ "eval_gooaq_pairs_samples_per_second": 131.988,
+ "eval_gooaq_pairs_steps_per_second": 1.031,
+ "step": 80
+ },
+ {
+ "epoch": 0.07079646017699115,
+ "eval_paws-pos_loss": 1.3848179578781128,
+ "eval_paws-pos_runtime": 0.6727,
+ "eval_paws-pos_samples_per_second": 190.278,
+ "eval_paws-pos_steps_per_second": 1.487,
+ "step": 80
+ },
+ {
+ "epoch": 0.07079646017699115,
+ "eval_global_dataset_loss": 5.002967834472656,
+ "eval_global_dataset_runtime": 23.048,
+ "eval_global_dataset_samples_per_second": 28.766,
+ "eval_global_dataset_steps_per_second": 0.26,
+ "step": 80
+ },
+ {
958
+ "epoch": 0.07168141592920355,
959
+ "grad_norm": 18.9890193939209,
960
+ "learning_rate": 2.724056603773585e-07,
961
+ "loss": 5.8604,
962
+ "step": 81
963
+ },
964
+ {
965
+ "epoch": 0.07256637168141593,
966
+ "grad_norm": 8.206193923950195,
967
+ "learning_rate": 2.759433962264151e-07,
968
+ "loss": 4.3003,
969
+ "step": 82
970
+ },
971
+ {
972
+ "epoch": 0.07345132743362832,
973
+ "grad_norm": 10.03178882598877,
974
+ "learning_rate": 2.794811320754717e-07,
975
+ "loss": 4.4568,
976
+ "step": 83
977
+ },
978
+ {
979
+ "epoch": 0.0743362831858407,
980
+ "grad_norm": 14.74673080444336,
981
+ "learning_rate": 2.8301886792452833e-07,
982
+ "loss": 4.2747,
983
+ "step": 84
984
+ },
985
+ {
986
+ "epoch": 0.0752212389380531,
987
+ "grad_norm": 19.097232818603516,
988
+ "learning_rate": 2.865566037735849e-07,
989
+ "loss": 5.52,
990
+ "step": 85
991
+ },
992
+ {
993
+ "epoch": 0.07610619469026549,
994
+ "grad_norm": 14.828218460083008,
995
+ "learning_rate": 2.900943396226415e-07,
996
+ "loss": 2.7767,
997
+ "step": 86
998
+ },
999
+ {
1000
+ "epoch": 0.07699115044247788,
1001
+ "grad_norm": 9.30789566040039,
1002
+ "learning_rate": 2.936320754716981e-07,
1003
+ "loss": 4.397,
1004
+ "step": 87
1005
+ },
1006
+ {
1007
+ "epoch": 0.07787610619469026,
1008
+ "grad_norm": 15.119461059570312,
1009
+ "learning_rate": 2.9716981132075476e-07,
1010
+ "loss": 5.4449,
1011
+ "step": 88
1012
+ },
1013
+ {
1014
+ "epoch": 0.07876106194690266,
1015
+ "grad_norm": 8.459301948547363,
1016
+ "learning_rate": 3.0070754716981136e-07,
1017
+ "loss": 4.2706,
1018
+ "step": 89
1019
+ },
1020
+ {
1021
+ "epoch": 0.07964601769911504,
1022
+ "grad_norm": 23.59125518798828,
1023
+ "learning_rate": 3.0424528301886795e-07,
1024
+ "loss": 6.4759,
1025
+ "step": 90
1026
+ },
1027
+ {
1028
+ "epoch": 0.08053097345132744,
1029
+ "grad_norm": 8.729449272155762,
1030
+ "learning_rate": 3.0778301886792455e-07,
1031
+ "loss": 4.1951,
1032
+ "step": 91
1033
+ },
1034
+ {
1035
+ "epoch": 0.08141592920353982,
1036
+ "grad_norm": 8.37271785736084,
1037
+ "learning_rate": 3.1132075471698114e-07,
1038
+ "loss": 4.6328,
1039
+ "step": 92
1040
+ },
1041
+ {
1042
+ "epoch": 0.08230088495575222,
1043
+ "grad_norm": 10.029474258422852,
1044
+ "learning_rate": 3.1485849056603774e-07,
1045
+ "loss": 4.1278,
1046
+ "step": 93
1047
+ },
1048
+ {
1049
+ "epoch": 0.0831858407079646,
1050
+ "grad_norm": 8.706567764282227,
1051
+ "learning_rate": 3.183962264150944e-07,
1052
+ "loss": 4.1787,
1053
+ "step": 94
1054
+ },
1055
+ {
1056
+ "epoch": 0.084070796460177,
1057
+ "grad_norm": 13.88837718963623,
1058
+ "learning_rate": 3.21933962264151e-07,
1059
+ "loss": 5.2156,
1060
+ "step": 95
1061
+ },
1062
+ {
1063
+ "epoch": 0.08495575221238938,
1064
+ "grad_norm": 12.01068115234375,
1065
+ "learning_rate": 3.254716981132076e-07,
1066
+ "loss": 3.1403,
1067
+ "step": 96
1068
+ },
1069
+ {
1070
+ "epoch": 0.08584070796460178,
1071
+ "grad_norm": 8.432968139648438,
1072
+ "learning_rate": 3.2900943396226417e-07,
1073
+ "loss": 4.0273,
1074
+ "step": 97
1075
+ },
1076
+ {
1077
+ "epoch": 0.08672566371681416,
1078
+ "grad_norm": 12.645098686218262,
1079
+ "learning_rate": 3.3254716981132077e-07,
1080
+ "loss": 3.0624,
1081
+ "step": 98
1082
+ },
1083
+ {
1084
+ "epoch": 0.08761061946902655,
1085
+ "grad_norm": 11.483688354492188,
1086
+ "learning_rate": 3.3608490566037736e-07,
1087
+ "loss": 4.6786,
1088
+ "step": 99
1089
+ },
1090
+ {
1091
+ "epoch": 0.08849557522123894,
1092
+ "grad_norm": 8.645537376403809,
1093
+ "learning_rate": 3.3962264150943395e-07,
1094
+ "loss": 4.1505,
1095
+ "step": 100
1096
+ },
1097
+ {
1098
+ "epoch": 0.08938053097345133,
1099
+ "grad_norm": 13.053335189819336,
1100
+ "learning_rate": 3.431603773584906e-07,
1101
+ "loss": 2.9529,
1102
+ "step": 101
1103
+ },
1104
+ {
1105
+ "epoch": 0.09026548672566372,
1106
+ "grad_norm": 14.494400978088379,
1107
+ "learning_rate": 3.466981132075472e-07,
1108
+ "loss": 4.7048,
1109
+ "step": 102
1110
+ },
1111
+ {
1112
+ "epoch": 0.09115044247787611,
1113
+ "grad_norm": 9.513616561889648,
+ "learning_rate": 3.502358490566038e-07,
+ "loss": 4.7388,
+ "step": 103
+ },
+ {
+ "epoch": 0.0920353982300885,
+ "grad_norm": 9.751347541809082,
+ "learning_rate": 3.537735849056604e-07,
+ "loss": 3.7879,
+ "step": 104
+ },
+ {
+ "epoch": 0.09292035398230089,
+ "grad_norm": 9.06558895111084,
+ "learning_rate": 3.57311320754717e-07,
+ "loss": 4.0311,
+ "step": 105
+ },
+ {
+ "epoch": 0.09380530973451327,
+ "grad_norm": 9.53257942199707,
+ "learning_rate": 3.608490566037736e-07,
+ "loss": 4.1314,
+ "step": 106
+ },
+ {
+ "epoch": 0.09469026548672567,
+ "grad_norm": 11.554676055908203,
+ "learning_rate": 3.643867924528302e-07,
+ "loss": 4.9411,
+ "step": 107
+ },
+ {
+ "epoch": 0.09557522123893805,
+ "grad_norm": 8.559597969055176,
+ "learning_rate": 3.679245283018868e-07,
+ "loss": 4.1118,
+ "step": 108
+ },
+ {
+ "epoch": 0.09646017699115045,
+ "grad_norm": 10.008039474487305,
+ "learning_rate": 3.714622641509434e-07,
+ "loss": 3.6971,
+ "step": 109
+ },
+ {
+ "epoch": 0.09734513274336283,
+ "grad_norm": 16.543254852294922,
+ "learning_rate": 3.75e-07,
+ "loss": 5.605,
+ "step": 110
+ },
+ {
+ "epoch": 0.09823008849557523,
+ "grad_norm": 11.816540718078613,
+ "learning_rate": 3.7853773584905666e-07,
+ "loss": 3.4563,
+ "step": 111
+ },
+ {
+ "epoch": 0.09911504424778761,
+ "grad_norm": 10.638028144836426,
+ "learning_rate": 3.820754716981132e-07,
+ "loss": 3.7422,
+ "step": 112
+ },
+ {
+ "epoch": 0.1,
+ "grad_norm": 8.5276460647583,
+ "learning_rate": 3.8561320754716985e-07,
+ "loss": 3.8055,
+ "step": 113
+ },
+ {
+ "epoch": 0.10088495575221239,
+ "grad_norm": 13.437420845031738,
+ "learning_rate": 3.8915094339622644e-07,
+ "loss": 5.2369,
+ "step": 114
+ },
+ {
+ "epoch": 0.10176991150442478,
+ "grad_norm": 21.039424896240234,
+ "learning_rate": 3.926886792452831e-07,
+ "loss": 5.6518,
+ "step": 115
+ },
+ {
+ "epoch": 0.10265486725663717,
+ "grad_norm": 13.487382888793945,
+ "learning_rate": 3.9622641509433963e-07,
+ "loss": 3.2906,
+ "step": 116
+ },
+ {
+ "epoch": 0.10353982300884956,
+ "grad_norm": 11.895822525024414,
+ "learning_rate": 3.997641509433963e-07,
+ "loss": 3.4996,
+ "step": 117
+ },
+ {
+ "epoch": 0.10442477876106195,
+ "grad_norm": 10.83902359008789,
+ "learning_rate": 4.033018867924528e-07,
+ "loss": 3.6283,
+ "step": 118
+ },
+ {
+ "epoch": 0.10530973451327434,
+ "grad_norm": 10.552660942077637,
+ "learning_rate": 4.0683962264150947e-07,
+ "loss": 4.1487,
+ "step": 119
+ },
+ {
+ "epoch": 0.10619469026548672,
+ "grad_norm": 9.924088478088379,
+ "learning_rate": 4.1037735849056606e-07,
+ "loss": 4.3996,
+ "step": 120
+ },
+ {
+ "epoch": 0.10619469026548672,
+ "eval_Qnli-dev_cosine_accuracy": 0.595703125,
+ "eval_Qnli-dev_cosine_accuracy_threshold": 0.9275249242782593,
+ "eval_Qnli-dev_cosine_ap": 0.5645920090286662,
+ "eval_Qnli-dev_cosine_f1": 0.6327077747989276,
+ "eval_Qnli-dev_cosine_f1_threshold": 0.7267085313796997,
+ "eval_Qnli-dev_cosine_precision": 0.4627450980392157,
+ "eval_Qnli-dev_cosine_recall": 1.0,
+ "eval_Qnli-dev_dot_accuracy": 0.595703125,
+ "eval_Qnli-dev_dot_accuracy_threshold": 712.4608154296875,
+ "eval_Qnli-dev_dot_ap": 0.5646837736357366,
+ "eval_Qnli-dev_dot_f1": 0.6327077747989276,
+ "eval_Qnli-dev_dot_f1_threshold": 558.2177734375,
+ "eval_Qnli-dev_dot_precision": 0.4627450980392157,
+ "eval_Qnli-dev_dot_recall": 1.0,
+ "eval_Qnli-dev_euclidean_accuracy": 0.595703125,
+ "eval_Qnli-dev_euclidean_accuracy_threshold": 10.551876068115234,
+ "eval_Qnli-dev_euclidean_ap": 0.5645997569733668,
+ "eval_Qnli-dev_euclidean_f1": 0.6327077747989276,
+ "eval_Qnli-dev_euclidean_f1_threshold": 20.490163803100586,
+ "eval_Qnli-dev_euclidean_precision": 0.4627450980392157,
+ "eval_Qnli-dev_euclidean_recall": 1.0,
+ "eval_Qnli-dev_manhattan_accuracy": 0.626953125,
+ "eval_Qnli-dev_manhattan_accuracy_threshold": 195.12744140625,
+ "eval_Qnli-dev_manhattan_ap": 0.5975206086733145,
+ "eval_Qnli-dev_manhattan_f1": 0.6322008862629247,
+ "eval_Qnli-dev_manhattan_f1_threshold": 256.6172180175781,
+ "eval_Qnli-dev_manhattan_precision": 0.4852607709750567,
+ "eval_Qnli-dev_manhattan_recall": 0.9067796610169492,
+ "eval_Qnli-dev_max_accuracy": 0.626953125,
+ "eval_Qnli-dev_max_accuracy_threshold": 712.4608154296875,
+ "eval_Qnli-dev_max_ap": 0.5975206086733145,
+ "eval_Qnli-dev_max_f1": 0.6327077747989276,
+ "eval_Qnli-dev_max_f1_threshold": 558.2177734375,
+ "eval_Qnli-dev_max_precision": 0.4852607709750567,
+ "eval_Qnli-dev_max_recall": 1.0,
+ "eval_allNLI-dev_cosine_accuracy": 0.666015625,
+ "eval_allNLI-dev_cosine_accuracy_threshold": 0.983871340751648,
+ "eval_allNLI-dev_cosine_ap": 0.36035507065342104,
+ "eval_allNLI-dev_cosine_f1": 0.5051395007342143,
+ "eval_allNLI-dev_cosine_f1_threshold": 0.7787582874298096,
+ "eval_allNLI-dev_cosine_precision": 0.33858267716535434,
+ "eval_allNLI-dev_cosine_recall": 0.9942196531791907,
+ "eval_allNLI-dev_dot_accuracy": 0.666015625,
+ "eval_allNLI-dev_dot_accuracy_threshold": 755.7670288085938,
+ "eval_allNLI-dev_dot_ap": 0.36031241443166284,
+ "eval_allNLI-dev_dot_f1": 0.5051395007342143,
+ "eval_allNLI-dev_dot_f1_threshold": 598.2041625976562,
+ "eval_allNLI-dev_dot_precision": 0.33858267716535434,
+ "eval_allNLI-dev_dot_recall": 0.9942196531791907,
+ "eval_allNLI-dev_euclidean_accuracy": 0.666015625,
+ "eval_allNLI-dev_euclidean_accuracy_threshold": 4.964720249176025,
+ "eval_allNLI-dev_euclidean_ap": 0.36035507065342104,
+ "eval_allNLI-dev_euclidean_f1": 0.5051395007342143,
+ "eval_allNLI-dev_euclidean_f1_threshold": 18.434789657592773,
+ "eval_allNLI-dev_euclidean_precision": 0.33858267716535434,
+ "eval_allNLI-dev_euclidean_recall": 0.9942196531791907,
+ "eval_allNLI-dev_manhattan_accuracy": 0.6640625,
+ "eval_allNLI-dev_manhattan_accuracy_threshold": 66.59053039550781,
+ "eval_allNLI-dev_manhattan_ap": 0.3692975841596879,
+ "eval_allNLI-dev_manhattan_f1": 0.5029239766081871,
+ "eval_allNLI-dev_manhattan_f1_threshold": 380.123779296875,
+ "eval_allNLI-dev_manhattan_precision": 0.33659491193737767,
+ "eval_allNLI-dev_manhattan_recall": 0.9942196531791907,
+ "eval_allNLI-dev_max_accuracy": 0.666015625,
+ "eval_allNLI-dev_max_accuracy_threshold": 755.7670288085938,
+ "eval_allNLI-dev_max_ap": 0.3692975841596879,
+ "eval_allNLI-dev_max_f1": 0.5051395007342143,
+ "eval_allNLI-dev_max_f1_threshold": 598.2041625976562,
+ "eval_allNLI-dev_max_precision": 0.33858267716535434,
+ "eval_allNLI-dev_max_recall": 0.9942196531791907,
+ "eval_sequential_score": 0.5975206086733145,
+ "eval_sts-test_pearson_cosine": 0.2980667522290251,
+ "eval_sts-test_pearson_dot": 0.29795063801865274,
+ "eval_sts-test_pearson_euclidean": 0.30279956330153407,
+ "eval_sts-test_pearson_manhattan": 0.32939035635624725,
+ "eval_sts-test_pearson_max": 0.32939035635624725,
+ "eval_sts-test_spearman_cosine": 0.3148821747085771,
+ "eval_sts-test_spearman_dot": 0.3149517475826025,
+ "eval_sts-test_spearman_euclidean": 0.31489636085812106,
+ "eval_sts-test_spearman_manhattan": 0.34558301612848313,
+ "eval_sts-test_spearman_max": 0.34558301612848313,
+ "eval_vitaminc-pairs_loss": 2.727938652038574,
+ "eval_vitaminc-pairs_runtime": 3.7459,
+ "eval_vitaminc-pairs_samples_per_second": 34.17,
+ "eval_vitaminc-pairs_steps_per_second": 0.267,
+ "step": 120
+ },
+ {
+ "epoch": 0.10619469026548672,
+ "eval_negation-triplets_loss": 4.394620418548584,
+ "eval_negation-triplets_runtime": 0.7078,
+ "eval_negation-triplets_samples_per_second": 180.852,
+ "eval_negation-triplets_steps_per_second": 1.413,
+ "step": 120
+ },
+ {
+ "epoch": 0.10619469026548672,
+ "eval_scitail-pairs-pos_loss": 1.4130322933197021,
+ "eval_scitail-pairs-pos_runtime": 0.8587,
+ "eval_scitail-pairs-pos_samples_per_second": 149.07,
+ "eval_scitail-pairs-pos_steps_per_second": 1.165,
+ "step": 120
+ },
+ {
+ "epoch": 0.10619469026548672,
+ "eval_scitail-pairs-qa_loss": 2.1150403022766113,
+ "eval_scitail-pairs-qa_runtime": 0.549,
+ "eval_scitail-pairs-qa_samples_per_second": 233.163,
+ "eval_scitail-pairs-qa_steps_per_second": 1.822,
+ "step": 120
+ },
+ {
+ "epoch": 0.10619469026548672,
+ "eval_xsum-pairs_loss": 6.048598289489746,
+ "eval_xsum-pairs_runtime": 2.9142,
+ "eval_xsum-pairs_samples_per_second": 43.923,
+ "eval_xsum-pairs_steps_per_second": 0.343,
+ "step": 120
+ },
+ {
+ "epoch": 0.10619469026548672,
+ "eval_sciq_pairs_loss": 0.7171850800514221,
+ "eval_sciq_pairs_runtime": 3.7786,
+ "eval_sciq_pairs_samples_per_second": 33.875,
+ "eval_sciq_pairs_steps_per_second": 0.265,
+ "step": 120
+ },
+ {
+ "epoch": 0.10619469026548672,
+ "eval_qasc_pairs_loss": 2.96693754196167,
+ "eval_qasc_pairs_runtime": 0.6718,
+ "eval_qasc_pairs_samples_per_second": 190.538,
+ "eval_qasc_pairs_steps_per_second": 1.489,
+ "step": 120
+ },
+ {
+ "epoch": 0.10619469026548672,
+ "eval_openbookqa_pairs_loss": 4.418018341064453,
+ "eval_openbookqa_pairs_runtime": 0.577,
+ "eval_openbookqa_pairs_samples_per_second": 221.852,
+ "eval_openbookqa_pairs_steps_per_second": 1.733,
+ "step": 120
+ },
+ {
+ "epoch": 0.10619469026548672,
+ "eval_msmarco_pairs_loss": 6.302182197570801,
+ "eval_msmarco_pairs_runtime": 1.2547,
+ "eval_msmarco_pairs_samples_per_second": 102.016,
+ "eval_msmarco_pairs_steps_per_second": 0.797,
+ "step": 120
+ },
+ {
+ "epoch": 0.10619469026548672,
+ "eval_nq_pairs_loss": 6.841231822967529,
+ "eval_nq_pairs_runtime": 2.5052,
+ "eval_nq_pairs_samples_per_second": 51.094,
+ "eval_nq_pairs_steps_per_second": 0.399,
+ "step": 120
+ },
+ {
+ "epoch": 0.10619469026548672,
+ "eval_trivia_pairs_loss": 6.201311111450195,
+ "eval_trivia_pairs_runtime": 3.6311,
+ "eval_trivia_pairs_samples_per_second": 35.251,
+ "eval_trivia_pairs_steps_per_second": 0.275,
+ "step": 120
+ },
+ {
+ "epoch": 0.10619469026548672,
+ "eval_gooaq_pairs_loss": 6.098212718963623,
+ "eval_gooaq_pairs_runtime": 0.9643,
+ "eval_gooaq_pairs_samples_per_second": 132.741,
+ "eval_gooaq_pairs_steps_per_second": 1.037,
+ "step": 120
+ },
+ {
+ "epoch": 0.10619469026548672,
+ "eval_paws-pos_loss": 0.9473956823348999,
+ "eval_paws-pos_runtime": 0.6684,
+ "eval_paws-pos_samples_per_second": 191.51,
+ "eval_paws-pos_steps_per_second": 1.496,
+ "step": 120
+ },
+ {
+ "epoch": 0.10619469026548672,
+ "eval_global_dataset_loss": 4.385201454162598,
+ "eval_global_dataset_runtime": 23.0455,
+ "eval_global_dataset_samples_per_second": 28.769,
+ "eval_global_dataset_steps_per_second": 0.26,
+ "step": 120
+ },
+ {
+ "epoch": 0.10707964601769912,
+ "grad_norm": 12.284002304077148,
+ "learning_rate": 4.1391509433962266e-07,
+ "loss": 3.5291,
+ "step": 121
+ },
+ {
+ "epoch": 0.1079646017699115,
+ "grad_norm": 10.567977905273438,
+ "learning_rate": 4.1745283018867925e-07,
+ "loss": 3.8232,
+ "step": 122
+ },
+ {
+ "epoch": 0.1088495575221239,
+ "grad_norm": 11.508279800415039,
+ "learning_rate": 4.209905660377359e-07,
+ "loss": 4.6035,
+ "step": 123
+ },
+ {
+ "epoch": 0.10973451327433628,
+ "grad_norm": 10.180809020996094,
+ "learning_rate": 4.2452830188679244e-07,
+ "loss": 3.7607,
+ "step": 124
+ },
+ {
+ "epoch": 0.11061946902654868,
+ "grad_norm": 9.519749641418457,
+ "learning_rate": 4.280660377358491e-07,
+ "loss": 3.8461,
+ "step": 125
+ },
+ {
+ "epoch": 0.11150442477876106,
+ "grad_norm": 11.971588134765625,
+ "learning_rate": 4.3160377358490563e-07,
+ "loss": 3.3413,
+ "step": 126
+ },
+ {
+ "epoch": 0.11238938053097346,
+ "grad_norm": 9.211153984069824,
+ "learning_rate": 4.351415094339623e-07,
+ "loss": 4.2777,
+ "step": 127
+ },
+ {
+ "epoch": 0.11327433628318584,
+ "grad_norm": 12.393014907836914,
+ "learning_rate": 4.386792452830189e-07,
+ "loss": 4.3597,
+ "step": 128
+ },
+ {
+ "epoch": 0.11415929203539824,
+ "grad_norm": 14.332024574279785,
+ "learning_rate": 4.422169811320755e-07,
+ "loss": 3.9046,
+ "step": 129
+ },
+ {
+ "epoch": 0.11504424778761062,
+ "grad_norm": 10.091246604919434,
+ "learning_rate": 4.4575471698113207e-07,
+ "loss": 4.0527,
+ "step": 130
+ },
+ {
+ "epoch": 0.11592920353982301,
+ "grad_norm": 15.043377876281738,
+ "learning_rate": 4.492924528301887e-07,
+ "loss": 5.0883,
+ "step": 131
+ },
+ {
+ "epoch": 0.1168141592920354,
+ "grad_norm": 12.942100524902344,
+ "learning_rate": 4.5283018867924526e-07,
+ "loss": 3.8308,
+ "step": 132
+ },
+ {
+ "epoch": 0.11769911504424779,
+ "grad_norm": 11.961737632751465,
+ "learning_rate": 4.563679245283019e-07,
+ "loss": 3.572,
+ "step": 133
+ },
+ {
+ "epoch": 0.11858407079646018,
+ "grad_norm": 12.325026512145996,
+ "learning_rate": 4.599056603773585e-07,
+ "loss": 3.4299,
+ "step": 134
+ },
+ {
+ "epoch": 0.11946902654867257,
+ "grad_norm": 12.118773460388184,
+ "learning_rate": 4.6344339622641515e-07,
+ "loss": 4.1541,
+ "step": 135
+ },
+ {
+ "epoch": 0.12035398230088495,
+ "grad_norm": 11.99026107788086,
+ "learning_rate": 4.669811320754717e-07,
+ "loss": 3.584,
+ "step": 136
+ },
+ {
+ "epoch": 0.12123893805309735,
+ "grad_norm": 15.083515167236328,
+ "learning_rate": 4.7051886792452834e-07,
+ "loss": 5.0977,
+ "step": 137
+ },
+ {
+ "epoch": 0.12212389380530973,
+ "grad_norm": 15.059394836425781,
+ "learning_rate": 4.740566037735849e-07,
+ "loss": 4.6769,
+ "step": 138
+ },
+ {
+ "epoch": 0.12300884955752213,
+ "grad_norm": 8.864882469177246,
+ "learning_rate": 4.775943396226415e-07,
+ "loss": 3.8396,
+ "step": 139
+ },
+ {
+ "epoch": 0.12389380530973451,
+ "grad_norm": 12.116555213928223,
+ "learning_rate": 4.811320754716981e-07,
+ "loss": 3.2875,
+ "step": 140
+ },
+ {
+ "epoch": 0.12477876106194691,
+ "grad_norm": 14.214646339416504,
+ "learning_rate": 4.846698113207547e-07,
+ "loss": 4.1946,
+ "step": 141
+ },
+ {
+ "epoch": 0.1256637168141593,
+ "grad_norm": 16.207908630371094,
+ "learning_rate": 4.882075471698113e-07,
+ "loss": 4.9602,
+ "step": 142
+ },
+ {
+ "epoch": 0.12654867256637167,
+ "grad_norm": 11.662668228149414,
+ "learning_rate": 4.917452830188679e-07,
+ "loss": 4.1531,
+ "step": 143
+ },
+ {
+ "epoch": 0.12743362831858407,
+ "grad_norm": 12.429448127746582,
+ "learning_rate": 4.952830188679246e-07,
+ "loss": 3.8351,
+ "step": 144
+ },
+ {
+ "epoch": 0.12831858407079647,
+ "grad_norm": 11.522616386413574,
+ "learning_rate": 4.988207547169812e-07,
+ "loss": 3.112,
+ "step": 145
+ },
+ {
+ "epoch": 0.12920353982300886,
+ "grad_norm": 14.556803703308105,
+ "learning_rate": 5.023584905660377e-07,
+ "loss": 2.3145,
+ "step": 146
+ },
+ {
+ "epoch": 0.13008849557522123,
+ "grad_norm": 12.348714828491211,
+ "learning_rate": 5.058962264150944e-07,
+ "loss": 4.0989,
+ "step": 147
+ },
+ {
+ "epoch": 0.13097345132743363,
+ "grad_norm": 13.150403022766113,
+ "learning_rate": 5.094339622641509e-07,
+ "loss": 3.2173,
+ "step": 148
+ },
+ {
+ "epoch": 0.13185840707964602,
+ "grad_norm": 12.066205978393555,
+ "learning_rate": 5.129716981132076e-07,
+ "loss": 2.7913,
+ "step": 149
+ },
+ {
+ "epoch": 0.13274336283185842,
+ "grad_norm": 11.519116401672363,
+ "learning_rate": 5.165094339622641e-07,
+ "loss": 3.7627,
+ "step": 150
+ },
+ {
+ "epoch": 0.1336283185840708,
+ "grad_norm": 12.59196662902832,
+ "learning_rate": 5.200471698113208e-07,
+ "loss": 3.3669,
+ "step": 151
+ },
+ {
+ "epoch": 0.13451327433628318,
+ "grad_norm": 13.791536331176758,
+ "learning_rate": 5.235849056603773e-07,
+ "loss": 2.6775,
+ "step": 152
+ },
+ {
+ "epoch": 0.13539823008849558,
+ "grad_norm": 11.906597137451172,
+ "learning_rate": 5.27122641509434e-07,
+ "loss": 3.2804,
+ "step": 153
+ },
+ {
+ "epoch": 0.13628318584070798,
+ "grad_norm": 11.267363548278809,
+ "learning_rate": 5.306603773584905e-07,
+ "loss": 3.0676,
+ "step": 154
+ },
+ {
+ "epoch": 0.13716814159292035,
+ "grad_norm": 12.373686790466309,
+ "learning_rate": 5.341981132075471e-07,
+ "loss": 3.1559,
+ "step": 155
+ },
+ {
+ "epoch": 0.13805309734513274,
+ "grad_norm": 13.258451461791992,
+ "learning_rate": 5.377358490566038e-07,
+ "loss": 2.6638,
+ "step": 156
+ },
+ {
+ "epoch": 0.13893805309734514,
+ "grad_norm": 12.79727554321289,
+ "learning_rate": 5.412735849056604e-07,
+ "loss": 2.8045,
+ "step": 157
+ },
+ {
+ "epoch": 0.13982300884955753,
+ "grad_norm": 13.88683032989502,
+ "learning_rate": 5.44811320754717e-07,
+ "loss": 4.0568,
+ "step": 158
+ },
+ {
+ "epoch": 0.1407079646017699,
+ "grad_norm": 12.57358169555664,
+ "learning_rate": 5.483490566037736e-07,
+ "loss": 2.7554,
+ "step": 159
+ },
+ {
+ "epoch": 0.1415929203539823,
+ "grad_norm": 14.520818710327148,
+ "learning_rate": 5.518867924528302e-07,
+ "loss": 3.7407,
+ "step": 160
+ },
+ {
+ "epoch": 0.1415929203539823,
+ "eval_Qnli-dev_cosine_accuracy": 0.62890625,
+ "eval_Qnli-dev_cosine_accuracy_threshold": 0.9045097827911377,
+ "eval_Qnli-dev_cosine_ap": 0.6193527955003784,
+ "eval_Qnli-dev_cosine_f1": 0.6397415185783522,
+ "eval_Qnli-dev_cosine_f1_threshold": 0.8351442813873291,
+ "eval_Qnli-dev_cosine_precision": 0.5169712793733682,
+ "eval_Qnli-dev_cosine_recall": 0.8389830508474576,
+ "eval_Qnli-dev_dot_accuracy": 0.62890625,
+ "eval_Qnli-dev_dot_accuracy_threshold": 694.7778930664062,
+ "eval_Qnli-dev_dot_ap": 0.6194150916988216,
+ "eval_Qnli-dev_dot_f1": 0.6397415185783522,
+ "eval_Qnli-dev_dot_f1_threshold": 641.4969482421875,
+ "eval_Qnli-dev_dot_precision": 0.5169712793733682,
+ "eval_Qnli-dev_dot_recall": 0.8389830508474576,
+ "eval_Qnli-dev_euclidean_accuracy": 0.62890625,
+ "eval_Qnli-dev_euclidean_accuracy_threshold": 12.111844062805176,
+ "eval_Qnli-dev_euclidean_ap": 0.6193576186776235,
+ "eval_Qnli-dev_euclidean_f1": 0.6397415185783522,
+ "eval_Qnli-dev_euclidean_f1_threshold": 15.914146423339844,
+ "eval_Qnli-dev_euclidean_precision": 0.5169712793733682,
+ "eval_Qnli-dev_euclidean_recall": 0.8389830508474576,
+ "eval_Qnli-dev_manhattan_accuracy": 0.646484375,
+ "eval_Qnli-dev_manhattan_accuracy_threshold": 245.2164306640625,
+ "eval_Qnli-dev_manhattan_ap": 0.6417015148414534,
+ "eval_Qnli-dev_manhattan_f1": 0.6521060842433698,
+ "eval_Qnli-dev_manhattan_f1_threshold": 303.317626953125,
+ "eval_Qnli-dev_manhattan_precision": 0.5160493827160494,
+ "eval_Qnli-dev_manhattan_recall": 0.885593220338983,
+ "eval_Qnli-dev_max_accuracy": 0.646484375,
+ "eval_Qnli-dev_max_accuracy_threshold": 694.7778930664062,
+ "eval_Qnli-dev_max_ap": 0.6417015148414534,
+ "eval_Qnli-dev_max_f1": 0.6521060842433698,
+ "eval_Qnli-dev_max_f1_threshold": 641.4969482421875,
+ "eval_Qnli-dev_max_precision": 0.5169712793733682,
+ "eval_Qnli-dev_max_recall": 0.885593220338983,
+ "eval_allNLI-dev_cosine_accuracy": 0.66796875,
+ "eval_allNLI-dev_cosine_accuracy_threshold": 0.9767438173294067,
+ "eval_allNLI-dev_cosine_ap": 0.38624833037583434,
+ "eval_allNLI-dev_cosine_f1": 0.5100182149362477,
+ "eval_allNLI-dev_cosine_f1_threshold": 0.8540960550308228,
+ "eval_allNLI-dev_cosine_precision": 0.3723404255319149,
+ "eval_allNLI-dev_cosine_recall": 0.8092485549132948,
+ "eval_allNLI-dev_dot_accuracy": 0.66796875,
+ "eval_allNLI-dev_dot_accuracy_threshold": 750.345458984375,
+ "eval_allNLI-dev_dot_ap": 0.3862261253421553,
+ "eval_allNLI-dev_dot_f1": 0.5100182149362477,
+ "eval_allNLI-dev_dot_f1_threshold": 656.0940551757812,
+ "eval_allNLI-dev_dot_precision": 0.3723404255319149,
+ "eval_allNLI-dev_dot_recall": 0.8092485549132948,
+ "eval_allNLI-dev_euclidean_accuracy": 0.66796875,
+ "eval_allNLI-dev_euclidean_accuracy_threshold": 5.977196216583252,
+ "eval_allNLI-dev_euclidean_ap": 0.38624380046547035,
+ "eval_allNLI-dev_euclidean_f1": 0.5100182149362477,
+ "eval_allNLI-dev_euclidean_f1_threshold": 14.971920013427734,
+ "eval_allNLI-dev_euclidean_precision": 0.3723404255319149,
+ "eval_allNLI-dev_euclidean_recall": 0.8092485549132948,
+ "eval_allNLI-dev_manhattan_accuracy": 0.6640625,
+ "eval_allNLI-dev_manhattan_accuracy_threshold": 78.52637481689453,
+ "eval_allNLI-dev_manhattan_ap": 0.3898187083180651,
+ "eval_allNLI-dev_manhattan_f1": 0.5062388591800357,
+ "eval_allNLI-dev_manhattan_f1_threshold": 285.7745361328125,
+ "eval_allNLI-dev_manhattan_precision": 0.36597938144329895,
+ "eval_allNLI-dev_manhattan_recall": 0.8208092485549133,
+ "eval_allNLI-dev_max_accuracy": 0.66796875,
+ "eval_allNLI-dev_max_accuracy_threshold": 750.345458984375,
+ "eval_allNLI-dev_max_ap": 0.3898187083180651,
+ "eval_allNLI-dev_max_f1": 0.5100182149362477,
+ "eval_allNLI-dev_max_f1_threshold": 656.0940551757812,
+ "eval_allNLI-dev_max_precision": 0.3723404255319149,
+ "eval_allNLI-dev_max_recall": 0.8208092485549133,
+ "eval_sequential_score": 0.6417015148414534,
+ "eval_sts-test_pearson_cosine": 0.2853943019391156,
+ "eval_sts-test_pearson_dot": 0.28526334639473966,
+ "eval_sts-test_pearson_euclidean": 0.29405773952219494,
+ "eval_sts-test_pearson_manhattan": 0.3110310476615048,
+ "eval_sts-test_pearson_max": 0.3110310476615048,
+ "eval_sts-test_spearman_cosine": 0.31414239162305135,
+ "eval_sts-test_spearman_dot": 0.31380407209449446,
+ "eval_sts-test_spearman_euclidean": 0.3141516551339523,
+ "eval_sts-test_spearman_manhattan": 0.3366243060620438,
+ "eval_sts-test_spearman_max": 0.3366243060620438,
+ "eval_vitaminc-pairs_loss": 2.7439002990722656,
+ "eval_vitaminc-pairs_runtime": 3.7639,
+ "eval_vitaminc-pairs_samples_per_second": 34.007,
+ "eval_vitaminc-pairs_steps_per_second": 0.266,
+ "step": 160
+ },
+ {
+ "epoch": 0.1415929203539823,
+ "eval_negation-triplets_loss": 4.63640022277832,
+ "eval_negation-triplets_runtime": 0.7072,
+ "eval_negation-triplets_samples_per_second": 180.999,
+ "eval_negation-triplets_steps_per_second": 1.414,
+ "step": 160
+ },
+ {
+ "epoch": 0.1415929203539823,
+ "eval_scitail-pairs-pos_loss": 1.0088545083999634,
+ "eval_scitail-pairs-pos_runtime": 0.8123,
+ "eval_scitail-pairs-pos_samples_per_second": 157.577,
+ "eval_scitail-pairs-pos_steps_per_second": 1.231,
+ "step": 160
+ },
+ {
+ "epoch": 0.1415929203539823,
+ "eval_scitail-pairs-qa_loss": 1.1228678226470947,
+ "eval_scitail-pairs-qa_runtime": 0.5444,
+ "eval_scitail-pairs-qa_samples_per_second": 235.115,
+ "eval_scitail-pairs-qa_steps_per_second": 1.837,
+ "step": 160
+ },
+ {
+ "epoch": 0.1415929203539823,
+ "eval_xsum-pairs_loss": 5.4869818687438965,
+ "eval_xsum-pairs_runtime": 2.8888,
+ "eval_xsum-pairs_samples_per_second": 44.308,
+ "eval_xsum-pairs_steps_per_second": 0.346,
+ "step": 160
+ },
+ {
+ "epoch": 0.1415929203539823,
+ "eval_sciq_pairs_loss": 0.628353476524353,
+ "eval_sciq_pairs_runtime": 3.8061,
+ "eval_sciq_pairs_samples_per_second": 33.631,
+ "eval_sciq_pairs_steps_per_second": 0.263,
+ "step": 160
+ },
+ {
+ "epoch": 0.1415929203539823,
+ "eval_qasc_pairs_loss": 2.593322277069092,
+ "eval_qasc_pairs_runtime": 0.6728,
+ "eval_qasc_pairs_samples_per_second": 190.241,
+ "eval_qasc_pairs_steps_per_second": 1.486,
+ "step": 160
+ },
+ {
+ "epoch": 0.1415929203539823,
+ "eval_openbookqa_pairs_loss": 4.394308090209961,
+ "eval_openbookqa_pairs_runtime": 0.5852,
+ "eval_openbookqa_pairs_samples_per_second": 218.729,
+ "eval_openbookqa_pairs_steps_per_second": 1.709,
+ "step": 160
+ },
+ {
+ "epoch": 0.1415929203539823,
+ "eval_msmarco_pairs_loss": 5.656517505645752,
+ "eval_msmarco_pairs_runtime": 1.2571,
+ "eval_msmarco_pairs_samples_per_second": 101.822,
+ "eval_msmarco_pairs_steps_per_second": 0.795,
+ "step": 160
+ },
+ {
+ "epoch": 0.1415929203539823,
+ "eval_nq_pairs_loss": 5.986983776092529,
+ "eval_nq_pairs_runtime": 2.5075,
+ "eval_nq_pairs_samples_per_second": 51.047,
+ "eval_nq_pairs_steps_per_second": 0.399,
+ "step": 160
+ },
+ {
+ "epoch": 0.1415929203539823,
+ "eval_trivia_pairs_loss": 5.694415092468262,
+ "eval_trivia_pairs_runtime": 3.6302,
+ "eval_trivia_pairs_samples_per_second": 35.26,
+ "eval_trivia_pairs_steps_per_second": 0.275,
+ "step": 160
+ },
+ {
+ "epoch": 0.1415929203539823,
+ "eval_gooaq_pairs_loss": 5.3856658935546875,
+ "eval_gooaq_pairs_runtime": 0.9618,
+ "eval_gooaq_pairs_samples_per_second": 133.082,
+ "eval_gooaq_pairs_steps_per_second": 1.04,
+ "step": 160
+ },
+ {
+ "epoch": 0.1415929203539823,
+ "eval_paws-pos_loss": 0.3622308671474457,
+ "eval_paws-pos_runtime": 0.6678,
+ "eval_paws-pos_samples_per_second": 191.674,
+ "eval_paws-pos_steps_per_second": 1.497,
+ "step": 160
+ },
+ {
+ "epoch": 0.1415929203539823,
+ "eval_global_dataset_loss": 3.401135206222534,
+ "eval_global_dataset_runtime": 23.0422,
+ "eval_global_dataset_samples_per_second": 28.773,
+ "eval_global_dataset_steps_per_second": 0.26,
+ "step": 160
+ }
+ ],
+ "logging_steps": 1,
+ "max_steps": 3390,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 3,
+ "save_steps": 80,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 0.0,
+ "train_batch_size": 42,
+ "trial_name": null,
+ "trial_params": null
+ }
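The log entries above mix per-step training records (`loss`, `grad_norm`, `learning_rate`) with evaluation records that repeat the same `step`. Pulling one metric out over time therefore only needs a filter on the key. The following is a minimal Python sketch under two assumptions: that this JSON is the checkpoint's `trainer_state.json`, and that the list above sits under a `log_history` key. Neither name is visible in this part of the diff. The helper names are hypothetical. The same fields also reproduce the `epoch` values: `max_steps / num_train_epochs` = 3390 / 3 = 1130 steps per epoch, and 160 / 1130 ≈ 0.14159, which matches the step-160 entries.

```python
import json
from typing import Any


def load_state(path: str) -> dict[str, Any]:
    """Read a Trainer state file (assumed to be trainer_state.json) into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def metric_by_step(state: dict[str, Any], key: str) -> dict[int, float]:
    """Map global step -> value for every log entry that contains `key`.

    Training entries (loss, grad_norm, learning_rate) and evaluation entries
    (eval_*) share the same "step" field, so filtering on the exact key is enough.
    """
    return {
        entry["step"]: entry[key]
        for entry in state.get("log_history", [])  # assumed key name
        if key in entry
    }


def steps_per_epoch(state: dict[str, Any]) -> float:
    """Optimizer steps per epoch implied by max_steps and num_train_epochs."""
    return state["max_steps"] / state["num_train_epochs"]


# Hypothetical usage (the path is an assumption):
#   state = load_state("checkpoint-160/trainer_state.json")
#   metric_by_step(state, "eval_global_dataset_loss")
```

Applied to the values shown here, `metric_by_step(state, "eval_global_dataset_loss")` would map step 120 to 4.385201454162598 and step 160 to 3.401135206222534.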
checkpoint-160/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7bfb21b1a8b0022475cba81f0306eaa079a06c682d78c599327457cfd397d216
+ size 5688