whitemouse84 committed · Commit d0a690d · verified · 1 Parent(s): 2412ed2

Update README.md

Files changed (1): README.md (+717 −705)
---
base_model: cointegrated/LaBSE-en-ru
language:
- ru
- en
library_name: sentence-transformers
metrics:
- pearson_cosine
- spearman_cosine
- pearson_manhattan
- spearman_manhattan
- pearson_euclidean
- spearman_euclidean
- pearson_dot
- spearman_dot
- pearson_max
- spearman_max
- negative_mse
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:10975066
- loss:MSELoss
widget:
- source_sentence: Такие лодки строились, чтобы получить быстрый доступ к приходящим судам.
  sentences:
  - been nice talking to you
  - >-
    Нельзя ставить под сомнение притязания клиента, если не были предприняты
    шаги.
  - >-
    Dharangaon Railway Station serves Dharangaon in Jalgaon district in the
    Indian state of Maharashtra.
- source_sentence: >-
    Если прилагательные смягчают этнические термины, существительные могут
    сделать их жестче.
  sentences:
  - >-
    Вслед за этим последовало секретное письмо А.Б.Чубайса об изъятии у МЦР,
    переданного ему С.Н.Рерихом наследия.
  - Coaches should not give young athletes a hard time.
  - Эшкрофт хотел прослушивать сводки новостей снова и снова
- source_sentence: Земля была мягкой.
  sentences:
  - >-
    По мере того, как самообладание покидало его, сердце его все больше
    наполнялось тревогой.
  - >-
    Our borders and immigration system, including law enforcement, ought to send
    a message of welcome, tolerance, and justice to members of immigrant
    communities in the United States and in their countries of origin.
  - >-
    Начнут действовать льготные условия аренды земель, которые предназначены для
    реализации инвестиционных проектов.
- source_sentence: >-
    Что же касается рава Кука: мой рав лично знал его и много раз с теплотой
    рассказывал мне о нем как о великом каббалисте.
  sentences:
  - Вдова Эдгара Эванса, его дети и мать получили 1500 фунтов стерлингов (
  - Please do not make any changes to your address.
  - Мы уже закончили все запланированные дела!
- source_sentence: See Name section.
  sentences:
  - >-
    Ms. Packard is the voice of the female blood elf in the video game World of
    Warcraft.
  - >-
    Основным функциональным элементом, реализующим функции управления
    соединением, является абонентский терминал.
  - Yeah, people who might not be hungry.
model-index:
- name: SentenceTransformer based on cointegrated/LaBSE-en-ru
  results:
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts dev
      type: sts-dev
    metrics:
    - type: pearson_cosine
      value: 0.5305176535187099
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.6347069834349862
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.5553415140113596
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.6389336208598283
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.5499910306125031
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.6347073809507647
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.5305176585564861
      name: Pearson Dot
    - type: spearman_dot
      value: 0.6347078463557637
      name: Spearman Dot
    - type: pearson_max
      value: 0.5553415140113596
      name: Pearson Max
    - type: spearman_max
      value: 0.6389336208598283
      name: Spearman Max
  - task:
      type: knowledge-distillation
      name: Knowledge Distillation
    dataset:
      name: Unknown
      type: unknown
    metrics:
    - type: negative_mse
      value: -0.006337030936265364
      name: Negative Mse
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts test
      type: sts-test
    metrics:
    - type: pearson_cosine
      value: 0.5042796836494269
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.5986471772428711
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.522744495080616
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.5983901280447074
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.522721961447153
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.5986471095414022
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.504279685613151
      name: Pearson Dot
    - type: spearman_dot
      value: 0.598648155615724
      name: Spearman Dot
    - type: pearson_max
      value: 0.522744495080616
      name: Pearson Max
    - type: spearman_max
      value: 0.598648155615724
      name: Spearman Max
---

# SentenceTransformer based on cointegrated/LaBSE-en-ru

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [cointegrated/LaBSE-en-ru](https://huggingface.co/cointegrated/LaBSE-en-ru). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [cointegrated/LaBSE-en-ru](https://huggingface.co/cointegrated/LaBSE-en-ru) <!-- at revision cf0714e606d4af551e14ad69a7929cd6b0da7f7e -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 768, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
  (3): Normalize()
)
```
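
Since the model loads as a standard `SentenceTransformer`, these modules can be inspected by index. A minimal sketch (the printed values reflect the configuration above):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("whitemouse84/LaBSE-en-ru-distilled-each-third-layer")

# SentenceTransformer behaves like nn.Sequential, so modules are indexable.
pooling = model[1]
print(pooling.pooling_mode_cls_token)            # True: CLS-token pooling
print(model.get_sentence_embedding_dimension())  # 768
print(model.max_seq_length)                      # 512
```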

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("whitemouse84/LaBSE-en-ru-distilled-each-third-layer")
# Run inference
sentences = [
    'See Name section.',
    'Ms. Packard is the voice of the female blood elf in the video game World of Warcraft.',
    'Yeah, people who might not be hungry.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Semantic Similarity
* Dataset: `sts-dev`
* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| pearson_cosine      | 0.5305     |
| **spearman_cosine** | **0.6347** |
| pearson_manhattan   | 0.5553     |
| spearman_manhattan  | 0.6389     |
| pearson_euclidean   | 0.55       |
| spearman_euclidean  | 0.6347     |
| pearson_dot         | 0.5305     |
| spearman_dot        | 0.6347     |
| pearson_max         | 0.5553     |
| spearman_max        | 0.6389     |
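
`EmbeddingSimilarityEvaluator` scores a model by correlating the similarities of its embeddings with gold similarity labels over sentence pairs. A minimal sketch of such a run; the pairs and scores below are hypothetical stand-ins for the actual STS data:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("whitemouse84/LaBSE-en-ru-distilled-each-third-layer")

# Hypothetical STS-style pairs; the real sts-dev split supplies these.
sentences1 = ["Земля была мягкой.", "See Name section."]
sentences2 = ["The ground was soft.", "Been nice talking to you."]
gold_scores = [0.95, 0.05]  # normalized similarity labels in [0, 1]

evaluator = EmbeddingSimilarityEvaluator(sentences1, sentences2, gold_scores, name="sts-dev")
print(evaluator(model))  # Pearson/Spearman correlations for cosine, Manhattan, Euclidean, dot
```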

#### Knowledge Distillation

* Evaluated with [<code>MSEEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.MSEEvaluator)

| Metric           | Value       |
|:-----------------|:------------|
| **negative_mse** | **-0.0063** |
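
`negative_mse` is the negated mean squared error between this model's embeddings and the teacher's, so values closer to zero are better. A sketch of how the metric can be computed with `MSEEvaluator`, assuming the teacher is the base [cointegrated/LaBSE-en-ru](https://huggingface.co/cointegrated/LaBSE-en-ru):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import MSEEvaluator

teacher = SentenceTransformer("cointegrated/LaBSE-en-ru")
student = SentenceTransformer("whitemouse84/LaBSE-en-ru-distilled-each-third-layer")

# Hypothetical held-out sentences; the real evaluation set holds 10,000 of them.
sentences = ["Земля была мягкой.", "See Name section."]

mse_evaluator = MSEEvaluator(
    source_sentences=sentences,  # encoded by the teacher
    target_sentences=sentences,  # encoded by the student
    teacher_model=teacher,
)
print(mse_evaluator(student))  # reports negative MSE; higher is better
```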

#### Semantic Similarity
* Dataset: `sts-test`
* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| pearson_cosine      | 0.5043     |
| **spearman_cosine** | **0.5986** |
| pearson_manhattan   | 0.5227     |
| spearman_manhattan  | 0.5984     |
| pearson_euclidean   | 0.5227     |
| spearman_euclidean  | 0.5986     |
| pearson_dot         | 0.5043     |
| spearman_dot        | 0.5986     |
| pearson_max         | 0.5227     |
| spearman_max        | 0.5986     |

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 10,975,066 training samples
* Columns: <code>sentence</code> and <code>label</code>
* Approximate statistics based on the first 1000 samples:
  |         | sentence                                                                            | label                                |
  |:--------|:------------------------------------------------------------------------------------|:-------------------------------------|
  | type    | string                                                                              | list                                 |
  | details | <ul><li>min: 6 tokens</li><li>mean: 26.93 tokens</li><li>max: 139 tokens</li></ul> | <ul><li>size: 768 elements</li></ul> |
* Samples:
  | sentence | label |
  |:---------|:------|
  | <code>It is based on the Java Persistence API (JPA), but it does not strictly follow the JSR 338 Specification, as it implements different design patterns and technologies.</code> | <code>[-0.012331949546933174, -0.04570527374744415, -0.024963658303022385, -0.03620213270187378, 0.022556383162736893, ...]</code> |
  | <code>Покупаем вторичное сырье в Каунасе (Переработка вторичного сырья) - Алфенас АНД КО, ЗАО на Bizorg.</code> | <code>[-0.07498518377542496, -0.01913534104824066, -0.01797042042016983, 0.048263177275657654, -0.00016611881437711418, ...]</code> |
  | <code>At the Equal Justice Conference ( EJC ) held in March 2001 in San Diego , LSC and the Project for the Future of Equal Justice held the second Case Management Software pre-conference .</code> | <code>[0.03870972990989685, -0.0638347640633583, -0.01696585863828659, -0.043612319976091385, -0.048241738229990005, ...]</code> |
* Loss: [<code>MSELoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#mseloss)
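
Each row, then, simply pairs a raw sentence with the teacher's 768-dimensional embedding as a regression target. A minimal sketch of how such a distillation dataset can be assembled for `MSELoss`, assuming the teacher is the base [cointegrated/LaBSE-en-ru](https://huggingface.co/cointegrated/LaBSE-en-ru); the `student` below is a placeholder, as this card does not state how the student was initialized:

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MSELoss

teacher = SentenceTransformer("cointegrated/LaBSE-en-ru")
student = SentenceTransformer("cointegrated/LaBSE-en-ru")  # placeholder student

# Each row pairs a sentence with the teacher's 768-dim embedding as its label.
sentences = [
    "It is based on the Java Persistence API (JPA).",
    "Земля была мягкой.",
]
labels = teacher.encode(sentences).tolist()
train_dataset = Dataset.from_dict({"sentence": sentences, "label": labels})

# MSELoss regresses the student's embeddings onto the teacher's targets.
loss = MSELoss(model=student)
```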

### Evaluation Dataset

#### Unnamed Dataset

* Size: 10,000 evaluation samples
* Columns: <code>sentence</code> and <code>label</code>
* Approximate statistics based on the first 1000 samples:
  |         | sentence                                                                            | label                                |
  |:--------|:------------------------------------------------------------------------------------|:-------------------------------------|
  | type    | string                                                                              | list                                 |
  | details | <ul><li>min: 5 tokens</li><li>mean: 24.18 tokens</li><li>max: 111 tokens</li></ul> | <ul><li>size: 768 elements</li></ul> |
* Samples:
  | sentence | label |
  |:---------|:------|
  | <code>The Canadian Canoe Museum is a museum dedicated to canoes located in Peterborough, Ontario, Canada.</code> | <code>[-0.05444105342030525, -0.03650881350040436, -0.041163671761751175, -0.010616903193295002, -0.04094529151916504, ...]</code> |
  | <code>И мне нравилось, что я одновременно зарабатываю и смотрю бои».</code> | <code>[-0.03404555842280388, 0.028203096240758896, -0.056121889501810074, -0.0591997392475605, -0.05523117259144783, ...]</code> |
  | <code>Ну, а на следующий день, разумеется, Президент Кеннеди объявил блокаду Кубы, и наши корабли остановили у кубинских берегов направлявшийся на Кубу российский корабль, и у него на борту нашли ракеты.</code> | <code>[-0.008193841204047203, 0.00694894278421998, -0.03027420863509178, -0.03290146216750145, 0.01425305474549532, ...]</code> |
* Loss: [<code>MSELoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#mseloss)

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 64
- `learning_rate`: 0.0001
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `fp16`: True
- `load_best_model_at_end`: True
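
For reference, these non-default values map onto `SentenceTransformerTrainingArguments` roughly as follows; a sketch, with `output_dir` as a hypothetical placeholder:

```python
from sentence_transformers.training_args import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="labse-en-ru-distilled",  # hypothetical output path
    eval_strategy="steps",
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    learning_rate=1e-4,
    num_train_epochs=1,
    warmup_ratio=0.1,
    fp16=True,
    load_best_model_at_end=True,
)
```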

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 64
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 0.0001
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `eval_use_gather_object`: False
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
<details><summary>Click to expand</summary>

| Epoch      | Step     | Training Loss | loss       | negative_mse | sts-dev_spearman_cosine | sts-test_spearman_cosine |
|:----------:|:--------:|:-------------:|:----------:|:------------:|:-----------------------:|:------------------------:|
| 0          | 0        | -             | -          | -0.2381      | 0.4206                  | -                        |
| 0.0058     | 1000     | 0.0014        | -          | -            | -                       | -                        |
| 0.0117     | 2000     | 0.0009        | -          | -            | -                       | -                        |
| 0.0175     | 3000     | 0.0007        | -          | -            | -                       | -                        |
| 0.0233     | 4000     | 0.0006        | -          | -            | -                       | -                        |
| **0.0292** | **5000** | **0.0005**    | **0.0004** | **-0.0363**  | **0.6393**              | **-**                    |
| 0.0350     | 6000     | 0.0004        | -          | -            | -                       | -                        |
| 0.0408     | 7000     | 0.0004        | -          | -            | -                       | -                        |
| 0.0467     | 8000     | 0.0003        | -          | -            | -                       | -                        |
| 0.0525     | 9000     | 0.0003        | -          | -            | -                       | -                        |
| 0.0583     | 10000    | 0.0003        | 0.0002     | -0.0207      | 0.6350                  | -                        |
| 0.0641     | 11000    | 0.0003        | -          | -            | -                       | -                        |
| 0.0700     | 12000    | 0.0003        | -          | -            | -                       | -                        |
| 0.0758     | 13000    | 0.0002        | -          | -            | -                       | -                        |
| 0.0816     | 14000    | 0.0002        | -          | -            | -                       | -                        |
| 0.0875     | 15000    | 0.0002        | 0.0002     | -0.0157      | 0.6328                  | -                        |
| 0.0933     | 16000    | 0.0002        | -          | -            | -                       | -                        |
| 0.0991     | 17000    | 0.0002        | -          | -            | -                       | -                        |
| 0.1050     | 18000    | 0.0002        | -          | -            | -                       | -                        |
| 0.1108     | 19000    | 0.0002        | -          | -            | -                       | -                        |
| 0.1166     | 20000    | 0.0002        | 0.0001     | -0.0132      | 0.6317                  | -                        |
| 0.1225     | 21000    | 0.0002        | -          | -            | -                       | -                        |
| 0.1283     | 22000    | 0.0002        | -          | -            | -                       | -                        |
| 0.1341     | 23000    | 0.0002        | -          | -            | -                       | -                        |
| 0.1400     | 24000    | 0.0002        | -          | -            | -                       | -                        |
| 0.1458     | 25000    | 0.0002        | 0.0001     | -0.0118      | 0.6251                  | -                        |
| 0.1516     | 26000    | 0.0002        | -          | -            | -                       | -                        |
| 0.1574     | 27000    | 0.0002        | -          | -            | -                       | -                        |
| 0.1633     | 28000    | 0.0002        | -          | -            | -                       | -                        |
| 0.1691     | 29000    | 0.0002        | -          | -            | -                       | -                        |
| 0.1749     | 30000    | 0.0002        | 0.0001     | -0.0109      | 0.6304                  | -                        |
| 0.1808     | 31000    | 0.0002        | -          | -            | -                       | -                        |
| 0.1866     | 32000    | 0.0002        | -          | -            | -                       | -                        |
| 0.1924     | 33000    | 0.0002        | -          | -            | -                       | -                        |
| 0.1983     | 34000    | 0.0001        | -          | -            | -                       | -                        |
| 0.2041     | 35000    | 0.0001        | 0.0001     | -0.0102      | 0.6280                  | -                        |
| 0.2099     | 36000    | 0.0001        | -          | -            | -                       | -                        |
| 0.2158     | 37000    | 0.0001        | -          | -            | -                       | -                        |
| 0.2216     | 38000    | 0.0001        | -          | -            | -                       | -                        |
| 0.2274     | 39000    | 0.0001        | -          | -            | -                       | -                        |
| 0.2333     | 40000    | 0.0001        | 0.0001     | -0.0098      | 0.6272                  | -                        |
| 0.2391     | 41000    | 0.0001        | -          | -            | -                       | -                        |
| 0.2449     | 42000    | 0.0001        | -          | -            | -                       | -                        |
| 0.2507     | 43000    | 0.0001        | -          | -            | -                       | -                        |
| 0.2566     | 44000    | 0.0001        | -          | -            | -                       | -                        |
| 0.2624     | 45000    | 0.0001        | 0.0001     | -0.0093      | 0.6378                  | -                        |
| 0.2682     | 46000    | 0.0001        | -          | -            | -                       | -                        |
| 0.2741     | 47000    | 0.0001        | -          | -            | -                       | -                        |
| 0.2799     | 48000    | 0.0001        | -          | -            | -                       | -                        |
| 0.2857     | 49000    | 0.0001        | -          | -            | -                       | -                        |
| 0.2916     | 50000    | 0.0001        | 0.0001     | -0.0089      | 0.6325                  | -                        |
| 0.2974     | 51000    | 0.0001        | -          | -            | -                       | -                        |
| 0.3032     | 52000    | 0.0001        | -          | -            | -                       | -                        |
| 0.3091     | 53000    | 0.0001        | -          | -            | -                       | -                        |
| 0.3149     | 54000    | 0.0001        | -          | -            | -                       | -                        |
| 0.3207     | 55000    | 0.0001        | 0.0001     | -0.0087      | 0.6328                  | -                        |
| 0.3266     | 56000    | 0.0001        | -          | -            | -                       | -                        |
| 0.3324     | 57000    | 0.0001        | -          | -            | -                       | -                        |
| 0.3382     | 58000    | 0.0001        | -          | -            | -                       | -                        |
| 0.3441     | 59000    | 0.0001        | -          | -            | -                       | -                        |
| 0.3499     | 60000    | 0.0001        | 0.0001     | -0.0085      | 0.6357                  | -                        |
| 0.3557     | 61000    | 0.0001        | -          | -            | -                       | -                        |
| 0.3615     | 62000    | 0.0001        | -          | -            | -                       | -                        |
| 0.3674     | 63000    | 0.0001        | -          | -            | -                       | -                        |
| 0.3732     | 64000    | 0.0001        | -          | -            | -                       | -                        |
| 0.3790     | 65000    | 0.0001        | 0.0001     | -0.0083      | 0.6366                  | -                        |
| 0.3849     | 66000    | 0.0001        | -          | -            | -                       | -                        |
| 0.3907     | 67000    | 0.0001        | -          | -            | -                       | -                        |
| 0.3965     | 68000    | 0.0001        | -          | -            | -                       | -                        |
| 0.4024     | 69000    | 0.0001        | -          | -            | -                       | -                        |
| 0.4082     | 70000    | 0.0001        | 0.0001     | -0.0080      | 0.6325                  | -                        |
| 0.4140     | 71000    | 0.0001        | -          | -            | -                       | -                        |
| 0.4199     | 72000    | 0.0001        | -          | -            | -                       | -                        |
| 0.4257     | 73000    | 0.0001        | -          | -            | -                       | -                        |
| 0.4315     | 74000    | 0.0001        | -          | -            | -                       | -                        |
| 0.4374     | 75000    | 0.0001        | 0.0001     | -0.0078      | 0.6351                  | -                        |
| 0.4432     | 76000    | 0.0001        | -          | -            | -                       | -                        |
| 0.4490     | 77000    | 0.0001        | -          | -            | -                       | -                        |
| 0.4548     | 78000    | 0.0001        | -          | -            | -                       | -                        |
| 0.4607     | 79000    | 0.0001        | -          | -            | -                       | -                        |
| 0.4665     | 80000    | 0.0001        | 0.0001     | -0.0077      | 0.6323                  | -                        |
| 0.4723     | 81000    | 0.0001        | -          | -            | -                       | -                        |
| 0.4782     | 82000    | 0.0001        | -          | -            | -                       | -                        |
| 0.4840     | 83000    | 0.0001        | -          | -            | -                       | -                        |
| 0.4898     | 84000    | 0.0001        | -          | -            | -                       | -                        |
| 0.4957     | 85000    | 0.0001        | 0.0001     | -0.0076      | 0.6316                  | -                        |
| 0.5015     | 86000    | 0.0001        | -          | -            | -                       | -                        |
| 0.5073     | 87000    | 0.0001        | -          | -            | -                       | -                        |
| 0.5132     | 88000    | 0.0001        | -          | -            | -                       | -                        |
| 0.5190     | 89000    | 0.0001        | -          | -            | -                       | -                        |
| 0.5248     | 90000    | 0.0001        | 0.0001     | -0.0074      | 0.6306                  | -                        |
| 0.5307     | 91000    | 0.0001        | -          | -            | -                       | -                        |
| 0.5365     | 92000    | 0.0001        | -          | -            | -                       | -                        |
| 0.5423     | 93000    | 0.0001        | -          | -            | -                       | -                        |
| 0.5481     | 94000    | 0.0001        | -          | -            | -                       | -                        |
| 0.5540     | 95000    | 0.0001        | 0.0001     | -0.0073      | 0.6305                  | -                        |
| 0.5598     | 96000    | 0.0001        | -          | -            | -                       | -                        |
| 0.5656     | 97000    | 0.0001        | -          | -            | -                       | -                        |
| 0.5715     | 98000    | 0.0001        | -          | -            | -                       | -                        |
| 0.5773     | 99000    | 0.0001        | -          | -            | -                       | -                        |
| 0.5831     | 100000   | 0.0001        | 0.0001     | -0.0072      | 0.6333                  | -                        |
| 0.5890     | 101000   | 0.0001        | -          | -            | -                       | -                        |
| 0.5948     | 102000   | 0.0001        | -          | -            | -                       | -                        |
| 0.6006     | 103000   | 0.0001        | -          | -            | -                       | -                        |
| 0.6065     | 104000   | 0.0001        | -          | -            | -                       | -                        |
| 0.6123     | 105000   | 0.0001        | 0.0001     | -0.0071      | 0.6351                  | -                        |
| 0.6181     | 106000   | 0.0001        | -          | -            | -                       | -                        |
| 0.6240     | 107000   | 0.0001        | -          | -            | -                       | -                        |
| 0.6298     | 108000   | 0.0001        | -          | -            | -                       | -                        |
| 0.6356     | 109000   | 0.0001        | -          | -            | -                       | -                        |
| 0.6415     | 110000   | 0.0001        | 0.0001     | -0.0070      | 0.6330                  | -                        |
| 0.6473     | 111000   | 0.0001        | -          | -            | -                       | -                        |
| 0.6531     | 112000   | 0.0001        | -          | -            | -                       | -                        |
| 0.6589     | 113000   | 0.0001        | -          | -            | -                       | -                        |
| 0.6648     | 114000   | 0.0001        | -          | -            | -                       | -                        |
| 0.6706     | 115000   | 0.0001        | 0.0001     | -0.0070      | 0.6336                  | -                        |
| 0.6764     | 116000   | 0.0001        | -          | -            | -                       | -                        |
| 0.6823     | 117000   | 0.0001        | -          | -            | -                       | -                        |
| 0.6881     | 118000   | 0.0001        | -          | -            | -                       | -                        |
| 0.6939     | 119000   | 0.0001        | -          | -            | -                       | -                        |
| 0.6998     | 120000   | 0.0001        | 0.0001     | -0.0069      | 0.6305                  | -                        |
| 0.7056     | 121000   | 0.0001        | -          | -            | -                       | -                        |
| 0.7114     | 122000   | 0.0001        | -          | -            | -                       | -                        |
| 0.7173     | 123000   | 0.0001        | -          | -            | -                       | -                        |
| 0.7231     | 124000   | 0.0001        | -          | -            | -                       | -                        |
| 0.7289     | 125000   | 0.0001        | 0.0001     | -0.0068      | 0.6362                  | -                        |
| 0.7348     | 126000   | 0.0001        | -          | -            | -                       | -                        |
| 0.7406     | 127000   | 0.0001        | -          | -            | -                       | -                        |
| 0.7464     | 128000   | 0.0001        | -          | -            | -                       | -                        |
| 0.7522     | 129000   | 0.0001        | -          | -            | -                       | -                        |
| 0.7581     | 130000   | 0.0001        | 0.0001     | -0.0067      | 0.6340                  | -                        |
| 0.7639     | 131000   | 0.0001        | -          | -            | -                       | -                        |
| 0.7697     | 132000   | 0.0001        | -          | -            | -                       | -                        |
| 0.7756     | 133000   | 0.0001        | -          | -            | -                       | -                        |
| 0.7814     | 134000   | 0.0001        | -          | -            | -                       | -                        |
| 0.7872     | 135000   | 0.0001        | 0.0001     | -0.0067      | 0.6365                  | -                        |
| 0.7931     | 136000   | 0.0001        | -          | -            | -                       | -                        |
| 0.7989     | 137000   | 0.0001        | -          | -            | -                       | -                        |
| 0.8047     | 138000   | 0.0001        | -          | -            | -                       | -                        |
| 0.8106     | 139000   | 0.0001        | -          | -            | -                       | -                        |
| 0.8164     | 140000   | 0.0001        | 0.0001     | -0.0066      | 0.6339                  | -                        |
| 0.8222     | 141000   | 0.0001        | -          | -            | -                       | -                        |
| 0.8281     | 142000   | 0.0001        | -          | -            | -                       | -                        |
| 0.8339     | 143000   | 0.0001        | -          | -            | -                       | -                        |
| 0.8397     | 144000   | 0.0001        | -          | -            | -                       | -                        |
| 0.8456     | 145000   | 0.0001        | 0.0001     | -0.0066      | 0.6352                  | -                        |
| 0.8514     | 146000   | 0.0001        | -          | -            | -                       | -                        |
| 0.8572     | 147000   | 0.0001        | -          | -            | -                       | -                        |
| 0.8630     | 148000   | 0.0001        | -          | -            | -                       | -                        |
| 0.8689     | 149000   | 0.0001        | -          | -            | -                       | -                        |
| 0.8747     | 150000   | 0.0001        | 0.0001     | -0.0065      | 0.6357                  | -                        |
| 0.8805     | 151000   | 0.0001        | -          | -            | -                       | -                        |
| 0.8864     | 152000   | 0.0001        | -          | -            | -                       | -                        |
| 0.8922     | 153000   | 0.0001        | -          | -            | -                       | -                        |
| 0.8980     | 154000   | 0.0001        | -          | -            | -                       | -                        |
| 0.9039     | 155000   | 0.0001        | 0.0001     | -0.0065      | 0.6336                  | -                        |
| 0.9097     | 156000   | 0.0001        | -          | -            | -                       | -                        |
| 0.9155     | 157000   | 0.0001        | -          | -            | -                       | -                        |
| 0.9214     | 158000   | 0.0001        | -          | -            | -                       | -                        |
| 0.9272     | 159000   | 0.0001        | -          | -            | -                       | -                        |
| 0.9330     | 160000   | 0.0001        | 0.0001     | -0.0064      | 0.6334                  | -                        |
| 0.9389     | 161000   | 0.0001        | -          | -            | -                       | -                        |
| 0.9447     | 162000   | 0.0001        | -          | -            | -                       | -                        |
| 0.9505     | 163000   | 0.0001        | -          | -            | -                       | -                        |
| 0.9563     | 164000   | 0.0001        | -          | -            | -                       | -                        |
| 0.9622     | 165000   | 0.0001        | 0.0001     | -0.0064      | 0.6337                  | -                        |
| 0.9680     | 166000   | 0.0001        | -          | -            | -                       | -                        |
| 0.9738     | 167000   | 0.0001        | -          | -            | -                       | -                        |
| 0.9797     | 168000   | 0.0001        | -          | -            | -                       | -                        |
| 0.9855     | 169000   | 0.0001        | -          | -            | -                       | -                        |
| 0.9913     | 170000   | 0.0001        | 0.0001     | -0.0063      | 0.6347                  | -                        |
| 0.9972     | 171000   | 0.0001        | -          | -            | -                       | -                        |
| 1.0        | 171486   | -             | -          | -            | -                       | 0.5986                   |

* The bold row denotes the saved checkpoint.
</details>
662
+
663
+ ### Framework Versions
664
+ - Python: 3.10.14
665
+ - Sentence Transformers: 3.0.1
666
+ - Transformers: 4.44.0
667
+ - PyTorch: 2.4.0
668
+ - Accelerate: 0.33.0
669
+ - Datasets: 2.20.0
670
+ - Tokenizers: 0.19.1
671
+
672
+ ## Citation
673
+
674
+ ### BibTeX
675
+
676
+ #### Sentence Transformers
677
+ ```bibtex
678
+ @inproceedings{reimers-2019-sentence-bert,
679
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
680
+ author = "Reimers, Nils and Gurevych, Iryna",
681
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
682
+ month = "11",
683
+ year = "2019",
684
+ publisher = "Association for Computational Linguistics",
685
+ url = "https://arxiv.org/abs/1908.10084",
686
+ }
687
+ ```
688
+
689
+ #### MSELoss
690
+ ```bibtex
691
+ @inproceedings{reimers-2020-multilingual-sentence-bert,
692
+ title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
693
+ author = "Reimers, Nils and Gurevych, Iryna",
694
+ booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
695
+ month = "11",
696
+ year = "2020",
697
+ publisher = "Association for Computational Linguistics",
698
+ url = "https://arxiv.org/abs/2004.09813",
699
+ }
700
+ ```
701
+
702
+ <!--
703
+ ## Glossary
704
+
705
+ *Clearly define terms in order to be accessible across audiences.*
706
+ -->
707
+
708
+ <!--
709
+ ## Model Card Authors
710
+
711
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
712
+ -->
713
+
714
+ <!--
715
+ ## Model Card Contact
716
+
717
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
718
  -->