yahyaabd committed
Commit f6ba240 · verified · 1 Parent(s): d6d6e7b

Add new SentenceTransformer model
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
{
  "word_embedding_dimension": 768,
  "pooling_mode_cls_token": false,
  "pooling_mode_mean_tokens": true,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
README.md ADDED
@@ -0,0 +1,515 @@
---
base_model: sentence-transformers/paraphrase-multilingual-mpnet-base-v2
datasets:
- yahyaabd/allstats-semantic-dataset-v4
library_name: sentence-transformers
metrics:
- pearson_cosine
- spearman_cosine
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:88250
- loss:CosineSimilarityLoss
widget:
- source_sentence: Laporan ekspor Indonesia Juli 2020
  sentences:
  - Statistik Produksi Kehutanan 2021
  - Buletin Statistik Perdagangan Luar Negeri Ekspor Menurut HS, Juli 2020
  - Statistik Politik 2017
- source_sentence: Bulan apa yang dicatat data kunjungan wisatawan mancanegara?
  sentences:
  - Indeks Tendensi Bisnis dan Indeks Tendensi Konsumen 2005
  - Data NTP bulan Maret 2022.
  - Kunjungan wisatawan mancanegara pada Oktober 2023 mencapai 978,50 ribu kunjungan,
    naik 33,27 persen (year-on-year)
- source_sentence: Seberapa besar kenaikan upah nominal harian buruh tani nasional
    Januari 2016?
  sentences:
  - Keadaan Angkatan Kerja di Indonesia Mei 2013
  - Profil Pasar Gorontalo 2020
  - Tingkat pengangguran terbuka (TPT) Agustus 2024 sebesar 5,3%.
- source_sentence: Ringkasan data statistik Indonesia 1997
  sentences:
  - Statistik Upah 2007
  - Harga konsumen bbrp jenis barang kelompok perumahan 2005
  - Statistik Indonesia 1997
- source_sentence: Pernikahan usia anak di Indonesia periode 2013-2015
  sentences:
  - Jumlah penduduk Indonesia 2013-2015
  - Indikator Ekonomi Desember 2006
  - Indeks Tendensi Bisnis dan Indeks Tendensi Konsumen 2013
model-index:
- name: SentenceTransformer based on sentence-transformers/paraphrase-multilingual-mpnet-base-v2
  results:
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: allstats semantic mpnet eval
      type: allstats-semantic-mpnet-eval
    metrics:
    - type: pearson_cosine
      value: 0.9714169395957917
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.8933550959155299
      name: Spearman Cosine
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: allstats semantic mpnet test
      type: allstats-semantic-mpnet-test
    metrics:
    - type: pearson_cosine
      value: 0.9723087139367028
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.8932385415736595
      name: Spearman Cosine
---

# SentenceTransformer based on sentence-transformers/paraphrase-multilingual-mpnet-base-v2

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/paraphrase-multilingual-mpnet-base-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-mpnet-base-v2) on the [allstats-semantic-dataset-v4](https://huggingface.co/datasets/yahyaabd/allstats-semantic-dataset-v4) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/paraphrase-multilingual-mpnet-base-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-mpnet-base-v2) <!-- at revision 75c57757a97f90ad739aca51fa8bfea0e485a7f2 -->
- **Maximum Sequence Length:** 128 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
    - [allstats-semantic-dataset-v4](https://huggingface.co/datasets/yahyaabd/allstats-semantic-dataset-v4)
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
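The Pooling module above averages the Transformer's token embeddings over non-padding positions (`pooling_mode_mean_tokens`). A minimal sketch of that operation on toy numbers, not the real model's 768-dimensional XLM-R embeddings:

```python
import numpy as np

# Mean pooling as configured above: average token embeddings over the
# positions the attention mask marks as real tokens (toy 2-d embeddings).
def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    mask = attention_mask[:, None].astype(float)    # [seq_len, 1]
    summed = (token_embeddings * mask).sum(axis=0)  # sum over valid tokens only
    return summed / mask.sum()

tokens = np.array([[1.0, 3.0], [3.0, 5.0], [9.0, 9.0]])  # last row is padding
mask = np.array([1, 1, 0])
print(mean_pool(tokens, mask))  # averages only the first two rows -> [2. 4.]
```

The padding row is excluded entirely, so sentence length (beyond the mask) does not skew the pooled vector.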

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("yahyaabd/allstats-semantic-mpnet")
# Run inference
sentences = [
    'Pernikahan usia anak di Indonesia periode 2013-2015',
    'Jumlah penduduk Indonesia 2013-2015',
    'Indeks Tendensi Bisnis dan Indeks Tendensi Konsumen 2013',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Semantic Similarity

* Datasets: `allstats-semantic-mpnet-eval` and `allstats-semantic-mpnet-test`
* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | allstats-semantic-mpnet-eval | allstats-semantic-mpnet-test |
|:--------------------|:-----------------------------|:-----------------------------|
| pearson_cosine      | 0.9714                       | 0.9723                       |
| **spearman_cosine** | **0.8934**                   | **0.8932**                   |
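For intuition, both metrics correlate the model's predicted cosine similarities with the gold labels: Pearson on the raw values, Spearman on their ranks. A self-contained sketch with toy scores (not the actual eval split):

```python
import numpy as np

# Miniature version of what the evaluator reports: correlate predicted
# cosine similarities with gold labels (toy values below).
def pearson(x: np.ndarray, y: np.ndarray) -> float:
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x: np.ndarray, y: np.ndarray) -> float:
    # Spearman correlation = Pearson correlation of the ranks
    rank = lambda a: np.argsort(np.argsort(a)).astype(float)
    return pearson(rank(x), rank(y))

pred = np.array([0.9, 0.1, 0.4, 0.8])  # model cosine similarities (toy)
gold = np.array([1.0, 0.0, 0.5, 0.9])  # gold labels (toy)
print(pearson(pred, gold))
print(spearman(pred, gold))  # ranks agree exactly here -> 1.0
```

Spearman only cares about ordering, which is why it is the headline metric for retrieval-style use of the embeddings.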

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### allstats-semantic-dataset-v4

* Dataset: [allstats-semantic-dataset-v4](https://huggingface.co/datasets/yahyaabd/allstats-semantic-dataset-v4) at [06c3cf8](https://huggingface.co/datasets/yahyaabd/allstats-semantic-dataset-v4/tree/06c3cf8715472fba6be04302a12790a6bd80443a)
* Size: 88,250 training samples
* Columns: <code>query</code>, <code>doc</code>, and <code>label</code>
* Approximate statistics based on the first 1000 samples:
  |         | query                                                                             | doc                                                                               | label                                                          |
  |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:---------------------------------------------------------------|
  | type    | string                                                                            | string                                                                            | float                                                          |
  | details | <ul><li>min: 4 tokens</li><li>mean: 11.38 tokens</li><li>max: 46 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 14.48 tokens</li><li>max: 67 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.51</li><li>max: 1.0</li></ul> |
* Samples:
  | query                                                                  | doc                                                                                                                                                                                                                 | label            |
  |:-----------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------|
  | <code>Industri teh Indonesia tahun 2021</code>                         | <code>Statistik Transportasi Laut 2014</code>                                                                                                                                                                      | <code>0.1</code> |
  | <code>Tahun berapa data pertumbuhan ekonomi Indonesia tersebut?</code> | <code>Nilai Tukar Petani (NTP) November 2023 sebesar 116,73 atau naik 0,82 persen. Harga Gabah Kering Panen di Tingkat Petani turun 1,94 persen dan Harga Beras Premium di Penggilingan turun 0,91 persen.</code> | <code>0.0</code> |
  | <code>Kemiskinan di Indonesia Maret</code>                             | <code>2018 Feb Tenaga Kerja</code>                                                                                                                                                                                 | <code>0.1</code> |
* Loss: [<code>CosineSimilarityLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss) with these parameters:
  ```json
  {
      "loss_fct": "torch.nn.modules.loss.MSELoss"
  }
  ```
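CosineSimilarityLoss regresses the cosine similarity of each (query, doc) embedding pair onto its gold label using the MSE loss shown above. A numpy sketch of the objective, with toy embeddings standing in for model outputs:

```python
import numpy as np

# CosineSimilarityLoss in miniature: MSE between the pairwise cosine
# similarity of two embedding batches and gold labels in [0, 1].
def cosine_similarity_loss(q: np.ndarray, d: np.ndarray, labels: np.ndarray) -> float:
    cos = (q * d).sum(axis=1) / (np.linalg.norm(q, axis=1) * np.linalg.norm(d, axis=1))
    return float(((cos - labels) ** 2).mean())

q = np.array([[1.0, 0.0], [0.0, 1.0]])       # query embeddings (toy)
d = np.array([[1.0, 0.0], [1.0, 0.0]])       # doc embeddings (toy)
labels = np.array([1.0, 0.0])                # gold similarity labels
print(cosine_similarity_loss(q, d, labels))  # both pairs match their labels -> 0.0
```

Minimizing this pushes relevant pairs toward cosine 1 and irrelevant pairs toward their low labels, which is what makes the fractional labels (0.1, 0.2, ...) in the samples above meaningful.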

### Evaluation Dataset

#### allstats-semantic-dataset-v4

* Dataset: [allstats-semantic-dataset-v4](https://huggingface.co/datasets/yahyaabd/allstats-semantic-dataset-v4) at [06c3cf8](https://huggingface.co/datasets/yahyaabd/allstats-semantic-dataset-v4/tree/06c3cf8715472fba6be04302a12790a6bd80443a)
* Size: 18,910 evaluation samples
* Columns: <code>query</code>, <code>doc</code>, and <code>label</code>
* Approximate statistics based on the first 1000 samples:
  |         | query                                                                             | doc                                                                               | label                                                          |
  |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:---------------------------------------------------------------|
  | type    | string                                                                            | string                                                                            | float                                                          |
  | details | <ul><li>min: 5 tokens</li><li>mean: 11.35 tokens</li><li>max: 33 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 14.25 tokens</li><li>max: 52 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.49</li><li>max: 1.0</li></ul> |
* Samples:
  | query                                                         | doc                                                                                                             | label            |
  |:--------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------|:-----------------|
  | <code>nAalisis keuangam deas tshun 019</code>                 | <code>Statistik Migrasi Nusa Tenggara Barat Hasil Survei Penduduk Antar Sensus 2015</code>                      | <code>0.1</code> |
  | <code>Data tanaman buah dan sayur Indonesia tahun 2016</code> | <code>Statistik Penduduk Lanjut Usia 2010</code>                                                                | <code>0.1</code> |
  | <code>Pasar beras di Indonesia tahun 2018</code>              | <code>Buletin Statistik Perdagangan Luar Negeri Ekspor Menurut Kelompok Komoditi dan Negara, April 2021</code> | <code>0.2</code> |
* Loss: [<code>CosineSimilarityLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss) with these parameters:
  ```json
  {
      "loss_fct": "torch.nn.modules.loss.MSELoss"
  }
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `num_train_epochs`: 8
- `warmup_ratio`: 0.1
- `fp16`: True
- `dataloader_num_workers`: 4
- `load_best_model_at_end`: True
- `label_smoothing_factor`: 0.05
- `eval_on_start`: True

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 8
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 4
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.05
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: True
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
| Epoch | Step | Training Loss | Validation Loss | allstats-semantic-mpnet-eval_spearman_cosine | allstats-semantic-mpnet-test_spearman_cosine |
|:----------:|:---------:|:-------------:|:---------------:|:--------------------------------------------:|:--------------------------------------------:|
| 0          | 0         | -             | 0.0979          | 0.6119                                       | -                                            |
| 0.0906     | 250       | 0.0646        | 0.0427          | 0.7249                                       | -                                            |
| 0.1813     | 500       | 0.039         | 0.0324          | 0.7596                                       | -                                            |
| 0.2719     | 750       | 0.032         | 0.0271          | 0.7860                                       | -                                            |
| 0.3626     | 1000      | 0.0276        | 0.0255          | 0.7920                                       | -                                            |
| 0.4532     | 1250      | 0.0264        | 0.0230          | 0.8072                                       | -                                            |
| 0.5439     | 1500      | 0.0249        | 0.0222          | 0.8197                                       | -                                            |
| 0.6345     | 1750      | 0.0226        | 0.0210          | 0.8200                                       | -                                            |
| 0.7252     | 2000      | 0.0218        | 0.0209          | 0.8202                                       | -                                            |
| 0.8158     | 2250      | 0.0208        | 0.0201          | 0.8346                                       | -                                            |
| 0.9065     | 2500      | 0.0209        | 0.0211          | 0.8240                                       | -                                            |
| 0.9971     | 2750      | 0.0211        | 0.0190          | 0.8170                                       | -                                            |
| 1.0877     | 3000      | 0.0161        | 0.0182          | 0.8332                                       | -                                            |
| 1.1784     | 3250      | 0.0158        | 0.0179          | 0.8393                                       | -                                            |
| 1.2690     | 3500      | 0.0167        | 0.0189          | 0.8341                                       | -                                            |
| 1.3597     | 3750      | 0.0152        | 0.0168          | 0.8371                                       | -                                            |
| 1.4503     | 4000      | 0.0151        | 0.0165          | 0.8435                                       | -                                            |
| 1.5410     | 4250      | 0.0143        | 0.0156          | 0.8365                                       | -                                            |
| 1.6316     | 4500      | 0.0147        | 0.0157          | 0.8467                                       | -                                            |
| 1.7223     | 4750      | 0.0138        | 0.0155          | 0.8501                                       | -                                            |
| 1.8129     | 5000      | 0.0147        | 0.0154          | 0.8457                                       | -                                            |
| 1.9036     | 5250      | 0.0137        | 0.0152          | 0.8498                                       | -                                            |
| 1.9942     | 5500      | 0.0144        | 0.0143          | 0.8485                                       | -                                            |
| 2.0848     | 5750      | 0.0108        | 0.0139          | 0.8439                                       | -                                            |
| 2.1755     | 6000      | 0.01          | 0.0146          | 0.8563                                       | -                                            |
| 2.2661     | 6250      | 0.011         | 0.0141          | 0.8558                                       | -                                            |
| 2.3568     | 6500      | 0.0107        | 0.0144          | 0.8497                                       | -                                            |
| 2.4474     | 6750      | 0.01          | 0.0138          | 0.8577                                       | -                                            |
| 2.5381     | 7000      | 0.0097        | 0.0136          | 0.8585                                       | -                                            |
| 2.6287     | 7250      | 0.0102        | 0.0135          | 0.8521                                       | -                                            |
| 2.7194     | 7500      | 0.0106        | 0.0133          | 0.8537                                       | -                                            |
| 2.8100     | 7750      | 0.0098        | 0.0133          | 0.8643                                       | -                                            |
| 2.9007     | 8000      | 0.0105        | 0.0138          | 0.8543                                       | -                                            |
| 2.9913     | 8250      | 0.009         | 0.0129          | 0.8555                                       | -                                            |
| 3.0819     | 8500      | 0.0071        | 0.0121          | 0.8692                                       | -                                            |
| 3.1726     | 8750      | 0.006         | 0.0120          | 0.8709                                       | -                                            |
| 3.2632     | 9000      | 0.0078        | 0.0120          | 0.8660                                       | -                                            |
| 3.3539     | 9250      | 0.0072        | 0.0122          | 0.8656                                       | -                                            |
| 3.4445     | 9500      | 0.007         | 0.0123          | 0.8696                                       | -                                            |
| 3.5352     | 9750      | 0.0075        | 0.0117          | 0.8707                                       | -                                            |
| 3.6258     | 10000     | 0.0081        | 0.0115          | 0.8682                                       | -                                            |
| 3.7165     | 10250     | 0.0083        | 0.0116          | 0.8617                                       | -                                            |
| 3.8071     | 10500     | 0.0075        | 0.0116          | 0.8665                                       | -                                            |
| 3.8978     | 10750     | 0.0077        | 0.0119          | 0.8733                                       | -                                            |
| 3.9884     | 11000     | 0.008         | 0.0113          | 0.8678                                       | -                                            |
| 4.0790     | 11250     | 0.0051        | 0.0110          | 0.8760                                       | -                                            |
| 4.1697     | 11500     | 0.0052        | 0.0108          | 0.8729                                       | -                                            |
| 4.2603     | 11750     | 0.0056        | 0.0108          | 0.8771                                       | -                                            |
| 4.3510     | 12000     | 0.0052        | 0.0109          | 0.8793                                       | -                                            |
| 4.4416     | 12250     | 0.0049        | 0.0109          | 0.8766                                       | -                                            |
| 4.5323     | 12500     | 0.0055        | 0.0114          | 0.8742                                       | -                                            |
| 4.6229     | 12750     | 0.0061        | 0.0108          | 0.8749                                       | -                                            |
| 4.7136     | 13000     | 0.0058        | 0.0109          | 0.8833                                       | -                                            |
| 4.8042     | 13250     | 0.0049        | 0.0108          | 0.8767                                       | -                                            |
| 4.8949     | 13500     | 0.0046        | 0.0108          | 0.8839                                       | -                                            |
| 4.9855     | 13750     | 0.0052        | 0.0104          | 0.8790                                       | -                                            |
| 5.0761     | 14000     | 0.0041        | 0.0102          | 0.8826                                       | -                                            |
| 5.1668     | 14250     | 0.004         | 0.0103          | 0.8775                                       | -                                            |
| 5.2574     | 14500     | 0.0036        | 0.0102          | 0.8855                                       | -                                            |
| 5.3481     | 14750     | 0.0037        | 0.0104          | 0.8841                                       | -                                            |
| 5.4387     | 15000     | 0.0036        | 0.0101          | 0.8860                                       | -                                            |
| 5.5294     | 15250     | 0.0043        | 0.0104          | 0.8852                                       | -                                            |
| 5.6200     | 15500     | 0.004         | 0.0100          | 0.8856                                       | -                                            |
| 5.7107     | 15750     | 0.0043        | 0.0101          | 0.8842                                       | -                                            |
| 5.8013     | 16000     | 0.0043        | 0.0099          | 0.8835                                       | -                                            |
| 5.8920     | 16250     | 0.0041        | 0.0099          | 0.8852                                       | -                                            |
| 5.9826     | 16500     | 0.0036        | 0.0101          | 0.8866                                       | -                                            |
| 6.0732     | 16750     | 0.0031        | 0.0100          | 0.8881                                       | -                                            |
| 6.1639     | 17000     | 0.0031        | 0.0098          | 0.8880                                       | -                                            |
| 6.2545     | 17250     | 0.0027        | 0.0098          | 0.8886                                       | -                                            |
| 6.3452     | 17500     | 0.0032        | 0.0097          | 0.8868                                       | -                                            |
| 6.4358     | 17750     | 0.0027        | 0.0097          | 0.8876                                       | -                                            |
| 6.5265     | 18000     | 0.0031        | 0.0097          | 0.8893                                       | -                                            |
| 6.6171     | 18250     | 0.0032        | 0.0096          | 0.8903                                       | -                                            |
| 6.7078     | 18500     | 0.003         | 0.0096          | 0.8898                                       | -                                            |
| 6.7984     | 18750     | 0.0029        | 0.0098          | 0.8907                                       | -                                            |
| 6.8891     | 19000     | 0.003         | 0.0096          | 0.8896                                       | -                                            |
| 6.9797     | 19250     | 0.0026        | 0.0096          | 0.8913                                       | -                                            |
| 7.0703     | 19500     | 0.0024        | 0.0096          | 0.8921                                       | -                                            |
| 7.1610     | 19750     | 0.0021        | 0.0097          | 0.8920                                       | -                                            |
| 7.2516     | 20000     | 0.0023        | 0.0096          | 0.8910                                       | -                                            |
| 7.3423     | 20250     | 0.002         | 0.0096          | 0.8920                                       | -                                            |
| 7.4329     | 20500     | 0.0022        | 0.0096          | 0.8924                                       | -                                            |
| 7.5236     | 20750     | 0.002         | 0.0097          | 0.8917                                       | -                                            |
| 7.6142     | 21000     | 0.0024        | 0.0096          | 0.8923                                       | -                                            |
| 7.7049     | 21250     | 0.0025        | 0.0095          | 0.8928                                       | -                                            |
| 7.7955     | 21500     | 0.0022        | 0.0095          | 0.8931                                       | -                                            |
| 7.8861     | 21750     | 0.0023        | 0.0095          | 0.8932                                       | -                                            |
| **7.9768** | **22000** | **0.0022**    | **0.0095**      | **0.8934**                                   | **-**                                        |
| 8.0        | 22064     | -             | -               | -                                            | 0.8932                                       |

* The bold row denotes the saved checkpoint.

### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.3.1
- Transformers: 4.48.0
- PyTorch: 2.4.1+cu121
- Accelerate: 0.34.2
- Datasets: 3.2.0
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json ADDED
@@ -0,0 +1,29 @@
{
  "_name_or_path": "sentence-transformers/paraphrase-multilingual-mpnet-base-v2",
  "architectures": [
    "XLMRobertaModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "xlm-roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "output_past": true,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.48.0",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 250002
}
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
{
  "__version__": {
    "sentence_transformers": "3.3.1",
    "transformers": "4.48.0",
    "pytorch": "2.4.1+cu121"
  },
  "prompts": {},
  "default_prompt_name": null,
  "similarity_fn_name": "cosine"
}
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2c47fcfee464447a0035b31cd65880eb3b8be3a14b106bc5a3ac1094248f7934
size 1112197096
modules.json ADDED
@@ -0,0 +1,14 @@
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  }
]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
{
  "max_seq_length": 128,
  "do_lower_case": false
}
sentencepiece.bpe.model ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cfc8146abe2a0488e9e2a0c56de7952f7c11ab059eca145a0a727afce0db2865
size 5069051
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "cls_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "<mask>",
    "lstrip": true,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<pad>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cad551d5600a84242d0973327029452a1e3672ba6313c2a3c3d69c4310e12719
size 17082987
tokenizer_config.json ADDED
@@ -0,0 +1,62 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "<s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<pad>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "250001": {
      "content": "<mask>",
      "lstrip": true,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": false,
  "cls_token": "<s>",
  "eos_token": "</s>",
  "extra_special_tokens": {},
  "mask_token": "<mask>",
  "max_length": 128,
  "model_max_length": 128,
  "pad_to_multiple_of": null,
  "pad_token": "<pad>",
  "pad_token_type_id": 0,
  "padding_side": "right",
  "sep_token": "</s>",
  "stride": 0,
  "tokenizer_class": "XLMRobertaTokenizer",
  "truncation_side": "right",
  "truncation_strategy": "longest_first",
  "unk_token": "<unk>"
}