SentenceTransformer based on BAAI/bge-m3

This is a sentence-transformers model finetuned from BAAI/bge-m3. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-m3
  • Maximum Sequence Length: 1536 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
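
These properties can be verified directly after loading the model; a minimal sketch, assuming only that the checkpoint is available on the Hub:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("seongil-dn/bge-m3-mrl-330")
print(model.max_seq_length)                      # 1536
print(model.get_sentence_embedding_dimension())  # 1024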

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 1536, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
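
The pooling block shows that the sentence embedding is the [CLS]-token representation of the XLM-RoBERTa encoder, L2-normalized afterwards, so cosine similarity and dot product give the same scores. The snippet below is only an illustrative sketch of that pipeline using the transformers library, assuming the encoder weights sit at the repository root as is standard for Sentence Transformers checkpoints; model.encode() remains the supported interface.

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("seongil-dn/bge-m3-mrl-330")
encoder = AutoModel.from_pretrained("seongil-dn/bge-m3-mrl-330")

batch = tokenizer(["example sentence"], padding=True, truncation=True,
                  max_length=1536, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**batch).last_hidden_state   # (batch, seq_len, 1024)
embedding = F.normalize(hidden[:, 0], p=2, dim=1)  # CLS pooling + L2 normalization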

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("seongil-dn/bge-m3-mrl-330")
# Run inference: the first item is a Korean query (whether raising an individual's
# pension payout strains pension finances); the other two are candidate passages
sentences = [
    '어떤 사람의 연금 수령액을 증가시키면 연금재정이 어려워져?',
    '한편, 제19대국회에서는 소득대체율을 높이지 않는 대신, 연금급여산식의 기준이 되는 기준소득월액의 상ㆍ하한액을 인상함으로써 가입자 전체의 소득평균을 높여 보험급여를 인상하는 방안도 논의되었다. 이 방안은 소득재분배 부문에 해당하는 국민연금의 A값을 상향하여 소득재분배 기능을 강화하는 장점을 가진 반면, 보험료가 인상되는 저소득층 가입자와 영세사업장, 그리고 고소득 사업장가입자와 사업장의 연금보험료 부담이 증가하기 때문에, 경제 및 산업계의 반발로 이어질 가능성도 있다. 또한 고소득 가입자들의 연급수급액의 증가는 시간의 경과에 따라 연금재정에 추가적인 부담을 주게 된다는 것이다.',
    '다. 재정<br>□ 저출산·고령화의 진전으로 세원이 되는 생산가능인구의 비중은 줄어들고, 연금급여 및 의료비 지출 등은 늘어남에 따라 재정수지 부담은 가중될 전망<br>― 출산율이 하락하면 전체 인구 중 생산가능인구의 비율이 감소하고 따라서 세수 감소로 이어질 가능성<br>― 반면, 고령화로 인해 연금수급자가 증가하면 연금 및 의료비 등의 재정지출 확대로 이어질 가능성<br>― 국민연금 가입자 중 노령연금 수급율은 인구감소 및 은퇴자 증가에 따라 2010년 13.3%, 2030년 41.9%, 2050년 88.5%로 급증할 전망<br>□ IMF에 따르면 GDP 대비 재정수지는 생산가능인구비율 1% 증가 시 0.06%p 개선되는 반면, 노인인구 1% 증가시 0.46%p 악화<br>― 또한, OECD는 고령화로 인해 노인관련 재정지출이 급증해 주요국의 2050년 재정수지가 적자를 기록할 것으로 전망',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
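
For retrieval-style usage, the same similarity matrix can rank the passages against the query; a small follow-up sketch using the scores computed above (for larger corpora, sentence_transformers.util.semantic_search performs the same ranking over precomputed corpus embeddings):

# Rank the two passages by their similarity to the query (row 0)
query_scores = similarities[0, 1:]            # scores of sentences[1] and sentences[2]
ranking = query_scores.argsort(descending=True)
for rank, idx in enumerate(ranking.tolist(), start=1):
    print(f"{rank}. sentences[{idx + 1}]  score={query_scores[idx].item():.4f}")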

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 32
  • gradient_accumulation_steps: 32
  • learning_rate: 3e-05
  • weight_decay: 0.01
  • warmup_ratio: 0.05
  • fp16: True
  • gradient_checkpointing: True
  • gradient_checkpointing_kwargs: {'use_reentrant': False}
  • batch_sampler: no_duplicates
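
With per_device_train_batch_size of 32 and gradient_accumulation_steps of 32, the effective batch size is 1024 pairs per device. Below is a minimal sketch of how these values map onto the Sentence Transformers v3 training arguments; the output directory is a placeholder, since the actual training script is not included in this card.

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="bge-m3-mrl-330",                # placeholder output path
    num_train_epochs=3,
    per_device_train_batch_size=32,
    gradient_accumulation_steps=32,             # effective batch size of 1024 per device
    learning_rate=3e-5,
    weight_decay=0.01,
    warmup_ratio=0.05,
    fp16=True,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoid duplicate texts within a batch
)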

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 32
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 3e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.05
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: True
  • gradient_checkpointing_kwargs: {'use_reentrant': False}
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
0.0091 1 15.81
0.0181 2 15.9499
0.0272 3 15.3393
0.0363 4 15.4563
0.0453 5 15.5322
0.0544 6 16.0348
0.0635 7 15.3445
0.0725 8 15.7129
0.0816 9 14.4393
0.0907 10 13.4846
0.0997 11 12.5233
0.1088 12 12.1728
0.1178 13 11.9232
0.1269 14 11.5308
0.1360 15 10.7525
0.1450 16 10.393
0.1541 17 9.7346
0.1632 18 9.4875
0.1722 19 9.2608
0.1813 20 8.7966
0.1904 21 8.5579
0.1994 22 8.4993
0.2085 23 8.1505
0.2176 24 8.5027
0.2266 25 7.9795
0.2357 26 7.5782
0.2448 27 7.68
0.2538 28 7.539
0.2629 29 7.5871
0.2720 30 7.2676
0.2810 31 6.9613
0.2901 32 6.89
0.2992 33 6.7585
0.3082 34 6.7286
0.3173 35 6.754
0.3263 36 6.7466
0.3354 37 6.6096
0.3445 38 6.5864
0.3535 39 6.5235
0.3626 40 6.5429
0.3717 41 6.4971
0.3807 42 6.4463
0.3898 43 6.332
0.3989 44 6.1275
0.4079 45 6.2551
0.4170 46 6.1372
0.4261 47 6.1075
0.4351 48 6.1408
0.4442 49 6.062
0.4533 50 5.9831
0.4623 51 5.9956
0.4714 52 5.8332
0.4805 53 5.7447
0.4895 54 5.9531
0.4986 55 5.911
0.5076 56 5.8576
0.5167 57 5.8116
0.5258 58 5.6564
0.5348 59 5.7289
0.5439 60 5.7514
0.5530 61 5.5991
0.5620 62 5.553
0.5711 63 5.4728
0.5802 64 5.6212
0.5892 65 5.6554
0.5983 66 5.4389
0.6074 67 5.3669
0.6164 68 5.5667
0.6255 69 5.4106
0.6346 70 5.3122
0.6436 71 5.4145
0.6527 72 5.3794
0.6618 73 5.269
0.6708 74 5.3583
0.6799 75 5.311
0.6890 76 5.2061
0.6980 77 5.133
0.7071 78 5.4036
0.7161 79 5.2761
0.7252 80 5.0696
0.7343 81 5.3648
0.7433 82 5.0591
0.7524 83 5.074
0.7615 84 5.1789
0.7705 85 5.0147
0.7796 86 5.251
0.7887 87 5.1282
0.7977 88 5.1111
0.8068 89 5.2096
0.8159 90 5.0734
0.8249 91 4.9202
0.8340 92 5.0058
0.8431 93 5.0928
0.8521 94 4.9845
0.8612 95 5.0683
0.8703 96 5.0267
0.8793 97 5.0821
0.8884 98 4.8806
0.8975 99 5.0043
0.9065 100 4.888
0.9156 101 5.0629
0.9246 102 5.0454
0.9337 103 4.9619
0.9428 104 4.9217
0.9518 105 4.7401
0.9609 106 4.8068
0.9700 107 4.8151
0.9790 108 4.8689
0.9881 109 5.0193
0.9972 110 4.706
1.0062 111 4.8057
1.0153 112 4.7279
1.0244 113 4.7721
1.0334 114 4.7767
1.0425 115 4.669
1.0516 116 4.8533
1.0606 117 4.8634
1.0697 118 4.9135
1.0788 119 4.7629
1.0878 120 4.7479
1.0969 121 4.743
1.1059 122 4.5606
1.1150 123 4.6933
1.1241 124 4.6659
1.1331 125 4.7131
1.1422 126 4.7059
1.1513 127 4.5701
1.1603 128 4.4892
1.1694 129 4.6497
1.1785 130 4.4814
1.1875 131 4.2669
1.1966 132 4.4983
1.2057 133 4.431
1.2147 134 4.414
1.2238 135 4.3975
1.2329 136 4.3101
1.2419 137 4.3422
1.2510 138 4.476
1.2601 139 4.6629
1.2691 140 4.3559
1.2782 141 4.2049
1.2873 142 4.303
1.2963 143 4.3053
1.3054 144 4.2366
1.3144 145 4.5165
1.3235 146 4.2634
1.3326 147 4.4295
1.3416 148 4.2595
1.3507 149 4.3753
1.3598 150 4.3454
1.3688 151 4.2618
1.3779 152 4.4016
1.3870 153 4.2672
1.3960 154 4.1824
1.4051 155 4.3268
1.4142 156 4.091
1.4232 157 4.3111
1.4323 158 4.2397
1.4414 159 4.1694
1.4504 160 4.2119
1.4595 161 4.1292
1.4686 162 4.1154
1.4776 163 4.1638
1.4867 164 4.3548
1.4958 165 4.2137
1.5048 166 4.1888
1.5139 167 4.2609
1.5229 168 4.2644
1.5320 169 4.2183
1.5411 170 4.2414
1.5501 171 4.242
1.5592 172 4.0547
1.5683 173 4.1509
1.5773 174 4.247
1.5864 175 4.3103
1.5955 176 4.0845
1.6045 177 4.0918
1.6136 178 4.1582
1.6227 179 4.2982
1.6317 180 4.0515
1.6408 181 4.0738
1.6499 182 4.2416
1.6589 183 4.1212
1.6680 184 4.174
1.6771 185 4.1369
1.6861 186 3.9908
1.6952 187 4.1155
1.7042 188 3.9893
1.7133 189 4.2362
1.7224 190 4.074
1.7314 191 4.0604
1.7405 192 4.0065
1.7496 193 4.0041
1.7586 194 4.0428
1.7677 195 4.0094
1.7768 196 3.962
1.7858 197 4.1932
1.7949 198 4.133
1.8040 199 4.1344
1.8130 200 4.1004
1.8221 201 4.0633
1.8312 202 4.0545
1.8402 203 4.0434
1.8493 204 4.0576
1.8584 205 4.0892
1.8674 206 4.1945
1.8765 207 4.0809
1.8856 208 4.0655
1.8946 209 4.155
1.9037 210 4.0801
1.9127 211 4.0837
1.9218 212 4.1487
1.9309 213 4.0574
1.9399 214 4.0952
1.9490 215 4.0414
1.9581 216 3.9645
1.9671 217 4.0327
1.9762 218 3.9183
1.9853 219 4.1204
1.9943 220 4.0043
2.0034 221 3.904
2.0125 222 4.0489
2.0215 223 4.0316
2.0306 224 3.9649
2.0397 225 3.891
2.0487 226 4.0352
2.0578 227 4.1811
2.0669 228 4.1212
2.0759 229 4.2356
2.0850 230 4.1295
2.0941 231 4.0231
2.1031 232 3.914
2.1122 233 3.916
2.1212 234 3.8657
2.1303 235 4.0986
2.1394 236 3.9774
2.1484 237 3.9112
2.1575 238 3.8232
2.1666 239 3.85
2.1756 240 3.8874
2.1847 241 3.6777
2.1938 242 3.7898
2.2028 243 3.8527
2.2119 244 3.7038
2.2210 245 3.9404
2.2300 246 3.7468
2.2391 247 3.7905
2.2482 248 3.8356
2.2572 249 3.9682
2.2663 250 3.9372
2.2754 251 3.7579
2.2844 252 3.6927
2.2935 253 3.7372
2.3025 254 3.6125
2.3116 255 4.0475
2.3207 256 3.7422
2.3297 257 3.8646
2.3388 258 3.6637
2.3479 259 3.8496
2.3569 260 3.753
2.3660 261 3.7632
2.3751 262 3.7097
2.3841 263 3.8584
2.3932 264 3.6547
2.4023 265 3.7595
2.4113 266 3.6346
2.4204 267 3.8937
2.4295 268 3.7423
2.4385 269 3.8051
2.4476 270 3.7131
2.4567 271 3.6623
2.4657 272 3.7444
2.4748 273 3.7229
2.4839 274 3.7874
2.4929 275 3.714
2.5020 276 3.6972
2.5110 277 3.7421
2.5201 278 3.8071
2.5292 279 3.7042
2.5382 280 3.7569
2.5473 281 3.8477
2.5564 282 3.7502
2.5654 283 3.7096
2.5745 284 3.7251
2.5836 285 3.8462
2.5926 286 3.747
2.6017 287 3.6436
2.6108 288 3.7176
2.6198 289 3.8406
2.6289 290 3.6416
2.6380 291 3.6793
2.6470 292 3.7892
2.6561 293 3.7827
2.6652 294 3.6192
2.6742 295 3.9168
2.6833 296 3.7271
2.6924 297 3.6852
2.7014 298 3.5507
2.7105 299 3.8567
2.7195 300 3.8098
2.7286 301 3.6685
2.7377 302 3.6163
2.7467 303 3.7439
2.7558 304 3.6212
2.7649 305 3.62
2.7739 306 3.6728
2.7830 307 3.7061
2.7921 308 3.8473
2.8011 309 3.7974
2.8102 310 3.6624
2.8193 311 3.7357
2.8283 312 3.7277
2.8374 313 3.6717
2.8465 314 3.7568
2.8555 315 3.6942
2.8646 316 3.7497
2.8737 317 3.7765
2.8827 318 3.709
2.8918 319 3.8016
2.9008 320 3.7998
2.9099 321 3.76
2.9190 322 3.748
2.9280 323 3.7235
2.9371 324 3.7455
2.9462 325 3.8345
2.9552 326 3.6403
2.9643 327 3.754
2.9734 328 3.6126
2.9824 329 3.7963
2.9915 330 3.8263

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.2.1
  • Transformers: 4.44.2
  • PyTorch: 2.3.1+cu121
  • Accelerate: 1.1.1
  • Datasets: 2.21.0
  • Tokenizers: 0.19.1
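
To reproduce this environment, the listed versions can be pinned at install time (the PyTorch 2.3.1+cu121 build additionally assumes a matching CUDA 12.1 setup):

pip install sentence-transformers==3.2.1 transformers==4.44.2 torch==2.3.1 accelerate==1.1.1 datasets==2.21.0 tokenizers==0.19.1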

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
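
Given the model name ("mrl") and the losses cited above, the checkpoint was presumably trained with MatryoshkaLoss wrapping MultipleNegativesRankingLoss. The following is a hedged sketch of that combination; the truncation dimensions are illustrative assumptions, as the actual values are not documented in this card.

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("BAAI/bge-m3")
base_loss = MultipleNegativesRankingLoss(model)
# Illustrative truncation dimensions; the ones used for this model are not listed
loss = MatryoshkaLoss(model, base_loss, matryoshka_dims=[1024, 512, 256, 128, 64])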