SentenceTransformer based on BAAI/bge-m3

This is a sentence-transformers model fine-tuned from BAAI/bge-m3. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-m3
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
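
The Pooling module above has only pooling_mode_mean_tokens enabled, so a sentence embedding is the average of its token embeddings. The following is a minimal pure-Python sketch of that idea, assuming padding positions are excluded through an attention mask. The function name mean_pool and the toy values are illustrative and are not part of the library.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1.

    token_embeddings: list of per-token vectors (lists of floats).
    attention_mask:   list of 0/1 flags, one per token (0 marks padding).
    """
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for d in range(dim):
                summed[d] += vector[d]
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [value / count for value in summed]


# Toy input: three real tokens and one padding token, 2 dimensions each.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [3.0, 4.0]
```

In the real model each token vector has 1024 dimensions and the averaging is done on tensors, but the arithmetic is the same.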

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("seregadgl101/test_bge_2_10ep")
# Run inference
sentences = [
    'набор моя первая кухня',
    'кухонные наборы',
    'ea sports fc 23 ps4',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
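
For semantic search, the similarity scores are typically used to rank candidates against a query. Below is a small sketch of that ranking step with cosine similarity, written in pure Python so it runs without downloading the model. The helper names cosine and rank_by_similarity and the 3-dimensional vectors are hypothetical stand-ins for real 1024-dimensional embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_similarity(query_vec, candidate_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(candidate_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical 3-dimensional stand-ins for real embeddings.
query = [1.0, 0.0, 0.0]
candidates = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 0.0],  # same direction as the query
    [1.0, 1.0, 0.0],  # 45 degrees from the query
]
ranking = rank_by_similarity(query, candidates)
print([index for index, _ in ranking])  # [1, 2, 0]
```

With the real model, the query and candidate vectors would be rows of the array returned by model.encode.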

Evaluation

Metrics

Semantic Similarity

Metric Value
pearson_cosine 0.9702
spearman_cosine 0.9169
pearson_manhattan 0.9696
spearman_manhattan 0.9166
pearson_euclidean 0.9696
spearman_euclidean 0.9166
pearson_dot 0.9631
spearman_dot 0.9173
pearson_max 0.9702
spearman_max 0.9173
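
Each metric above is a correlation between the model's similarity scores and the gold scores: Pearson measures linear agreement, Spearman measures agreement of the rankings. The sketch below shows how both can be computed in pure Python. The input numbers are made up for illustration and are not taken from the evaluation set; in practice a library routine such as scipy.stats.pearsonr or scipy.stats.spearmanr would normally be used instead.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over all values tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up predicted similarities and gold scores, for illustration only.
predicted = [0.95, 0.80, 0.30, 0.10]
gold = [1.0, 1.0, 0.4, 0.0]
print(round(pearson(predicted, gold), 4))
print(round(spearman(predicted, gold), 4))
```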

Training Details

Training Dataset

Unnamed Dataset

  • Size: 4,532 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 4 tokens, mean 14.45 tokens, max 48 tokens
    • sentence2 (string): min 3 tokens, mean 13.09 tokens, max 51 tokens
    • score (float): min 0.0, mean 0.6, max 1.0
  • Samples:
    sentence1 | sentence2 | score
    батут evo jump internal 12ft | батут evo jump internal 12ft | 1.0
    наручные часы orient casual | наручные часы orient | 1.0
    электрический духовой шкаф weissgauff eov 19 mw | электрический духовой шкаф weissgauff eov 19 mx | 0.4
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
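
CoSENTLoss is a ranking loss: for every two sentence pairs in a batch where one has a higher gold score than the other, it penalizes the model when the lower-scored pair receives the higher cosine similarity. The sketch below is a plain-Python rendering of that formula with the scale of 20.0 listed above. It is an illustration of the idea, not the library's implementation, and the function name cosent_loss and the toy inputs are assumptions.

```python
import math


def cosent_loss(cos_scores, labels, scale=20.0):
    """CoSENT-style ranking loss over a batch of sentence pairs.

    cos_scores: predicted cosine similarity for each pair.
    labels:     gold similarity score for each pair.
    """
    # The constant 0.0 term corresponds to the "1 +" inside the logarithm.
    terms = [0.0]
    for low in range(len(labels)):
        for high in range(len(labels)):
            if labels[low] < labels[high]:
                # Positive when the lower-labelled pair scores above the higher-labelled one.
                terms.append(scale * (cos_scores[low] - cos_scores[high]))
    # Numerically stable log-sum-exp.
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


labels = [1.0, 0.4, 0.0]
well_ordered = cosent_loss([0.9, 0.5, 0.1], labels)
mis_ordered = cosent_loss([0.1, 0.5, 0.9], labels)
print(well_ordered < mis_ordered)  # True
```

Only the ordering of the gold scores matters here, which is why graded labels such as 0.4 can be used directly.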
    

Evaluation Dataset

Unnamed Dataset

  • Size: 504 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 4 tokens, mean 14.93 tokens, max 48 tokens
    • sentence2 (string): min 4 tokens, mean 13.1 tokens, max 40 tokens
    • score (float): min 0.0, mean 0.59, max 1.0
  • Samples:
    sentence1 | sentence2 | score
    потолочный светильник yeelight smart led ceiling light c2001s500 | yeelight smart led ceiling light c2001s500 | 1.0
    канцелярские принадлежности | канцелярские принадлежности разные | 0.4
    usb-магнитола acv avs-1718g | автомагнитола acv avs-1718g | 1.0
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • learning_rate: 2e-05
  • num_train_epochs: 10
  • warmup_ratio: 0.1
  • save_only_model: True
  • seed: 33
  • fp16: True
  • load_best_model_at_end: True
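
To make learning_rate, warmup_ratio, and the linear scheduler concrete, the sketch below computes the learning rate at a given optimizer step: it rises linearly from 0 during warmup and then decays linearly to 0. It assumes a single device with a per-device batch size of 8, which gives 567 steps per epoch and matches the training log (step 50 is at epoch 0.0882). The function name linear_schedule_lr is illustrative and the exact scheduler behaviour in the training library may differ in detail.

```python
import math


def linear_schedule_lr(step, total_steps, base_lr=2e-05, warmup_ratio=0.1):
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Assumption: 4,532 samples at batch size 8 on one device.
steps_per_epoch = math.ceil(4532 / 8)
total_steps = steps_per_epoch * 10  # num_train_epochs: 10
print(steps_per_epoch, total_steps)  # 567 5670
for step in (0, 567, 3000, 5670):
    print(step, linear_schedule_lr(step, total_steps))
```

Under these assumptions the peak rate of 2e-05 is reached at the end of the first epoch.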

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: True
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 33
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Columns: Epoch, Step, Training Loss, Validation Loss, sts-dev_spearman_cosine ("-" means no training loss was logged at that step)
0.0882 50 - 2.7444 0.4991
0.1764 100 - 2.5535 0.6093
0.2646 150 - 2.3365 0.6761
0.3527 200 - 2.1920 0.7247
0.4409 250 - 2.2210 0.7446
0.5291 300 - 2.1432 0.7610
0.6173 350 - 2.2488 0.7769
0.7055 400 - 2.3736 0.7749
0.7937 450 - 2.0688 0.7946
0.8818 500 2.3647 2.5331 0.7879
0.9700 550 - 2.1087 0.7742
1.0582 600 - 2.1302 0.8068
1.1464 650 - 2.2669 0.8114
1.2346 700 - 2.0269 0.8039
1.3228 750 - 2.2095 0.8138
1.4109 800 - 2.5288 0.8190
1.4991 850 - 2.3442 0.8222
1.5873 900 - 2.3759 0.8289
1.6755 950 - 2.1893 0.8280
1.7637 1000 2.0682 2.0056 0.8426
1.8519 1050 - 2.0832 0.8527
1.9400 1100 - 2.0336 0.8515
2.0282 1150 - 2.0571 0.8591
2.1164 1200 - 2.1516 0.8565
2.2046 1250 - 2.2035 0.8602
2.2928 1300 - 2.5294 0.8513
2.3810 1350 - 2.4177 0.8647
2.4691 1400 - 2.1630 0.8709
2.5573 1450 - 2.1279 0.8661
2.6455 1500 1.678 2.1639 0.8744
2.7337 1550 - 2.2592 0.8799
2.8219 1600 - 2.2288 0.8822
2.9101 1650 - 2.2427 0.8831
2.9982 1700 - 2.4380 0.8776
3.0864 1750 - 2.1689 0.8826
3.1746 1800 - 1.8099 0.8868
3.2628 1850 - 2.0881 0.8832
3.3510 1900 - 2.0785 0.8892
3.4392 1950 - 2.2512 0.8865
3.5273 2000 1.2168 2.1249 0.8927
3.6155 2050 - 2.1179 0.8950
3.7037 2100 - 2.1932 0.8973
3.7919 2150 - 2.2628 0.8967
3.8801 2200 - 2.0764 0.8972
3.9683 2250 - 1.9575 0.9012
4.0564 2300 - 2.3302 0.8985
4.1446 2350 - 2.3008 0.8980
4.2328 2400 - 2.2886 0.8968
4.3210 2450 - 2.1694 0.8973
4.4092 2500 1.0851 2.1102 0.9010
4.4974 2550 - 2.2596 0.9021
4.5855 2600 - 2.1944 0.9019
4.6737 2650 - 2.0728 0.9029
4.7619 2700 - 2.4573 0.9031
4.8501 2750 - 2.2306 0.9057
4.9383 2800 - 2.2637 0.9068
5.0265 2850 - 2.5110 0.9068
5.1146 2900 - 2.6613 0.9042
5.2028 2950 - 2.4713 0.9070
5.2910 3000 0.8143 2.3709 0.9082
5.3792 3050 - 2.6083 0.9058
5.4674 3100 - 2.5377 0.9044
5.5556 3150 - 2.3146 0.9071
5.6437 3200 - 2.2603 0.9085
5.7319 3250 - 2.5842 0.9068
5.8201 3300 - 2.6045 0.9093
5.9083 3350 - 2.6207 0.9103
5.9965 3400 - 2.5992 0.9098
6.0847 3450 - 2.7799 0.9090
6.1728 3500 0.5704 2.7198 0.9098
6.2610 3550 - 2.9783 0.9089
6.3492 3600 - 2.4165 0.9120
6.4374 3650 - 2.4488 0.9122
6.5256 3700 - 2.6764 0.9113
6.6138 3750 - 2.5327 0.9130
6.7019 3800 - 2.5875 0.9129
6.7901 3850 - 2.7036 0.9130
6.8783 3900 - 2.7566 0.9120
6.9665 3950 - 2.5488 0.9127
7.0547 4000 0.4287 2.8512 0.9127
7.1429 4050 - 2.7361 0.9128
7.2310 4100 - 2.7434 0.9135
7.3192 4150 - 2.9410 0.9129
7.4074 4200 - 2.9452 0.9126
7.4956 4250 - 2.8665 0.9140
7.5838 4300 - 2.8215 0.9145
7.6720 4350 - 2.6978 0.9147
7.7601 4400 - 2.8445 0.9143
7.8483 4450 - 2.6041 0.9155
7.9365 4500 0.3099 2.7219 0.9155
8.0247 4550 - 2.7180 0.9160
8.1129 4600 - 2.6906 0.9160
8.2011 4650 - 2.8628 0.9156
8.2892 4700 - 2.7820 0.9158
8.3774 4750 - 2.8457 0.9157
8.4656 4800 - 2.7286 0.9160
8.5538 4850 - 2.7131 0.9164
8.6420 4900 - 2.8368 0.9165
8.7302 4950 - 2.8033 0.9167
8.8183 5000 0.2342 2.7307 0.9169
8.9065 5050 - 2.8483 0.9167
8.9947 5100 - 2.9736 0.9167
9.0829 5150 - 2.9151 0.9168
9.1711 5200 - 2.9375 0.9167
9.2593 5250 - 2.9968 0.9168
9.3474 5300 - 3.0024 0.9167
9.4356 5350 - 2.9444 0.9167
9.5238 5400 - 2.9477 0.9167
9.6120 5450 - 2.9205 0.9168
9.7002 5500 0.1639 2.9286 0.9167
9.7884 5550 - 2.9421 0.9168
9.8765 5600 - 2.9733 0.9168
9.9647 5650 - 2.9777 0.9169
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.1.2+cu121
  • Accelerate: 0.31.0
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}