language: []
library_name: sentence-transformers
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:844
  - loss:CoSENTLoss
base_model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
datasets: []
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
widget:
  - source_sentence: Help fix a problem with my device’s battery life
    sentences:
      - order query
      - faq query
      - technical support query
  - source_sentence: 订购一双运动鞋
    sentences:
      - service request
      - feedback query
      - product query
  - source_sentence: 告诉我如何更改我的密码
    sentences:
      - support query
      - product query
      - faq query
  - source_sentence: Get information on the next local festival
    sentences:
      - event inquiry
      - service request
      - account query
  - source_sentence: Change the currency for my payment
    sentences:
      - product query
      - payment query
      - faq query
pipeline_tag: sentence-similarity
model-index:
  - name: >-
      SentenceTransformer based on
      sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: MiniLM dev
          type: MiniLM-dev
        metrics:
          - type: pearson_cosine
            value: 0.7356955662825808
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.7320761390174187
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.6240041985776243
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.6179783414452009
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.6321466982201008
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.6296964936282937
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.7491168439451736
            name: Pearson Dot
          - type: spearman_dot
            value: 0.7592129124940543
            name: Spearman Dot
          - type: pearson_max
            value: 0.7491168439451736
            name: Pearson Max
          - type: spearman_max
            value: 0.7592129124940543
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: MiniLM test
          type: MiniLM-test
        metrics:
          - type: pearson_cosine
            value: 0.7687106130417081
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.7552108666502075
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.7462708006775693
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.7365483246407295
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.7545194410402545
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.7465016803791179
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.7251488155932073
            name: Pearson Dot
          - type: spearman_dot
            value: 0.7390366635753267
            name: Spearman Dot
          - type: pearson_max
            value: 0.7687106130417081
            name: Pearson Max
          - type: spearman_max
            value: 0.7552108666502075
            name: Spearman Max

SentenceTransformer based on sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

This is a sentence-transformers model fine-tuned from sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
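
The Pooling module above has only pooling_mode_mean_tokens enabled, so a sentence vector is the average of its token vectors. The following is a minimal pure-Python sketch of that idea, assuming padding positions are excluded through an attention mask; the function name mean_pool and the toy 4-dimensional vectors are illustrative only and are not taken from the library.

```python
from typing import List, Sequence


def mean_pool(token_embeddings: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    # Guard against an all-padding input to avoid dividing by zero.
    count = max(count, 1)
    return [value / count for value in total]


# Toy input: three token vectors of width 4 (the real model uses 384),
# where the last position is padding and must not affect the result.
tokens = [[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]
sentence_vector = mean_pool(tokens, mask)
print(sentence_vector)  # [2.0, 2.0, 2.0, 2.0]
```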

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("philipp-zettl/MiniLM-similarity-small")
# Run inference
sentences = [
    'Change the currency for my payment',
    'payment query',
    'faq query',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
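
The widget examples pair one query with several candidate labels, which suggests using the similarity scores to pick the closest label. Here is a rough sketch of that selection step, assuming cosine similarity is the comparison; the helper names cosine and best_label are hypothetical, and the 3-dimensional vectors are made-up stand-ins for the output of model.encode.

```python
import math
from typing import Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_label(query_vec: Sequence[float],
               label_vecs: Sequence[Sequence[float]],
               labels: Sequence[str]) -> Tuple[str, float]:
    """Return the label whose vector is most similar to the query vector."""
    scores = [cosine(query_vec, vec) for vec in label_vecs]
    idx = max(range(len(scores)), key=scores.__getitem__)
    return labels[idx], scores[idx]


# Made-up vectors standing in for model.encode(...) output.
query = [0.9, 0.1, 0.0]
label_names = ["payment query", "faq query"]
label_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
label, score = best_label(query, label_vectors, label_names)
print(label, round(score, 3))
```

With real embeddings, the first element of the encoded list would be the query and the remaining elements the candidate labels.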

Evaluation

Metrics

Semantic Similarity (MiniLM-dev)

Metric Value
pearson_cosine 0.7357
spearman_cosine 0.7321
pearson_manhattan 0.624
spearman_manhattan 0.618
pearson_euclidean 0.6321
spearman_euclidean 0.6297
pearson_dot 0.7491
spearman_dot 0.7592
pearson_max 0.7491
spearman_max 0.7592

Semantic Similarity (MiniLM-test)

Metric Value
pearson_cosine 0.7687
spearman_cosine 0.7552
pearson_manhattan 0.7463
spearman_manhattan 0.7365
pearson_euclidean 0.7545
spearman_euclidean 0.7465
pearson_dot 0.7251
spearman_dot 0.739
pearson_max 0.7687
spearman_max 0.7552
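
The pearson_* and spearman_* rows report how well the model's similarity scores track the gold scores. As an illustration of what the two statistics measure, the sketch below computes both in plain Python on invented numbers. The predicted and gold lists are hypothetical and unrelated to the values in the tables, and this is not the evaluator's own code.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented cosine similarities and gold scores for four sentence pairs.
predicted = [0.91, 0.15, 0.62, 0.40]
gold = [1.0, 0.0, 1.0, 0.0]
print(round(pearson(predicted, gold), 4), round(spearman(predicted, gold), 4))
```

Spearman only looks at the ordering of the scores, which is why it can differ from Pearson when the scores are ordered correctly but not linearly related to the gold values.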

Training Details

Training Dataset

Unnamed Dataset

  • Size: 844 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    sentence1 (string): min 6 tokens, mean 10.8 tokens, max 19 tokens
    sentence2 (string): min 4 tokens, mean 5.33 tokens, max 6 tokens
    score (float): min 0.0, mean 0.49, max 1.0
  • Samples:
    sentence1 | sentence2 | score
    Update the payment method for my order | order query | 1.0
    Не могу установить новое обновление, помогите! | support query | 1.0
    Помогите мне изменить настройки конфиденциальности | support query | 1.0
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
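
To make the two parameters concrete, here is a small sketch of the CoSENT objective as described in the blog post cited at the end of this card: for every two training pairs where the first has a lower gold score than the second, the loss penalises the first pair having the higher cosine similarity, and scale multiplies that difference. This is a pure-Python reading of the formula and not the library's implementation; the function name cosent_loss and the example values are assumptions.

```python
import math
from typing import Sequence


def cosent_loss(cos_sims: Sequence[float],
                labels: Sequence[float],
                scale: float = 20.0) -> float:
    """log(1 + sum of exp(scale * (cos_i - cos_j))) over labels[i] < labels[j]."""
    terms = [0.0]  # exp(0) contributes the constant 1 inside the log
    for cos_i, label_i in zip(cos_sims, labels):
        for cos_j, label_j in zip(cos_sims, labels):
            if label_i < label_j:
                terms.append(scale * (cos_i - cos_j))
    # Log-sum-exp with the maximum factored out for numerical stability.
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# Two sentence pairs: the first is a match (score 1.0), the second is not.
well_ordered = cosent_loss([0.9, 0.1], [1.0, 0.0])
mis_ordered = cosent_loss([0.1, 0.9], [1.0, 0.0])
print(well_ordered, mis_ordered)
```

The loss is close to zero when similarities are ordered like the gold scores and grows roughly linearly with the scaled gap when they are not.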
    

Evaluation Dataset

Unnamed Dataset

  • Size: 106 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    sentence1 (string): min 6 tokens, mean 10.79 tokens, max 15 tokens
    sentence2 (string): min 4 tokens, mean 5.27 tokens, max 6 tokens
    score (float): min 0.0, mean 0.51, max 1.0
  • Samples:
    sentence1 | sentence2 | score
    帮我修复系统错误 | support query | 1.0
    Je veux commander une pizza | product query | 1.0
    Fix problems with my device’s Bluetooth connection | technical support query | 1.0
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • learning_rate: 2e-05
  • num_train_epochs: 2
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
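
As a rough illustration of how learning_rate and warmup_ratio interact, the sketch below reproduces a linear warmup followed by linear decay, assuming the linear scheduler listed under All Hyperparameters and the 212 total steps shown in the training log. The function linear_warmup_lr is a hypothetical helper written for this example, and rounding the warmup length up is an assumption about how the ratio is converted to steps.

```python
import math


def linear_warmup_lr(step: int,
                     total_steps: int,
                     base_lr: float = 2e-5,
                     warmup_ratio: float = 0.1) -> float:
    """Learning rate at a given optimizer step: ramp up linearly, then decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 212  # final step reported in the training log
for step in (0, 11, 22, 117, 212):
    print(step, linear_warmup_lr(step, TOTAL_STEPS))
```

Under these assumptions the warmup lasts 22 steps, after which the rate falls linearly from 2e-05 to zero at step 212.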

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch | Step | Training Loss | loss | MiniLM-dev_spearman_cosine | MiniLM-test_spearman_cosine
0.0943 | 10 | 4.0771 | 2.2054 | 0.2529 | -
0.1887 | 20 | 4.4668 | 1.8221 | 0.3528 | -
0.2830 | 30 | 2.5459 | 1.5545 | 0.4638 | -
0.3774 | 40 | 2.1926 | 1.3145 | 0.5569 | -
0.4717 | 50 | 0.9001 | 1.1653 | 0.6285 | -
0.5660 | 60 | 1.4049 | 1.0734 | 0.6834 | -
0.6604 | 70 | 0.7204 | 0.9951 | 0.6988 | -
0.7547 | 80 | 1.4023 | 1.1213 | 0.6945 | -
0.8491 | 90 | 0.2315 | 1.2931 | 0.6414 | -
0.9434 | 100 | 0.0018 | 1.3904 | 0.6180 | -
1.0377 | 110 | 0.0494 | 1.2889 | 0.6322 | -
1.1321 | 120 | 0.3156 | 1.2461 | 0.6402 | -
1.2264 | 130 | 1.8153 | 1.0844 | 0.6716 | -
1.3208 | 140 | 0.2638 | 0.9939 | 0.6957 | -
1.4151 | 150 | 0.5454 | 0.9545 | 0.7056 | -
1.5094 | 160 | 0.3421 | 0.9699 | 0.7062 | -
1.6038 | 170 | 0.0035 | 0.9521 | 0.7093 | -
1.6981 | 180 | 0.0401 | 0.8988 | 0.7160 | -
1.7925 | 190 | 0.8138 | 0.8619 | 0.7271 | -
1.8868 | 200 | 0.0236 | 0.8449 | 0.7315 | -
1.9811 | 210 | 0.0012 | 0.8438 | 0.7321 | -
2.0 | 212 | - | - | - | 0.7552
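
The Epoch and Step columns can be cross-checked against the dataset size and batch size reported earlier. The short sketch below does that arithmetic, assuming a single device, no gradient accumulation, and a final partial batch that is kept (dataloader_drop_last: False); the constant and function names are illustrative.

```python
import math

TRAIN_SAMPLES = 844  # size of the training dataset
BATCH_SIZE = 8       # per_device_train_batch_size
EPOCHS = 2           # num_train_epochs

# The last, smaller batch still counts as one step.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS


def epoch_at(step: int) -> float:
    """Fraction of epochs completed after the given optimizer step."""
    return round(step / steps_per_epoch, 4)


print(steps_per_epoch, total_steps)               # 106 212
print(epoch_at(10), epoch_at(100), epoch_at(210))  # 0.0943 0.9434 1.9811
```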

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.3.1+cu121
  • Accelerate: 0.33.0
  • Datasets: 2.21.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}