Training in progress, step 160, checkpoint
base_model: microsoft/deberta-v3-small
datasets:
  - tals/vitaminc
language:
  - en
library_name: sentence-transformers
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
  - cosine_accuracy
  - cosine_accuracy_threshold
  - cosine_f1
  - cosine_f1_threshold
  - cosine_precision
  - cosine_recall
  - cosine_ap
  - dot_accuracy
  - dot_accuracy_threshold
  - dot_f1
  - dot_f1_threshold
  - dot_precision
  - dot_recall
  - dot_ap
  - manhattan_accuracy
  - manhattan_accuracy_threshold
  - manhattan_f1
  - manhattan_f1_threshold
  - manhattan_precision
  - manhattan_recall
  - manhattan_ap
  - euclidean_accuracy
  - euclidean_accuracy_threshold
  - euclidean_f1
  - euclidean_f1_threshold
  - euclidean_precision
  - euclidean_recall
  - euclidean_ap
  - max_accuracy
  - max_accuracy_threshold
  - max_f1
  - max_f1_threshold
  - max_precision
  - max_recall
  - max_ap
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:225247
  - loss:CachedGISTEmbedLoss
widget:
  - source_sentence: how long to grill boneless skinless chicken breasts in oven
    sentences:
      - "[ syll. a-ka-hi, ak-ahi ] The baby boy name Akahi is also used as a girl name. Its pronunciation is AA K AA HHiy. Akahi's origin, as well as its use, is in the Hawaiian language. The name's meaning is never before. Akahi is infrequently used as a baby name for boys."
      - >-
        October consists of 31 days. November has 30 days. When you add both
        together they have 61 days.
      - >-
        Heat a grill or grill pan. When the grill is hot, place the chicken on
        the grill and cook for about 4 minutes per side, or until cooked
        through. You can also bake the thawed chicken in a 375 degree F oven for
        15 minutes, or until cooked through.
  - source_sentence: >-
      More than 273 people have died from the 2019-20 coronavirus outside
      mainland China .
    sentences:
      - >-
        More than 3,700 people have died : around 3,100 in mainland China and
        around 550 in all other countries combined .
      - >-
        More than 3,200 people have died : almost 3,000 in mainland China and
        around 275 in other countries .
      - more than 4,900 deaths have been attributed to COVID-19 .
  - source_sentence: Most red algae species live in oceans.
    sentences:
      - Where do most red algae species live?
      - Which layer of the earth is molten?
      - >-
        As a diver descends, the increase in pressure causes the body’s air
        pockets in the ears and lungs to do what?
  - source_sentence: >-
      Binary compounds of carbon with less electronegative elements are called
      carbides.
    sentences:
      - What are four children born at one birth called?
      - >-
        Binary compounds of carbon with less electronegative elements are called
        what?
      - The water cycle involves movement of water between air and what?
  - source_sentence: What is the basic monetary unit of Iceland?
    sentences:
      - >-
        Ao dai - Vietnamese traditional dress - YouTube Ao dai - Vietnamese
        traditional dress Want to watch this again later? Sign in to add this
        video to a playlist. Need to report the video? Sign in to report
        inappropriate content. Rating is available when the video has been
        rented. This feature is not available right now. Please try again later.
        Uploaded on Jul 8, 2009 Simple, yet charming, graceful and elegant, áo
        dài was designed to praise the slender beauty of Vietnamese women. The
        dress is a genius combination of ancient and modern. It shows every
        curve on the girl's body, creating sexiness for the wearer, yet it still
        preserves the traditional feminine grace of Vietnamese women with its
        charming flowing flaps. The simplicity of áo dài makes it convenient and
        practical, something that other Asian traditional clothes lack. The
        waist-length slits of the flaps allow every movement of the legs:
        walking, running, riding a bicycle, climbing a tree, doing high kicks.
        The looseness of the pants allows comfortability. As a girl walks in áo
        dài, the movements of the flaps make it seem like she's not walking but
        floating in the air. This breath-taking beautiful image of a Vietnamese
        girl walking in áo dài has been an inspiration for generations of
        Vietnamese poets, novelists, artists and has left a deep impression for
        every foreigner who has visited the country. Category
      - >-
        Icelandic monetary unit - definition of Icelandic monetary unit by The
        Free Dictionary Icelandic monetary unit - definition of Icelandic
        monetary unit by The Free Dictionary
        http://www.thefreedictionary.com/Icelandic+monetary+unit Related to
        Icelandic monetary unit: Icelandic Old Krona ThesaurusAntonymsRelated
        WordsSynonymsLegend: monetary unit - a unit of money Icelandic krona ,
        krona - the basic unit of money in Iceland eyrir - 100 aurar equal 1
        krona in Iceland Want to thank TFD for its existence? Tell a friend
        about us , add a link to this page, or visit the webmaster's page for
        free fun content . Link to this page: Copyright © 2003-2017 Farlex, Inc
        Disclaimer All content on this website, including dictionary, thesaurus,
        literature, geography, and other reference data is for informational
        purposes only. This information should not be considered complete, up to
        date, and is not intended to be used in place of a visit, consultation,
        or advice of a legal, medical, or any other professional.
      - >-
        Food-Info.net : E-numbers : E140: Chlorophyll CI 75810, Natural Green 3,
        Chlorophyll A, Magnesium chlorophyll Origin: Natural green colour,
        present in all plants and algae. Commercially extracted from nettles,
        grass and alfalfa. Function & characteristics:
model-index:
  - name: SentenceTransformer based on microsoft/deberta-v3-small
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test
          type: sts-test
        metrics:
          - type: pearson_cosine
            value: 0.2853943019391156
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.31414239162305135
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.3110310476615048
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.3366243060620438
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.29405773952219494
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.3141516551339523
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.28526334639473966
            name: Pearson Dot
          - type: spearman_dot
            value: 0.31380407209449446
            name: Spearman Dot
          - type: pearson_max
            value: 0.3110310476615048
            name: Pearson Max
          - type: spearman_max
            value: 0.3366243060620438
            name: Spearman Max
      - task:
          type: binary-classification
          name: Binary Classification
        dataset:
          name: allNLI dev
          type: allNLI-dev
        metrics:
          - type: cosine_accuracy
            value: 0.66796875
            name: Cosine Accuracy
          - type: cosine_accuracy_threshold
            value: 0.9767438173294067
            name: Cosine Accuracy Threshold
          - type: cosine_f1
            value: 0.5100182149362477
            name: Cosine F1
          - type: cosine_f1_threshold
            value: 0.8540960550308228
            name: Cosine F1 Threshold
          - type: cosine_precision
            value: 0.3723404255319149
            name: Cosine Precision
          - type: cosine_recall
            value: 0.8092485549132948
            name: Cosine Recall
          - type: cosine_ap
            value: 0.38624833037583434
            name: Cosine Ap
          - type: dot_accuracy
            value: 0.66796875
            name: Dot Accuracy
          - type: dot_accuracy_threshold
            value: 750.345458984375
            name: Dot Accuracy Threshold
          - type: dot_f1
            value: 0.5100182149362477
            name: Dot F1
          - type: dot_f1_threshold
            value: 656.0940551757812
            name: Dot F1 Threshold
          - type: dot_precision
            value: 0.3723404255319149
            name: Dot Precision
          - type: dot_recall
            value: 0.8092485549132948
            name: Dot Recall
          - type: dot_ap
            value: 0.3862261253421553
            name: Dot Ap
          - type: manhattan_accuracy
            value: 0.6640625
            name: Manhattan Accuracy
          - type: manhattan_accuracy_threshold
            value: 78.52637481689453
            name: Manhattan Accuracy Threshold
          - type: manhattan_f1
            value: 0.5062388591800357
            name: Manhattan F1
          - type: manhattan_f1_threshold
            value: 285.7745361328125
            name: Manhattan F1 Threshold
          - type: manhattan_precision
            value: 0.36597938144329895
            name: Manhattan Precision
          - type: manhattan_recall
            value: 0.8208092485549133
            name: Manhattan Recall
          - type: manhattan_ap
            value: 0.3898187083180651
            name: Manhattan Ap
          - type: euclidean_accuracy
            value: 0.66796875
            name: Euclidean Accuracy
          - type: euclidean_accuracy_threshold
            value: 5.977196216583252
            name: Euclidean Accuracy Threshold
          - type: euclidean_f1
            value: 0.5100182149362477
            name: Euclidean F1
          - type: euclidean_f1_threshold
            value: 14.971920013427734
            name: Euclidean F1 Threshold
          - type: euclidean_precision
            value: 0.3723404255319149
            name: Euclidean Precision
          - type: euclidean_recall
            value: 0.8092485549132948
            name: Euclidean Recall
          - type: euclidean_ap
            value: 0.38624380046547035
            name: Euclidean Ap
          - type: max_accuracy
            value: 0.66796875
            name: Max Accuracy
          - type: max_accuracy_threshold
            value: 750.345458984375
            name: Max Accuracy Threshold
          - type: max_f1
            value: 0.5100182149362477
            name: Max F1
          - type: max_f1_threshold
            value: 656.0940551757812
            name: Max F1 Threshold
          - type: max_precision
            value: 0.3723404255319149
            name: Max Precision
          - type: max_recall
            value: 0.8208092485549133
            name: Max Recall
          - type: max_ap
            value: 0.3898187083180651
            name: Max Ap
      - task:
          type: binary-classification
          name: Binary Classification
        dataset:
          name: Qnli dev
          type: Qnli-dev
        metrics:
          - type: cosine_accuracy
            value: 0.62890625
            name: Cosine Accuracy
          - type: cosine_accuracy_threshold
            value: 0.9045097827911377
            name: Cosine Accuracy Threshold
          - type: cosine_f1
            value: 0.6397415185783522
            name: Cosine F1
          - type: cosine_f1_threshold
            value: 0.8351442813873291
            name: Cosine F1 Threshold
          - type: cosine_precision
            value: 0.5169712793733682
            name: Cosine Precision
          - type: cosine_recall
            value: 0.8389830508474576
            name: Cosine Recall
          - type: cosine_ap
            value: 0.6193527955003784
            name: Cosine Ap
          - type: dot_accuracy
            value: 0.62890625
            name: Dot Accuracy
          - type: dot_accuracy_threshold
            value: 694.7778930664062
            name: Dot Accuracy Threshold
          - type: dot_f1
            value: 0.6397415185783522
            name: Dot F1
          - type: dot_f1_threshold
            value: 641.4969482421875
            name: Dot F1 Threshold
          - type: dot_precision
            value: 0.5169712793733682
            name: Dot Precision
          - type: dot_recall
            value: 0.8389830508474576
            name: Dot Recall
          - type: dot_ap
            value: 0.6194150916988216
            name: Dot Ap
          - type: manhattan_accuracy
            value: 0.646484375
            name: Manhattan Accuracy
          - type: manhattan_accuracy_threshold
            value: 245.2164306640625
            name: Manhattan Accuracy Threshold
          - type: manhattan_f1
            value: 0.6521060842433698
            name: Manhattan F1
          - type: manhattan_f1_threshold
            value: 303.317626953125
            name: Manhattan F1 Threshold
          - type: manhattan_precision
            value: 0.5160493827160494
            name: Manhattan Precision
          - type: manhattan_recall
            value: 0.885593220338983
            name: Manhattan Recall
          - type: manhattan_ap
            value: 0.6417015148414534
            name: Manhattan Ap
          - type: euclidean_accuracy
            value: 0.62890625
            name: Euclidean Accuracy
          - type: euclidean_accuracy_threshold
            value: 12.111844062805176
            name: Euclidean Accuracy Threshold
          - type: euclidean_f1
            value: 0.6397415185783522
            name: Euclidean F1
          - type: euclidean_f1_threshold
            value: 15.914146423339844
            name: Euclidean F1 Threshold
          - type: euclidean_precision
            value: 0.5169712793733682
            name: Euclidean Precision
          - type: euclidean_recall
            value: 0.8389830508474576
            name: Euclidean Recall
          - type: euclidean_ap
            value: 0.6193576186776235
            name: Euclidean Ap
          - type: max_accuracy
            value: 0.646484375
            name: Max Accuracy
          - type: max_accuracy_threshold
            value: 694.7778930664062
            name: Max Accuracy Threshold
          - type: max_f1
            value: 0.6521060842433698
            name: Max F1
          - type: max_f1_threshold
            value: 641.4969482421875
            name: Max F1 Threshold
          - type: max_precision
            value: 0.5169712793733682
            name: Max Precision
          - type: max_recall
            value: 0.885593220338983
            name: Max Recall
          - type: max_ap
            value: 0.6417015148414534
            name: Max Ap

SentenceTransformer based on microsoft/deberta-v3-small

This is a sentence-transformers model finetuned from microsoft/deberta-v3-small. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: microsoft/deberta-v3-small
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en

Model Sources

  • Documentation: Sentence Transformers Documentation
  • Repository: Sentence Transformers on GitHub
  • Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model 
  (1): AdvancedWeightedPooling(
    (linear_cls): Linear(in_features=768, out_features=768, bias=True)
    (mha): MultiheadAttention(
      (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
    )
    (layernorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (layernorm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("bobox/DeBERTa3-s-CustomPooling-test1-checkpoints-tmp")
# Run inference
sentences = [
    'What is the basic monetary unit of Iceland?',
    "Icelandic monetary unit - definition of Icelandic monetary unit by The Free Dictionary Icelandic monetary unit - definition of Icelandic monetary unit by The Free Dictionary http://www.thefreedictionary.com/Icelandic+monetary+unit Related to Icelandic monetary unit: Icelandic Old Krona ThesaurusAntonymsRelated WordsSynonymsLegend: monetary unit - a unit of money Icelandic krona , krona - the basic unit of money in Iceland eyrir - 100 aurar equal 1 krona in Iceland Want to thank TFD for its existence? Tell a friend about us , add a link to this page, or visit the webmaster's page for free fun content . Link to this page: Copyright © 2003-2017 Farlex, Inc Disclaimer All content on this website, including dictionary, thesaurus, literature, geography, and other reference data is for informational purposes only. This information should not be considered complete, up to date, and is not intended to be used in place of a visit, consultation, or advice of a legal, medical, or any other professional.",
    'Food-Info.net : E-numbers : E140: Chlorophyll CI 75810, Natural Green 3, Chlorophyll A, Magnesium chlorophyll Origin: Natural green colour, present in all plants and algae. Commercially extracted from nettles, grass and alfalfa. Function & characteristics:',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
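The model's similarity function is cosine similarity (see Model Details), so `model.similarity(embeddings, embeddings)` returns the pairwise cosine matrix. A minimal numpy sketch of that computation, using random vectors in place of real `model.encode(...)` output:

```python
import numpy as np

def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity: normalize rows, then take dot products."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return normed @ normed.T

# Toy stand-in for model.encode(...) output: 3 sentences, 768 dimensions
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(3, 768))

similarities = cosine_similarity_matrix(embeddings)
print(similarities.shape)  # (3, 3)
```

The diagonal is 1 by construction (each embedding compared with itself), matching the [3, 3] matrix shown in the usage example above.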

Evaluation

Metrics

Semantic Similarity (sts-test)

Metric Value
pearson_cosine 0.2854
spearman_cosine 0.3141
pearson_manhattan 0.311
spearman_manhattan 0.3366
pearson_euclidean 0.2941
spearman_euclidean 0.3142
pearson_dot 0.2853
spearman_dot 0.3138
pearson_max 0.311
spearman_max 0.3366
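The pearson_* metrics measure linear correlation between the model's similarity scores and the gold similarity labels, while spearman_* measure rank correlation. A self-contained sketch of both, with toy numbers rather than the actual sts-test data (and no tie handling, which the real evaluator's Spearman accounts for):

```python
import numpy as np

def pearson(x, y):
    """Linear correlation between predicted and gold scores."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])

def spearman(x, y):
    """Rank correlation: Pearson computed on the ranks (no tie handling)."""
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return pearson(rank(np.asarray(x, float)), rank(np.asarray(y, float)))

predicted = [0.9, 0.2, 0.6, 0.4]  # e.g. cosine similarities from the model
gold = [5.0, 1.0, 4.0, 2.0]       # human similarity annotations

print(round(pearson(predicted, gold), 3))   # 0.978
print(round(spearman(predicted, gold), 3))  # 1.0 (identical rank order)
```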

Binary Classification (allNLI-dev)

Metric Value
cosine_accuracy 0.668
cosine_accuracy_threshold 0.9767
cosine_f1 0.51
cosine_f1_threshold 0.8541
cosine_precision 0.3723
cosine_recall 0.8092
cosine_ap 0.3862
dot_accuracy 0.668
dot_accuracy_threshold 750.3455
dot_f1 0.51
dot_f1_threshold 656.0941
dot_precision 0.3723
dot_recall 0.8092
dot_ap 0.3862
manhattan_accuracy 0.6641
manhattan_accuracy_threshold 78.5264
manhattan_f1 0.5062
manhattan_f1_threshold 285.7745
manhattan_precision 0.366
manhattan_recall 0.8208
manhattan_ap 0.3898
euclidean_accuracy 0.668
euclidean_accuracy_threshold 5.9772
euclidean_f1 0.51
euclidean_f1_threshold 14.9719
euclidean_precision 0.3723
euclidean_recall 0.8092
euclidean_ap 0.3862
max_accuracy 0.668
max_accuracy_threshold 750.3455
max_f1 0.51
max_f1_threshold 656.0941
max_precision 0.3723
max_recall 0.8208
max_ap 0.3898

Binary Classification (Qnli-dev)

Metric Value
cosine_accuracy 0.6289
cosine_accuracy_threshold 0.9045
cosine_f1 0.6397
cosine_f1_threshold 0.8351
cosine_precision 0.517
cosine_recall 0.839
cosine_ap 0.6194
dot_accuracy 0.6289
dot_accuracy_threshold 694.7779
dot_f1 0.6397
dot_f1_threshold 641.4969
dot_precision 0.517
dot_recall 0.839
dot_ap 0.6194
manhattan_accuracy 0.6465
manhattan_accuracy_threshold 245.2164
manhattan_f1 0.6521
manhattan_f1_threshold 303.3176
manhattan_precision 0.516
manhattan_recall 0.8856
manhattan_ap 0.6417
euclidean_accuracy 0.6289
euclidean_accuracy_threshold 12.1118
euclidean_f1 0.6397
euclidean_f1_threshold 15.9141
euclidean_precision 0.517
euclidean_recall 0.839
euclidean_ap 0.6194
max_accuracy 0.6465
max_accuracy_threshold 694.7779
max_f1 0.6521
max_f1_threshold 641.4969
max_precision 0.517
max_recall 0.8856
max_ap 0.6417
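The *_accuracy, *_precision, *_recall and *_f1 values above come from thresholding the pairwise scores: the evaluator sweeps candidate thresholds and reports the best one (hence the separate *_accuracy_threshold and *_f1_threshold rows). A minimal sketch of the metric computation at one fixed threshold, with toy scores rather than the dev-set data:

```python
import numpy as np

def binary_metrics(scores, labels, threshold):
    """Accuracy, precision, recall and F1 for 'similar iff score >= threshold'."""
    scores = np.asarray(scores, float)
    labels = np.asarray(labels, int)
    preds = (scores >= threshold).astype(int)
    tp = int(np.sum((preds == 1) & (labels == 1)))
    fp = int(np.sum((preds == 1) & (labels == 0)))
    fn = int(np.sum((preds == 0) & (labels == 1)))
    accuracy = float(np.mean(preds == labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# Toy cosine scores and gold pair labels (1 = positive pair)
scores = [0.95, 0.88, 0.70, 0.91, 0.55, 0.82]
labels = [1, 1, 0, 1, 0, 0]
acc, prec, rec, f1 = binary_metrics(scores, labels, threshold=0.85)
```

The *_ap rows are average precision, i.e. the area under the precision-recall curve obtained by varying this threshold, and the max_* rows take the best value across the cosine, dot, Manhattan and Euclidean score functions.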

Training Details

Evaluation Dataset

vitaminc-pairs

  • Dataset: vitaminc-pairs at be6febb
  • Size: 128 evaluation samples
  • Columns: claim and evidence
  • Approximate statistics based on the first 128 samples:
    • claim: string, min 9 / mean 21.42 / max 41 tokens
    • evidence: string, min 11 / mean 35.55 / max 79 tokens
  • Samples:
    • claim: Dragon Con had over 5000 guests .
      evidence: Among the more than 6000 guests and musical performers at the 2009 convention were such notables as Patrick Stewart , William Shatner , Leonard Nimoy , Terry Gilliam , Bruce Boxleitner , James Marsters , and Mary McDonnell .
    • claim: COVID-19 has reached more than 185 countries .
      evidence: As of , more than cases of COVID-19 have been reported in more than 190 countries and 200 territories , resulting in more than deaths .
    • claim: In March , Italy had 3.6x times more cases of coronavirus than China .
      evidence: As of 12 March , among nations with at least one million citizens , Italy has the world 's highest per capita rate of positive coronavirus cases at 206.1 cases per million people ( 3.6x times the rate of China ) and is the country with the second-highest number of positive cases as well as of deaths in the world , after China .
  • Loss: CachedGISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.025}
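CachedGISTEmbedLoss combines a memory-efficient (cached) in-batch contrastive objective with GISTEmbed-style guided negative filtering: the guide model configured above is used to discard in-batch negatives that look like unlabeled positives. A rough numpy sketch of the guided-masking idea only, not the actual implementation (which adds gradient caching and supports multiple negatives per anchor):

```python
import numpy as np

def gist_masked_logits(student_sim: np.ndarray, guide_sim: np.ndarray,
                       temperature: float = 0.025) -> np.ndarray:
    """Mask likely false negatives before the contrastive softmax.

    Rows are anchors, columns are in-batch candidates; column i holds the
    labeled positive for anchor i. Any other candidate that the guide model
    scores at least as high as the true positive is treated as an unlabeled
    positive and removed from the softmax by setting its logit to -inf.
    """
    positives = np.diag(guide_sim)                # guide score of each true pair
    false_neg = guide_sim >= positives[:, None]   # suspiciously similar candidates
    np.fill_diagonal(false_neg, False)            # never mask the positive itself
    logits = student_sim / temperature
    logits[false_neg] = -np.inf
    return logits

# Toy 3x3 similarity matrices: the guide rates candidate 1 higher than
# anchor 0's own positive, so that pair is dropped as a false negative.
guide = np.array([[0.90, 0.95, 0.10],
                  [0.20, 0.80, 0.30],
                  [0.10, 0.20, 0.70]])
student = np.array([[0.80, 0.85, 0.00],
                    [0.10, 0.70, 0.20],
                    [0.00, 0.10, 0.60]])
logits = gist_masked_logits(student, guide)
```

The low temperature (0.025, as configured above) sharpens the softmax so that near-duplicate negatives dominate the loss, which is why filtering them with the guide model matters.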
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 42
  • per_device_eval_batch_size: 128
  • gradient_accumulation_steps: 2
  • learning_rate: 3e-05
  • weight_decay: 0.001
  • lr_scheduler_type: cosine_with_min_lr
  • lr_scheduler_kwargs: {'num_cycles': 0.5, 'min_lr': 1e-05}
  • warmup_ratio: 0.25
  • save_safetensors: False
  • fp16: True
  • push_to_hub: True
  • hub_model_id: bobox/DeBERTa3-s-CustomPooling-test1-checkpoints-tmp
  • hub_strategy: all_checkpoints
  • batch_sampler: no_duplicates
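With per_device_train_batch_size 42 and gradient_accumulation_steps 2, the effective batch size is 84. The cosine_with_min_lr schedule warms up linearly for the first 25% of steps (warmup_ratio 0.25), then decays cosinely from the 3e-05 peak down to the 1e-05 floor. A sketch of that schedule under these hyperparameters (the 1000-step horizon is illustrative, not the run's actual step count):

```python
import math

def lr_at(step: int, total_steps: int, peak_lr: float = 3e-05,
          min_lr: float = 1e-05, warmup_ratio: float = 0.25,
          num_cycles: float = 0.5) -> float:
    """Linear warmup, then cosine decay from peak_lr down to min_lr."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress))
    return min_lr + (peak_lr - min_lr) * cosine

# LR climbs to the peak over the first 250 steps of a 1000-step run,
# then eases down to the 1e-05 floor by the final step.
for step in (0, 125, 250, 625, 1000):
    print(step, f"{lr_at(step, 1000):.2e}")
```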

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 42
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 2
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 3e-05
  • weight_decay: 0.001
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: cosine_with_min_lr
  • lr_scheduler_kwargs: {'num_cycles': 0.5, 'min_lr': 1e-05}
  • warmup_ratio: 0.25
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: bobox/DeBERTa3-s-CustomPooling-test1-checkpoints-tmp
  • hub_strategy: all_checkpoints
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss vitaminc-pairs loss negation-triplets loss scitail-pairs-pos loss scitail-pairs-qa loss xsum-pairs loss sciq pairs loss qasc pairs loss openbookqa pairs loss msmarco pairs loss nq pairs loss trivia pairs loss gooaq pairs loss paws-pos loss global dataset loss sts-test_spearman_cosine allNLI-dev_max_ap Qnli-dev_max_ap
0.0009 1 5.8564 - - - - - - - - - - - - - - - - -
0.0018 2 7.1716 - - - - - - - - - - - - - - - - -
0.0027 3 5.9095 - - - - - - - - - - - - - - - - -
0.0035 4 5.0841 - - - - - - - - - - - - - - - - -
0.0044 5 4.0184 - - - - - - - - - - - - - - - - -
0.0053 6 6.2191 - - - - - - - - - - - - - - - - -
0.0062 7 5.6124 - - - - - - - - - - - - - - - - -
0.0071 8 3.9544 - - - - - - - - - - - - - - - - -
0.0080 9 4.7149 - - - - - - - - - - - - - - - - -
0.0088 10 4.9616 - - - - - - - - - - - - - - - - -
0.0097 11 5.2794 - - - - - - - - - - - - - - - - -
0.0106 12 8.8704 - - - - - - - - - - - - - - - - -
0.0115 13 6.0707 - - - - - - - - - - - - - - - - -
0.0124 14 5.4071 - - - - - - - - - - - - - - - - -
0.0133 15 6.9104 - - - - - - - - - - - - - - - - -
0.0142 16 6.0276 - - - - - - - - - - - - - - - - -
0.0150 17 6.737 - - - - - - - - - - - - - - - - -
0.0159 18 6.5354 - - - - - - - - - - - - - - - - -
0.0168 19 5.206 - - - - - - - - - - - - - - - - -
0.0177 20 5.2469 - - - - - - - - - - - - - - - - -
0.0186 21 5.3771 - - - - - - - - - - - - - - - - -
0.0195 22 4.979 - - - - - - - - - - - - - - - - -
0.0204 23 4.7909 - - - - - - - - - - - - - - - - -
0.0212 24 4.9086 - - - - - - - - - - - - - - - - -
0.0221 25 4.8826 - - - - - - - - - - - - - - - - -
0.0230 26 8.2266 - - - - - - - - - - - - - - - - -
0.0239 27 8.3024 - - - - - - - - - - - - - - - - -
0.0248 28 5.8745 - - - - - - - - - - - - - - - - -
0.0257 29 4.7298 - - - - - - - - - - - - - - - - -
0.0265 30 5.4614 - - - - - - - - - - - - - - - - -
0.0274 31 5.8594 - - - - - - - - - - - - - - - - -
0.0283 32 5.2401 - - - - - - - - - - - - - - - - -
0.0292 33 5.1579 - - - - - - - - - - - - - - - - -
0.0301 34 5.2181 - - - - - - - - - - - - - - - - -
0.0310 35 4.6328 - - - - - - - - - - - - - - - - -
0.0319 36 2.121 - - - - - - - - - - - - - - - - -
0.0327 37 5.9026 - - - - - - - - - - - - - - - - -
0.0336 38 7.3796 - - - - - - - - - - - - - - - - -
0.0345 39 5.5361 - - - - - - - - - - - - - - - - -
0.0354 40 4.0243 2.9018 5.6903 2.1136 2.8052 6.5831 0.8882 4.1148 5.0966 10.3911 10.9032 7.1904 8.1935 1.3943 5.6716 0.1879 0.3385 0.5781
0.0363 41 4.9072 - - - - - - - - - - - - - - - - -
0.0372 42 3.4439 - - - - - - - - - - - - - - - - -
0.0381 43 4.9787 - - - - - - - - - - - - - - - - -
0.0389 44 5.8318 - - - - - - - - - - - - - - - - -
0.0398 45 5.3226 - - - - - - - - - - - - - - - - -
0.0407 46 5.1181 - - - - - - - - - - - - - - - - -
0.0416 47 4.7834 - - - - - - - - - - - - - - - - -
0.0425 48 6.6303 - - - - - - - - - - - - - - - - -
0.0434 49 5.8171 - - - - - - - - - - - - - - - - -
0.0442 50 5.1962 - - - - - - - - - - - - - - - - -
0.0451 51 5.2096 - - - - - - - - - - - - - - - - -
0.0460 52 5.0943 - - - - - - - - - - - - - - - - -
0.0469 53 4.9038 - - - - - - - - - - - - - - - - -
0.0478 54 4.6479 - - - - - - - - - - - - - - - - -
0.0487 55 5.5098 - - - - - - - - - - - - - - - - -
0.0496 56 4.6979 - - - - - - - - - - - - - - - - -
0.0504 57 3.1969 - - - - - - - - - - - - - - - - -
0.0513 58 4.4127 - - - - - - - - - - - - - - - - -
0.0522 59 3.7746 - - - - - - - - - - - - - - - - -
0.0531 60 4.5378 - - - - - - - - - - - - - - - - -
0.0540 61 5.0209 - - - - - - - - - - - - - - - - -
0.0549 62 6.5936 - - - - - - - - - - - - - - - - -
0.0558 63 4.2315 - - - - - - - - - - - - - - - - -
0.0566 64 6.4269 - - - - - - - - - - - - - - - - -
0.0575 65 4.2644 - - - - - - - - - - - - - - - - -
0.0584 66 5.1388 - - - - - - - - - - - - - - - - -
0.0593 67 5.1852 - - - - - - - - - - - - - - - - -
0.0602 68 4.8057 - - - - - - - - - - - - - - - - -
0.0611 69 3.1725 - - - - - - - - - - - - - - - - -
0.0619 70 3.3322 - - - - - - - - - - - - - - - - -
0.0628 71 5.139 - - - - - - - - - - - - - - - - -
0.0637 72 4.307 - - - - - - - - - - - - - - - - -
0.0646 73 5.0133 - - - - - - - - - - - - - - - - -
0.0655 74 4.0507 - - - - - - - - - - - - - - - - -
0.0664 75 3.3895 - - - - - - - - - - - - - - - - -
0.0673 76 5.6736 - - - - - - - - - - - - - - - - -
0.0681 77 4.2572 - - - - - - - - - - - - - - - - -
0.0690 78 3.0796 - - - - - - - - - - - - - - - - -
0.0699 79 5.0199 - - - - - - - - - - - - - - - - -
0.0708 80 4.1414 2.7794 4.8890 1.8997 2.6761 6.2096 0.7622 3.3129 4.5498 7.2056 7.6809 6.3792 6.6567 1.3848 5.0030 0.2480 0.3513 0.5898
0.0717 81 5.8604 - - - - - - - - - - - - - - - - -
0.0726 82 4.3003 - - - - - - - - - - - - - - - - -
0.0735 83 4.4568 - - - - - - - - - - - - - - - - -
0.0743 84 4.2747 - - - - - - - - - - - - - - - - -
0.0752 85 5.52 - - - - - - - - - - - - - - - - -
0.0761 86 2.7767 - - - - - - - - - - - - - - - - -
0.0770 87 4.397 - - - - - - - - - - - - - - - - -
0.0779 88 5.4449 - - - - - - - - - - - - - - - - -
0.0788 89 4.2706 - - - - - - - - - - - - - - - - -
0.0796 90 6.4759 - - - - - - - - - - - - - - - - -
0.0805 91 4.1951 - - - - - - - - - - - - - - - - -
0.0814 92 4.6328 - - - - - - - - - - - - - - - - -
0.0823 93 4.1278 - - - - - - - - - - - - - - - - -
0.0832 94 4.1787 - - - - - - - - - - - - - - - - -
0.0841 95 5.2156 - - - - - - - - - - - - - - - - -
0.0850 96 3.1403 - - - - - - - - - - - - - - - - -
0.0858 97 4.0273 - - - - - - - - - - - - - - - - -
0.0867 98 3.0624 - - - - - - - - - - - - - - - - -
0.0876 99 4.6786 - - - - - - - - - - - - - - - - -
0.0885 100 4.1505 - - - - - - - - - - - - - - - - -
0.0894 101 2.9529 - - - - - - - - - - - - - - - - -
0.0903 102 4.7048 - - - - - - - - - - - - - - - - -
0.0912 103 4.7388 - - - - - - - - - - - - - - - - -
0.0920 104 3.7879 - - - - - - - - - - - - - - - - -
0.0929 105 4.0311 - - - - - - - - - - - - - - - - -
0.0938 106 4.1314 - - - - - - - - - - - - - - - - -
0.0947 107 4.9411 - - - - - - - - - - - - - - - - -
0.0956 108 4.1118 - - - - - - - - - - - - - - - - -
0.0965 109 3.6971 - - - - - - - - - - - - - - - - -
0.0973 110 5.605 - - - - - - - - - - - - - - - - -
0.0982 111 3.4563 - - - - - - - - - - - - - - - - -
0.0991 112 3.7422 - - - - - - - - - - - - - - - - -
0.1 113 3.8055 - - - - - - - - - - - - - - - - -
0.1009 114 5.2369 - - - - - - - - - - - - - - - - -
0.1018 115 5.6518 - - - - - - - - - - - - - - - - -
0.1027 116 3.2906 - - - - - - - - - - - - - - - - -
0.1035 117 3.4996 - - - - - - - - - - - - - - - - -
0.1044 118 3.6283 - - - - - - - - - - - - - - - - -
0.1053 119 4.1487 - - - - - - - - - - - - - - - - -
0.1062 120 4.3996 2.7279 4.3946 1.4130 2.1150 6.0486 0.7172 2.9669 4.4180 6.3022 6.8412 6.2013 6.0982 0.9474 4.3852 0.3149 0.3693 0.5975
0.1071 121 3.5291 - - - - - - - - - - - - - - - - -
0.1080 122 3.8232 - - - - - - - - - - - - - - - - -
0.1088 123 4.6035 - - - - - - - - - - - - - - - - -
0.1097 124 3.7607 - - - - - - - - - - - - - - - - -
0.1106 125 3.8461 - - - - - - - - - - - - - - - - -
0.1115 126 3.3413 - - - - - - - - - - - - - - - - -
0.1124 127 4.2777 - - - - - - - - - - - - - - - - -
0.1133 128 4.3597 - - - - - - - - - - - - - - - - -
0.1142 129 3.9046 - - - - - - - - - - - - - - - - -
0.1150 130 4.0527 - - - - - - - - - - - - - - - - -
0.1159 131 5.0883 - - - - - - - - - - - - - - - - -
0.1168 132 3.8308 - - - - - - - - - - - - - - - - -
0.1177 133 3.572 - - - - - - - - - - - - - - - - -
0.1186 134 3.4299 - - - - - - - - - - - - - - - - -
0.1195 135 4.1541 - - - - - - - - - - - - - - - - -
0.1204 136 3.584 - - - - - - - - - - - - - - - - -
0.1212 137 5.0977 - - - - - - - - - - - - - - - - -
0.1221 138 4.6769 - - - - - - - - - - - - - - - - -
0.1230 139 3.8396 - - - - - - - - - - - - - - - - -
0.1239 140 3.2875 - - - - - - - - - - - - - - - - -
0.1248 141 4.1946 - - - - - - - - - - - - - - - - -
0.1257 142 4.9602 - - - - - - - - - - - - - - - - -
0.1265 143 4.1531 - - - - - - - - - - - - - - - - -
0.1274 144 3.8351 - - - - - - - - - - - - - - - - -
0.1283 145 3.112 - - - - - - - - - - - - - - - - -
0.1292 146 2.3145 - - - - - - - - - - - - - - - - -
0.1301 147 4.0989 - - - - - - - - - - - - - - - - -
0.1310 148 3.2173 - - - - - - - - - - - - - - - - -
0.1319 149 2.7913 - - - - - - - - - - - - - - - - -
0.1327 150 3.7627 - - - - - - - - - - - - - - - - -
0.1336 151 3.3669 - - - - - - - - - - - - - - - - -
0.1345 152 2.6775 - - - - - - - - - - - - - - - - -
0.1354 153 3.2804 - - - - - - - - - - - - - - - - -
0.1363 154 3.0676 - - - - - - - - - - - - - - - - -
0.1372 155 3.1559 - - - - - - - - - - - - - - - - -
0.1381 156 2.6638 - - - - - - - - - - - - - - - - -
0.1389 157 2.8045 - - - - - - - - - - - - - - - - -
0.1398 158 4.0568 - - - - - - - - - - - - - - - - -
0.1407 159 2.7554 - - - - - - - - - - - - - - - - -
0.1416 160 3.7407 2.7439 4.6364 1.0089 1.1229 5.4870 0.6284 2.5933 4.3943 5.6565 5.9870 5.6944 5.3857 0.3622 3.4011 0.3141 0.3898 0.6417

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 3.2.0
  • Transformers: 4.45.1
  • PyTorch: 2.4.0
  • Accelerate: 0.34.2
  • Datasets: 3.0.1
  • Tokenizers: 0.20.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}