base_model: microsoft/deberta-v3-small
datasets:
  - tals/vitaminc
language:
  - en
library_name: sentence-transformers
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
  - cosine_accuracy
  - cosine_accuracy_threshold
  - cosine_f1
  - cosine_f1_threshold
  - cosine_precision
  - cosine_recall
  - cosine_ap
  - dot_accuracy
  - dot_accuracy_threshold
  - dot_f1
  - dot_f1_threshold
  - dot_precision
  - dot_recall
  - dot_ap
  - manhattan_accuracy
  - manhattan_accuracy_threshold
  - manhattan_f1
  - manhattan_f1_threshold
  - manhattan_precision
  - manhattan_recall
  - manhattan_ap
  - euclidean_accuracy
  - euclidean_accuracy_threshold
  - euclidean_f1
  - euclidean_f1_threshold
  - euclidean_precision
  - euclidean_recall
  - euclidean_ap
  - max_accuracy
  - max_accuracy_threshold
  - max_f1
  - max_f1_threshold
  - max_precision
  - max_recall
  - max_ap
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:225247
  - loss:CachedGISTEmbedLoss
widget:
  - source_sentence: how long to grill boneless skinless chicken breasts in oven
    sentences:
      - "[ syll. a-ka-hi, ak-ahi ] The baby boy name Akahi is also used as a girl name. Its pronunciation is AA K AA HHiy †. Akahi's origin, as well as its use, is in the Hawaiian language. The name's meaning is never before. Akahi is infrequently used as a baby name for boys."
      - >-
        October consists of 31 days. November has 30 days. When you add both
        together they have 61 days.
      - >-
        Heat a grill or grill pan. When the grill is hot, place the chicken on
        the grill and cook for about 4 minutes per side, or until cooked
        through. You can also bake the thawed chicken in a 375 degree F oven for
        15 minutes, or until cooked through.
  - source_sentence: >-
      More than 273 people have died from the 2019-20 coronavirus outside
      mainland China .
    sentences:
      - >-
        More than 3,700 people have died : around 3,100 in mainland China and
        around 550 in all other countries combined .
      - >-
        More than 3,200 people have died : almost 3,000 in mainland China and
        around 275 in other countries .
      - more than 4,900 deaths have been attributed to COVID-19 .
  - source_sentence: Most red algae species live in oceans.
    sentences:
      - Where do most red algae species live?
      - Which layer of the earth is molten?
      - >-
        As a diver descends, the increase in pressure causes the body’s air
        pockets in the ears and lungs to do what?
  - source_sentence: >-
      Binary compounds of carbon with less electronegative elements are called
      carbides.
    sentences:
      - What are four children born at one birth called?
      - >-
        Binary compounds of carbon with less electronegative elements are called
        what?
      - The water cycle involves movement of water between air and what?
  - source_sentence: What is the basic monetary unit of Iceland?
    sentences:
      - >-
        Ao dai - Vietnamese traditional dress - YouTube Ao dai - Vietnamese
        traditional dress Want to watch this again later? Sign in to add this
        video to a playlist. Need to report the video? Sign in to report
        inappropriate content. Rating is available when the video has been
        rented. This feature is not available right now. Please try again later.
        Uploaded on Jul 8, 2009 Simple, yet charming, graceful and elegant, áo
        dài was designed to praise the slender beauty of Vietnamese women. The
        dress is a genius combination of ancient and modern. It shows every
        curve on the girl's body, creating sexiness for the wearer, yet it still
        preserves the traditional feminine grace of Vietnamese women with its
        charming flowing flaps. The simplicity of áo dài makes it convenient and
        practical, something that other Asian traditional clothes lack. The
        waist-length slits of the flaps allow every movement of the legs:
        walking, running, riding a bicycle, climbing a tree, doing high kicks.
        The looseness of the pants allows comfortability. As a girl walks in áo
        dài, the movements of the flaps make it seem like she's not walking but
        floating in the air. This breath-taking beautiful image of a Vietnamese
        girl walking in áo dài has been an inspiration for generations of
        Vietnamese poets, novelists, artists and has left a deep impression for
        every foreigner who has visited the country. Category
      - >-
        Icelandic monetary unit - definition of Icelandic monetary unit by The
        Free Dictionary Icelandic monetary unit - definition of Icelandic
        monetary unit by The Free Dictionary
        http://www.thefreedictionary.com/Icelandic+monetary+unit Related to
        Icelandic monetary unit: Icelandic Old Krona ThesaurusAntonymsRelated
        WordsSynonymsLegend: monetary unit - a unit of money Icelandic krona ,
        krona - the basic unit of money in Iceland eyrir - 100 aurar equal 1
        krona in Iceland Want to thank TFD for its existence? Tell a friend
        about us , add a link to this page, or visit the webmaster's page for
        free fun content . Link to this page: Copyright © 2003-2017 Farlex, Inc
        Disclaimer All content on this website, including dictionary, thesaurus,
        literature, geography, and other reference data is for informational
        purposes only. This information should not be considered complete, up to
        date, and is not intended to be used in place of a visit, consultation,
        or advice of a legal, medical, or any other professional.
      - >-
        Food-Info.net : E-numbers : E140: Chlorophyll CI 75810, Natural Green 3,
        Chlorophyll A, Magnesium chlorophyll Origin: Natural green colour,
        present in all plants and algae. Commercially extracted from nettles,
        grass and alfalfa. Function & characteristics:
model-index:
  - name: SentenceTransformer based on microsoft/deberta-v3-small
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test
          type: sts-test
        metrics:
          - type: pearson_cosine
            value: 0.22248205020578934
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.24802235964390085
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.26632593273308647
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.2843623073856928
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.2323160413842197
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.24799036249272113
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.22239084967931927
            name: Pearson Dot
          - type: spearman_dot
            value: 0.24791612015173234
            name: Spearman Dot
          - type: pearson_max
            value: 0.26632593273308647
            name: Pearson Max
          - type: spearman_max
            value: 0.2843623073856928
            name: Spearman Max
      - task:
          type: binary-classification
          name: Binary Classification
        dataset:
          name: allNLI dev
          type: allNLI-dev
        metrics:
          - type: cosine_accuracy
            value: 0.666015625
            name: Cosine Accuracy
          - type: cosine_accuracy_threshold
            value: 0.983686089515686
            name: Cosine Accuracy Threshold
          - type: cosine_f1
            value: 0.5065885797950219
            name: Cosine F1
          - type: cosine_f1_threshold
            value: 0.7642872333526611
            name: Cosine F1 Threshold
          - type: cosine_precision
            value: 0.3392156862745098
            name: Cosine Precision
          - type: cosine_recall
            value: 1
            name: Cosine Recall
          - type: cosine_ap
            value: 0.34411819659341086
            name: Cosine Ap
          - type: dot_accuracy
            value: 0.666015625
            name: Dot Accuracy
          - type: dot_accuracy_threshold
            value: 755.60302734375
            name: Dot Accuracy Threshold
          - type: dot_f1
            value: 0.5065885797950219
            name: Dot F1
          - type: dot_f1_threshold
            value: 587.0625
            name: Dot F1 Threshold
          - type: dot_precision
            value: 0.3392156862745098
            name: Dot Precision
          - type: dot_recall
            value: 1
            name: Dot Recall
          - type: dot_ap
            value: 0.344109544232086
            name: Dot Ap
          - type: manhattan_accuracy
            value: 0.6640625
            name: Manhattan Accuracy
          - type: manhattan_accuracy_threshold
            value: 62.69102096557617
            name: Manhattan Accuracy Threshold
          - type: manhattan_f1
            value: 0.5058479532163743
            name: Manhattan F1
          - type: manhattan_f1_threshold
            value: 337.6861877441406
            name: Manhattan F1 Threshold
          - type: manhattan_precision
            value: 0.3385518590998043
            name: Manhattan Precision
          - type: manhattan_recall
            value: 1
            name: Manhattan Recall
          - type: manhattan_ap
            value: 0.35131239981425566
            name: Manhattan Ap
          - type: euclidean_accuracy
            value: 0.666015625
            name: Euclidean Accuracy
          - type: euclidean_accuracy_threshold
            value: 5.00581693649292
            name: Euclidean Accuracy Threshold
          - type: euclidean_f1
            value: 0.5065885797950219
            name: Euclidean F1
          - type: euclidean_f1_threshold
            value: 19.022436141967773
            name: Euclidean F1 Threshold
          - type: euclidean_precision
            value: 0.3392156862745098
            name: Euclidean Precision
          - type: euclidean_recall
            value: 1
            name: Euclidean Recall
          - type: euclidean_ap
            value: 0.3441246898925644
            name: Euclidean Ap
          - type: max_accuracy
            value: 0.666015625
            name: Max Accuracy
          - type: max_accuracy_threshold
            value: 755.60302734375
            name: Max Accuracy Threshold
          - type: max_f1
            value: 0.5065885797950219
            name: Max F1
          - type: max_f1_threshold
            value: 587.0625
            name: Max F1 Threshold
          - type: max_precision
            value: 0.3392156862745098
            name: Max Precision
          - type: max_recall
            value: 1
            name: Max Recall
          - type: max_ap
            value: 0.35131239981425566
            name: Max Ap
      - task:
          type: binary-classification
          name: Binary Classification
        dataset:
          name: Qnli dev
          type: Qnli-dev
        metrics:
          - type: cosine_accuracy
            value: 0.591796875
            name: Cosine Accuracy
          - type: cosine_accuracy_threshold
            value: 0.9258557558059692
            name: Cosine Accuracy Threshold
          - type: cosine_f1
            value: 0.6291834002677376
            name: Cosine F1
          - type: cosine_f1_threshold
            value: 0.750666618347168
            name: Cosine F1 Threshold
          - type: cosine_precision
            value: 0.4598825831702544
            name: Cosine Precision
          - type: cosine_recall
            value: 0.9957627118644068
            name: Cosine Recall
          - type: cosine_ap
            value: 0.5585355274462735
            name: Cosine Ap
          - type: dot_accuracy
            value: 0.591796875
            name: Dot Accuracy
          - type: dot_accuracy_threshold
            value: 711.18359375
            name: Dot Accuracy Threshold
          - type: dot_f1
            value: 0.6291834002677376
            name: Dot F1
          - type: dot_f1_threshold
            value: 576.5970458984375
            name: Dot F1 Threshold
          - type: dot_precision
            value: 0.4598825831702544
            name: Dot Precision
          - type: dot_recall
            value: 0.9957627118644068
            name: Dot Recall
          - type: dot_ap
            value: 0.5585297234749824
            name: Dot Ap
          - type: manhattan_accuracy
            value: 0.619140625
            name: Manhattan Accuracy
          - type: manhattan_accuracy_threshold
            value: 188.09068298339844
            name: Manhattan Accuracy Threshold
          - type: manhattan_f1
            value: 0.6301775147928994
            name: Manhattan F1
          - type: manhattan_f1_threshold
            value: 237.80462646484375
            name: Manhattan F1 Threshold
          - type: manhattan_precision
            value: 0.48409090909090907
            name: Manhattan Precision
          - type: manhattan_recall
            value: 0.902542372881356
            name: Manhattan Recall
          - type: manhattan_ap
            value: 0.5898283705050701
            name: Manhattan Ap
          - type: euclidean_accuracy
            value: 0.591796875
            name: Euclidean Accuracy
          - type: euclidean_accuracy_threshold
            value: 10.672666549682617
            name: Euclidean Accuracy Threshold
          - type: euclidean_f1
            value: 0.6291834002677376
            name: Euclidean F1
          - type: euclidean_f1_threshold
            value: 19.553747177124023
            name: Euclidean F1 Threshold
          - type: euclidean_precision
            value: 0.4598825831702544
            name: Euclidean Precision
          - type: euclidean_recall
            value: 0.9957627118644068
            name: Euclidean Recall
          - type: euclidean_ap
            value: 0.5585355274462735
            name: Euclidean Ap
          - type: max_accuracy
            value: 0.619140625
            name: Max Accuracy
          - type: max_accuracy_threshold
            value: 711.18359375
            name: Max Accuracy Threshold
          - type: max_f1
            value: 0.6301775147928994
            name: Max F1
          - type: max_f1_threshold
            value: 576.5970458984375
            name: Max F1 Threshold
          - type: max_precision
            value: 0.48409090909090907
            name: Max Precision
          - type: max_recall
            value: 0.9957627118644068
            name: Max Recall
          - type: max_ap
            value: 0.5898283705050701
            name: Max Ap

SentenceTransformer based on microsoft/deberta-v3-small

This is a sentence-transformers model fine-tuned from microsoft/deberta-v3-small. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: microsoft/deberta-v3-small
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model 
  (1): AdvancedWeightedPooling(
    (linear_cls): Linear(in_features=768, out_features=768, bias=True)
    (mha): MultiheadAttention(
      (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
    )
    (layernorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (layernorm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("bobox/DeBERTa3-s-CustomPooling-test1-checkpoints-tmp")
# Run inference
sentences = [
    'What is the basic monetary unit of Iceland?',
    "Icelandic monetary unit - definition of Icelandic monetary unit by The Free Dictionary Icelandic monetary unit - definition of Icelandic monetary unit by The Free Dictionary http://www.thefreedictionary.com/Icelandic+monetary+unit Related to Icelandic monetary unit: Icelandic Old Krona ThesaurusAntonymsRelated WordsSynonymsLegend: monetary unit - a unit of money Icelandic krona , krona - the basic unit of money in Iceland eyrir - 100 aurar equal 1 krona in Iceland Want to thank TFD for its existence? Tell a friend about us , add a link to this page, or visit the webmaster's page for free fun content . Link to this page: Copyright © 2003-2017 Farlex, Inc Disclaimer All content on this website, including dictionary, thesaurus, literature, geography, and other reference data is for informational purposes only. This information should not be considered complete, up to date, and is not intended to be used in place of a visit, consultation, or advice of a legal, medical, or any other professional.",
    'Food-Info.net : E-numbers : E140: Chlorophyll CI 75810, Natural Green 3, Chlorophyll A, Magnesium chlorophyll Origin: Natural green colour, present in all plants and algae. Commercially extracted from nettles, grass and alfalfa. Function & characteristics:',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
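
The similarity scores above are cosine similarities, matching the similarity function listed in the model description. As a rough illustration of what that computes, the sketch below builds the same kind of pairwise matrix in plain Python. The helper names and the three toy vectors are made up for the example; they stand in for real 768-dimensional embeddings.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_matrix(vectors):
    """Pairwise cosine similarities, one row per input vector."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]

# Toy 3-dimensional stand-ins for embeddings (hypothetical values).
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 2.0],
]

scores = similarity_matrix(toy_embeddings)
for row in scores:
    print([round(s, 4) for s in row])
```

Each diagonal entry is 1 because a vector is always perfectly aligned with itself, and the matrix is symmetric, which is why a 3-sentence input yields a 3 by 3 result.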

Evaluation

Metrics

Semantic Similarity

  • Dataset: sts-test

Metric Value
pearson_cosine 0.2225
spearman_cosine 0.248
pearson_manhattan 0.2663
spearman_manhattan 0.2844
pearson_euclidean 0.2323
spearman_euclidean 0.248
pearson_dot 0.2224
spearman_dot 0.2479
pearson_max 0.2663
spearman_max 0.2844
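
For readers unfamiliar with the two correlation families in this table, the following is a minimal sketch of how Pearson and Spearman correlations relate: Spearman is Pearson applied to ranks. The function names and the five gold/predicted scores are hypothetical and are not taken from the sts-test data.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical gold similarity labels and predicted cosine similarities.
gold = [0.0, 1.0, 2.5, 4.0, 5.0]
predicted = [0.10, 0.20, 0.25, 0.60, 0.95]

pearson_value = pearson(gold, predicted)
spearman_value = spearman(gold, predicted)
print(round(pearson_value, 4), round(spearman_value, 4))
```

Because the toy predictions preserve the ordering of the gold scores, the Spearman value is 1 even though the relationship is not linear and the Pearson value is lower.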

Binary Classification

  • Dataset: allNLI-dev

Metric Value
cosine_accuracy 0.666
cosine_accuracy_threshold 0.9837
cosine_f1 0.5066
cosine_f1_threshold 0.7643
cosine_precision 0.3392
cosine_recall 1.0
cosine_ap 0.3441
dot_accuracy 0.666
dot_accuracy_threshold 755.603
dot_f1 0.5066
dot_f1_threshold 587.0625
dot_precision 0.3392
dot_recall 1.0
dot_ap 0.3441
manhattan_accuracy 0.6641
manhattan_accuracy_threshold 62.691
manhattan_f1 0.5058
manhattan_f1_threshold 337.6862
manhattan_precision 0.3386
manhattan_recall 1.0
manhattan_ap 0.3513
euclidean_accuracy 0.666
euclidean_accuracy_threshold 5.0058
euclidean_f1 0.5066
euclidean_f1_threshold 19.0224
euclidean_precision 0.3392
euclidean_recall 1.0
euclidean_ap 0.3441
max_accuracy 0.666
max_accuracy_threshold 755.603
max_f1 0.5066
max_f1_threshold 587.0625
max_precision 0.3392
max_recall 1.0
max_ap 0.3513

Binary Classification

  • Dataset: Qnli-dev

Metric Value
cosine_accuracy 0.5918
cosine_accuracy_threshold 0.9259
cosine_f1 0.6292
cosine_f1_threshold 0.7507
cosine_precision 0.4599
cosine_recall 0.9958
cosine_ap 0.5585
dot_accuracy 0.5918
dot_accuracy_threshold 711.1836
dot_f1 0.6292
dot_f1_threshold 576.597
dot_precision 0.4599
dot_recall 0.9958
dot_ap 0.5585
manhattan_accuracy 0.6191
manhattan_accuracy_threshold 188.0907
manhattan_f1 0.6302
manhattan_f1_threshold 237.8046
manhattan_precision 0.4841
manhattan_recall 0.9025
manhattan_ap 0.5898
euclidean_accuracy 0.5918
euclidean_accuracy_threshold 10.6727
euclidean_f1 0.6292
euclidean_f1_threshold 19.5537
euclidean_precision 0.4599
euclidean_recall 0.9958
euclidean_ap 0.5585
max_accuracy 0.6191
max_accuracy_threshold 711.1836
max_f1 0.6302
max_f1_threshold 576.597
max_precision 0.4841
max_recall 0.9958
max_ap 0.5898
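
The F1 values in the two binary classification tables can be checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below does this for the cosine rows, using the full-precision values from the model card metadata; the function and variable names are illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Cosine precision and recall as reported for the first and second table.
first_table_f1 = f1_score(0.3392156862745098, 1.0)
second_table_f1 = f1_score(0.4598825831702544, 0.9957627118644068)

print(round(first_table_f1, 4), round(second_table_f1, 4))
# → 0.5066 0.6292
```

In the first table, a recall of 1.0 with a precision of about 0.34 means the F1-optimal threshold accepts every positive pair along with roughly two negative pairs for each positive one.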

Training Details

Evaluation Dataset

vitaminc-pairs

  • Dataset: vitaminc-pairs at be6febb
  • Size: 128 evaluation samples
  • Columns: claim and evidence
  • Approximate statistics based on the first 128 samples:
    • claim (string): min 9 tokens, mean 21.42 tokens, max 41 tokens
    • evidence (string): min 11 tokens, mean 35.55 tokens, max 79 tokens
  • Samples:
    • claim: Dragon Con had over 5000 guests .
      evidence: Among the more than 6000 guests and musical performers at the 2009 convention were such notables as Patrick Stewart , William Shatner , Leonard Nimoy , Terry Gilliam , Bruce Boxleitner , James Marsters , and Mary McDonnell .
    • claim: COVID-19 has reached more than 185 countries .
      evidence: As of , more than cases of COVID-19 have been reported in more than 190 countries and 200 territories , resulting in more than deaths .
    • claim: In March , Italy had 3.6x times more cases of coronavirus than China .
      evidence: As of 12 March , among nations with at least one million citizens , Italy has the world 's highest per capita rate of positive coronavirus cases at 206.1 cases per million people ( 3.6x times the rate of China ) and is the country with the second-highest number of positive cases as well as of deaths in the world , after China .
  • Loss: CachedGISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.025}
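
CachedGISTEmbedLoss combines an in-batch contrastive objective with a guide model (here a BERT encoder with CLS pooling and normalization) and a temperature of 0.025. The sketch below is a simplified, plain-Python reading of that idea and not the library's implementation. It assumes the similarity matrices have already been computed, filters only anchor-to-candidate negatives, and leaves out the gradient caching that the "Cached" variant adds. The function name and the toy matrices are hypothetical.

```python
import math

def guided_contrastive_loss(student_sim, guide_sim, temperature=0.025):
    """Mean cross-entropy over rows; row i's target is column i (its positive).

    student_sim[i][j] and guide_sim[i][j] hold the similarity between
    anchor i and candidate j under the trained model and the guide model.
    """
    losses = []
    for i, row in enumerate(student_sim):
        logits = []
        for j, score in enumerate(row):
            # Simplified guide filter: skip in-batch negatives that the guide
            # rates as more similar to the anchor than the true positive.
            if j != i and guide_sim[i][j] > guide_sim[i][i]:
                continue
            logits.append((j, score / temperature))
        # Numerically stable log-sum-exp over the remaining candidates.
        peak = max(value for _, value in logits)
        log_norm = peak + math.log(sum(math.exp(value - peak) for _, value in logits))
        positive = next(value for j, value in logits if j == i)
        losses.append(log_norm - positive)
    return sum(losses) / len(losses)

# Toy 2x2 batch (hypothetical similarities).
student = [[0.9, 0.8], [0.2, 0.7]]
guide_flags_row0 = [[0.8, 0.95], [0.1, 0.9]]   # guide marks (0, 1) as a likely false negative
guide_flags_none = [[1.0, 0.0], [0.0, 1.0]]    # guide filters nothing

filtered_loss = guided_contrastive_loss(student, guide_flags_row0)
unfiltered_loss = guided_contrastive_loss(student, guide_flags_none)
print(filtered_loss < unfiltered_loss)
```

With a temperature as small as 0.025, similarity differences are scaled by a factor of 40 before the softmax, so a hard negative that slips through contributes strongly to the loss; filtering suspected false negatives removes that pressure.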
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 42
  • per_device_eval_batch_size: 128
  • gradient_accumulation_steps: 2
  • learning_rate: 3e-05
  • weight_decay: 0.001
  • lr_scheduler_type: cosine_with_min_lr
  • lr_scheduler_kwargs: {'num_cycles': 0.5, 'min_lr': 1e-05}
  • warmup_ratio: 0.25
  • save_safetensors: False
  • fp16: True
  • push_to_hub: True
  • hub_model_id: bobox/DeBERTa3-s-CustomPooling-test1-checkpoints-tmp
  • hub_strategy: all_checkpoints
  • batch_sampler: no_duplicates
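
To make the scheduler settings above concrete, here is a minimal sketch of a learning-rate curve with linear warmup followed by cosine decay to a floor, using the listed values (learning_rate 3e-05, min_lr 1e-05, warmup_ratio 0.25, num_cycles 0.5). It follows my understanding of how cosine_with_min_lr behaves rather than reproducing the Transformers source, and TOTAL_STEPS is a hypothetical figure chosen only for illustration.

```python
import math

def lr_at_step(step, total_steps, base_lr=3e-05, min_lr=1e-05,
               warmup_ratio=0.25, num_cycles=0.5):
    """Learning rate at a given optimizer step."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress))
    min_ratio = min_lr / base_lr
    return base_lr * (cosine * (1.0 - min_ratio) + min_ratio)

TOTAL_STEPS = 1000  # hypothetical; the real run length is not stated here

# Samples processed per optimizer step on each device:
# per_device_train_batch_size * gradient_accumulation_steps.
effective_batch_size = 42 * 2

for step in (0, 125, 250, 625, 1000):
    print(step, f"{lr_at_step(step, TOTAL_STEPS):.2e}")
```

The rate peaks at the end of warmup and ends at min_lr rather than zero, which is the practical difference from a plain cosine schedule.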

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 42
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 2
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 3e-05
  • weight_decay: 0.001
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: cosine_with_min_lr
  • lr_scheduler_kwargs: {'num_cycles': 0.5, 'min_lr': 1e-05}
  • warmup_ratio: 0.25
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: bobox/DeBERTa3-s-CustomPooling-test1-checkpoints-tmp
  • hub_strategy: all_checkpoints
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch | Step | Training Loss | vitaminc-pairs loss | negation-triplets loss | scitail-pairs-pos loss | scitail-pairs-qa loss | xsum-pairs loss | sciq pairs loss | qasc pairs loss | openbookqa pairs loss | msmarco pairs loss | nq pairs loss | trivia pairs loss | gooaq pairs loss | paws-pos loss | global dataset loss | sts-test_spearman_cosine | allNLI-dev_max_ap | Qnli-dev_max_ap
0.0009 1 5.8564 - - - - - - - - - - - - - - - - -
0.0018 2 7.1716 - - - - - - - - - - - - - - - - -
0.0027 3 5.9095 - - - - - - - - - - - - - - - - -
0.0035 4 5.0841 - - - - - - - - - - - - - - - - -
0.0044 5 4.0184 - - - - - - - - - - - - - - - - -
0.0053 6 6.2191 - - - - - - - - - - - - - - - - -
0.0062 7 5.6124 - - - - - - - - - - - - - - - - -
0.0071 8 3.9544 - - - - - - - - - - - - - - - - -
0.0080 9 4.7149 - - - - - - - - - - - - - - - - -
0.0088 10 4.9616 - - - - - - - - - - - - - - - - -
0.0097 11 5.2794 - - - - - - - - - - - - - - - - -
0.0106 12 8.8704 - - - - - - - - - - - - - - - - -
0.0115 13 6.0707 - - - - - - - - - - - - - - - - -
0.0124 14 5.4071 - - - - - - - - - - - - - - - - -
0.0133 15 6.9104 - - - - - - - - - - - - - - - - -
0.0142 16 6.0276 - - - - - - - - - - - - - - - - -
0.0150 17 6.737 - - - - - - - - - - - - - - - - -
0.0159 18 6.5354 - - - - - - - - - - - - - - - - -
0.0168 19 5.206 - - - - - - - - - - - - - - - - -
0.0177 20 5.2469 - - - - - - - - - - - - - - - - -
0.0186 21 5.3771 - - - - - - - - - - - - - - - - -
0.0195 22 4.979 - - - - - - - - - - - - - - - - -
0.0204 23 4.7909 - - - - - - - - - - - - - - - - -
0.0212 24 4.9086 - - - - - - - - - - - - - - - - -
0.0221 25 4.8826 - - - - - - - - - - - - - - - - -
0.0230 26 8.2266 - - - - - - - - - - - - - - - - -
0.0239 27 8.3024 - - - - - - - - - - - - - - - - -
0.0248 28 5.8745 - - - - - - - - - - - - - - - - -
0.0257 29 4.7298 - - - - - - - - - - - - - - - - -
0.0265 30 5.4614 - - - - - - - - - - - - - - - - -
0.0274 31 5.8594 - - - - - - - - - - - - - - - - -
0.0283 32 5.2401 - - - - - - - - - - - - - - - - -
0.0292 33 5.1579 - - - - - - - - - - - - - - - - -
0.0301 34 5.2181 - - - - - - - - - - - - - - - - -
0.0310 35 4.6328 - - - - - - - - - - - - - - - - -
0.0319 36 2.121 - - - - - - - - - - - - - - - - -
0.0327 37 5.9026 - - - - - - - - - - - - - - - - -
0.0336 38 7.3796 - - - - - - - - - - - - - - - - -
0.0345 39 5.5361 - - - - - - - - - - - - - - - - -
0.0354 40 4.0243 2.9018 5.6903 2.1136 2.8052 6.5831 0.8882 4.1148 5.0966 10.3911 10.9032 7.1904 8.1935 1.3943 5.6716 0.1879 0.3385 0.5781
0.0363 41 4.9072 - - - - - - - - - - - - - - - - -
0.0372 42 3.4439 - - - - - - - - - - - - - - - - -
0.0381 43 4.9787 - - - - - - - - - - - - - - - - -
0.0389 44 5.8318 - - - - - - - - - - - - - - - - -
0.0398 45 5.3226 - - - - - - - - - - - - - - - - -
0.0407 46 5.1181 - - - - - - - - - - - - - - - - -
0.0416 47 4.7834 - - - - - - - - - - - - - - - - -
0.0425 48 6.6303 - - - - - - - - - - - - - - - - -
0.0434 49 5.8171 - - - - - - - - - - - - - - - - -
0.0442 50 5.1962 - - - - - - - - - - - - - - - - -
0.0451 51 5.2096 - - - - - - - - - - - - - - - - -
0.0460 52 5.0943 - - - - - - - - - - - - - - - - -
0.0469 53 4.9038 - - - - - - - - - - - - - - - - -
0.0478 54 4.6479 - - - - - - - - - - - - - - - - -
0.0487 55 5.5098 - - - - - - - - - - - - - - - - -
0.0496 56 4.6979 - - - - - - - - - - - - - - - - -
0.0504 57 3.1969 - - - - - - - - - - - - - - - - -
0.0513 58 4.4127 - - - - - - - - - - - - - - - - -
0.0522 59 3.7746 - - - - - - - - - - - - - - - - -
0.0531 60 4.5378 - - - - - - - - - - - - - - - - -
0.0540 61 5.0209 - - - - - - - - - - - - - - - - -
0.0549 62 6.5936 - - - - - - - - - - - - - - - - -
0.0558 63 4.2315 - - - - - - - - - - - - - - - - -
0.0566 64 6.4269 - - - - - - - - - - - - - - - - -
0.0575 65 4.2644 - - - - - - - - - - - - - - - - -
0.0584 66 5.1388 - - - - - - - - - - - - - - - - -
0.0593 67 5.1852 - - - - - - - - - - - - - - - - -
0.0602 68 4.8057 - - - - - - - - - - - - - - - - -
0.0611 69 3.1725 - - - - - - - - - - - - - - - - -
0.0619 70 3.3322 - - - - - - - - - - - - - - - - -
0.0628 71 5.139 - - - - - - - - - - - - - - - - -
0.0637 72 4.307 - - - - - - - - - - - - - - - - -
0.0646 73 5.0133 - - - - - - - - - - - - - - - - -
0.0655 74 4.0507 - - - - - - - - - - - - - - - - -
0.0664 75 3.3895 - - - - - - - - - - - - - - - - -
0.0673 76 5.6736 - - - - - - - - - - - - - - - - -
0.0681 77 4.2572 - - - - - - - - - - - - - - - - -
0.0690 78 3.0796 - - - - - - - - - - - - - - - - -
0.0699 79 5.0199 - - - - - - - - - - - - - - - - -
0.0708 80 4.1414 2.7794 4.8890 1.8997 2.6761 6.2096 0.7622 3.3129 4.5498 7.2056 7.6809 6.3792 6.6567 1.3848 5.0030 0.2480 0.3513 0.5898
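
The epoch and step columns make it possible to estimate how far into training this checkpoint is. The sketch below is a back-of-the-envelope calculation, not a logged fact: it uses the two evaluation rows from the log, the num_train_epochs of 3 and warmup_ratio of 0.25 from the hyperparameters, and assumes linear warmup. Epoch values in the log are rounded to four decimals, so the result is approximate.

```python
import math

# (epoch, step) pairs taken from the two evaluation rows in the log above.
logged = [(0.0354, 40), (0.0708, 80)]

estimates = [step / epoch for epoch, step in logged]
steps_per_epoch = round(sum(estimates) / len(estimates))

total_steps = steps_per_epoch * 3               # num_train_epochs: 3
warmup_steps = math.ceil(total_steps * 0.25)    # warmup_ratio: 0.25

# Assumed linear warmup towards learning_rate 3e-05.
lr_at_step_80 = 3e-05 * 80 / warmup_steps

print(steps_per_epoch, total_steps, warmup_steps)
print(f"{lr_at_step_80:.2e}")
```

Under these assumptions, step 80 falls within the first tenth of the warmup phase, which is worth keeping in mind when reading the evaluation metrics for this checkpoint.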

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 3.2.0
  • Transformers: 4.45.1
  • PyTorch: 2.4.0
  • Accelerate: 0.34.2
  • Datasets: 3.0.1
  • Tokenizers: 0.20.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}