---
base_model: microsoft/deberta-v3-small
library_name: sentence-transformers
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
  - cosine_accuracy
  - cosine_accuracy_threshold
  - cosine_f1
  - cosine_f1_threshold
  - cosine_precision
  - cosine_recall
  - cosine_ap
  - dot_accuracy
  - dot_accuracy_threshold
  - dot_f1
  - dot_f1_threshold
  - dot_precision
  - dot_recall
  - dot_ap
  - manhattan_accuracy
  - manhattan_accuracy_threshold
  - manhattan_f1
  - manhattan_f1_threshold
  - manhattan_precision
  - manhattan_recall
  - manhattan_ap
  - euclidean_accuracy
  - euclidean_accuracy_threshold
  - euclidean_f1
  - euclidean_f1_threshold
  - euclidean_precision
  - euclidean_recall
  - euclidean_ap
  - max_accuracy
  - max_accuracy_threshold
  - max_f1
  - max_f1_threshold
  - max_precision
  - max_recall
  - max_ap
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:32500
  - loss:GISTEmbedLoss
widget:
  - source_sentence: phase changes do not change
    sentences:
      - >-
        The major Atlantic slave trading nations, ordered by trade volume, were
        the Portuguese, the British, the Spanish, the French, the Dutch, and the
        Danish. Several had established outposts on the African coast where they
        purchased slaves from local African leaders.
      - >-
        phase changes do not change mass. Particles have mass, but mass is
        energy. 
         phase changes do not change  energy
      - >-
        According to the U.S. Census Bureau , the county is a total area of ,
        which has land and ( 0.2 % ) is water .
  - source_sentence: what jobs can you get with a bachelor degree in anthropology?
    sentences:
      - >-
        To determine the atomic weight of an element, you should add up protons
        and neutrons.
      - >-
        ['Paleontologist*', 'Archaeologist*', 'University Professor*', 'Market
        Research Analyst*', 'Primatologist.', 'Forensic Scientist*', 'Medical
        Anthropologist.', 'Museum Technician.']
      - >-
        The wingspan flies , the moth comes depending on the location from July
        to August .
  - source_sentence: Identify different forms of energy (e.g., light, sound, heat).
    sentences:
      - >-
        `` Irreplaceable '' '' remained on the chart for thirty weeks , and was
        certified double-platinum by the Recording Industry Association of
        America ( RIAA ) , denoting sales of two million downloads , and had
        sold over 3,139,000 paid digital downloads in the US as of October 2012
        , according to Nielsen SoundScan . ''
      - >-
        On Rotten Tomatoes , the film has a rating of 63 % , based on 87 reviews
        , with an average rating of 5.9/10 .
      - Heat, light, and sound are all different forms of energy.
  - source_sentence: what is so small it can only be seen with an electron microscope?
    sentences:
      - >-
        Viruses are so small that they can be seen only with an electron
        microscope.. Where most viruses are DNA, HIV is an RNA virus. 
         HIV is so small it can only be seen with an electron microscope
      - >-
        The development of modern lasers has opened many doors to both research
        and applications. A laser beam was used to measure the distance from the
        Earth to the moon. Lasers are important components of CD players. As the
        image above illustrates, lasers can provide precise focusing of beams to
        selectively destroy cancer cells in patients. The ability of a laser to
        focus precisely is due to high-quality crystals that help give rise to
        the laser beam. A variety of techniques are used to manufacture pure
        crystals for use in lasers.
      - >-
        Discussion for (a) This value is the net work done on the package. The
        person actually does more work than this, because friction opposes the
        motion. Friction does negative work and removes some of the energy the
        person expends and converts it to thermal energy. The net work equals
        the sum of the work done by each individual force. Strategy and Concept
        for (b) The forces acting on the package are gravity, the normal force,
        the force of friction, and the applied force. The normal force and force
        of gravity are each perpendicular to the displacement, and therefore do
        no work. Solution for (b) The applied force does work.
  - source_sentence: what aspects of your environment may relate to the epidemic of obesity
    sentences:
      - >-
        Jan Kromkamp ( born August 17 , 1980 in Makkinga , Netherlands ) is a
        Dutch footballer .
      - >-
        When chemicals in solution react, the proper way of writing the chemical
        formulas of the dissolved ionic compounds is in terms of the dissociated
        ions, not the complete ionic formula. A complete ionic equation is a
        chemical equation in which the dissolved ionic compounds are written as
        separated ions. Solubility rules are very useful in determining which
        ionic compounds are dissolved and which are not. For example, when
        NaCl(aq) reacts with AgNO3(aq) in a double-replacement reaction to
        precipitate AgCl(s) and form NaNO3(aq), the complete ionic equation
        includes NaCl, AgNO3, and NaNO3 written as separated ions:.
      - >-
        Genetic changes in human populations occur too slowly to be responsible
        for the obesity epidemic. Nevertheless, the variation in how people
        respond to the environment that promotes physical inactivity and intake
        of high-calorie foods suggests that genes do play a role in the
        development of obesity.
model-index:
  - name: SentenceTransformer based on microsoft/deberta-v3-small
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test
          type: sts-test
        metrics:
          - type: pearson_cosine
            value: 0.3774946012125992
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.4056589966976888
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.3861982631744407
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.4059364545183154
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.38652243004790016
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.4056589966976888
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.3774648453085433
            name: Pearson Dot
          - type: spearman_dot
            value: 0.40563469676275316
            name: Spearman Dot
          - type: pearson_max
            value: 0.38652243004790016
            name: Pearson Max
          - type: spearman_max
            value: 0.4059364545183154
            name: Spearman Max
      - task:
          type: binary-classification
          name: Binary Classification
        dataset:
          name: allNLI dev
          type: allNLI-dev
        metrics:
          - type: cosine_accuracy
            value: 0.67578125
            name: Cosine Accuracy
          - type: cosine_accuracy_threshold
            value: 0.9427558183670044
            name: Cosine Accuracy Threshold
          - type: cosine_f1
            value: 0.5225225225225225
            name: Cosine F1
          - type: cosine_f1_threshold
            value: 0.8046966791152954
            name: Cosine F1 Threshold
          - type: cosine_precision
            value: 0.3795811518324607
            name: Cosine Precision
          - type: cosine_recall
            value: 0.838150289017341
            name: Cosine Recall
          - type: cosine_ap
            value: 0.4368751759846574
            name: Cosine Ap
          - type: dot_accuracy
            value: 0.67578125
            name: Dot Accuracy
          - type: dot_accuracy_threshold
            value: 724.1080322265625
            name: Dot Accuracy Threshold
          - type: dot_f1
            value: 0.5225225225225225
            name: Dot F1
          - type: dot_f1_threshold
            value: 618.074951171875
            name: Dot F1 Threshold
          - type: dot_precision
            value: 0.3795811518324607
            name: Dot Precision
          - type: dot_recall
            value: 0.838150289017341
            name: Dot Recall
          - type: dot_ap
            value: 0.436842886797982
            name: Dot Ap
          - type: manhattan_accuracy
            value: 0.677734375
            name: Manhattan Accuracy
          - type: manhattan_accuracy_threshold
            value: 223.6764373779297
            name: Manhattan Accuracy Threshold
          - type: manhattan_f1
            value: 0.5239852398523985
            name: Manhattan F1
          - type: manhattan_f1_threshold
            value: 372.31396484375
            name: Manhattan F1 Threshold
          - type: manhattan_precision
            value: 0.38482384823848237
            name: Manhattan Precision
          - type: manhattan_recall
            value: 0.8208092485549133
            name: Manhattan Recall
          - type: manhattan_ap
            value: 0.43892484929307635
            name: Manhattan Ap
          - type: euclidean_accuracy
            value: 0.67578125
            name: Euclidean Accuracy
          - type: euclidean_accuracy_threshold
            value: 9.377331733703613
            name: Euclidean Accuracy Threshold
          - type: euclidean_f1
            value: 0.5225225225225225
            name: Euclidean F1
          - type: euclidean_f1_threshold
            value: 17.321048736572266
            name: Euclidean F1 Threshold
          - type: euclidean_precision
            value: 0.3795811518324607
            name: Euclidean Precision
          - type: euclidean_recall
            value: 0.838150289017341
            name: Euclidean Recall
          - type: euclidean_ap
            value: 0.4368602200677977
            name: Euclidean Ap
          - type: max_accuracy
            value: 0.677734375
            name: Max Accuracy
          - type: max_accuracy_threshold
            value: 724.1080322265625
            name: Max Accuracy Threshold
          - type: max_f1
            value: 0.5239852398523985
            name: Max F1
          - type: max_f1_threshold
            value: 618.074951171875
            name: Max F1 Threshold
          - type: max_precision
            value: 0.38482384823848237
            name: Max Precision
          - type: max_recall
            value: 0.838150289017341
            name: Max Recall
          - type: max_ap
            value: 0.43892484929307635
            name: Max Ap
      - task:
          type: binary-classification
          name: Binary Classification
        dataset:
          name: Qnli dev
          type: Qnli-dev
        metrics:
          - type: cosine_accuracy
            value: 0.646484375
            name: Cosine Accuracy
          - type: cosine_accuracy_threshold
            value: 0.8057259321212769
            name: Cosine Accuracy Threshold
          - type: cosine_f1
            value: 0.6688102893890675
            name: Cosine F1
          - type: cosine_f1_threshold
            value: 0.7187118530273438
            name: Cosine F1 Threshold
          - type: cosine_precision
            value: 0.538860103626943
            name: Cosine Precision
          - type: cosine_recall
            value: 0.8813559322033898
            name: Cosine Recall
          - type: cosine_ap
            value: 0.6720663622193426
            name: Cosine Ap
          - type: dot_accuracy
            value: 0.646484375
            name: Dot Accuracy
          - type: dot_accuracy_threshold
            value: 618.8643798828125
            name: Dot Accuracy Threshold
          - type: dot_f1
            value: 0.6688102893890675
            name: Dot F1
          - type: dot_f1_threshold
            value: 552.0260009765625
            name: Dot F1 Threshold
          - type: dot_precision
            value: 0.538860103626943
            name: Dot Precision
          - type: dot_recall
            value: 0.8813559322033898
            name: Dot Recall
          - type: dot_ap
            value: 0.672083506527328
            name: Dot Ap
          - type: manhattan_accuracy
            value: 0.6484375
            name: Manhattan Accuracy
          - type: manhattan_accuracy_threshold
            value: 386.58905029296875
            name: Manhattan Accuracy Threshold
          - type: manhattan_f1
            value: 0.6645569620253164
            name: Manhattan F1
          - type: manhattan_f1_threshold
            value: 462.609130859375
            name: Manhattan F1 Threshold
          - type: manhattan_precision
            value: 0.5303030303030303
            name: Manhattan Precision
          - type: manhattan_recall
            value: 0.8898305084745762
            name: Manhattan Recall
          - type: manhattan_ap
            value: 0.6724653688821339
            name: Manhattan Ap
          - type: euclidean_accuracy
            value: 0.646484375
            name: Euclidean Accuracy
          - type: euclidean_accuracy_threshold
            value: 17.27533721923828
            name: Euclidean Accuracy Threshold
          - type: euclidean_f1
            value: 0.6688102893890675
            name: Euclidean F1
          - type: euclidean_f1_threshold
            value: 20.787063598632812
            name: Euclidean F1 Threshold
          - type: euclidean_precision
            value: 0.538860103626943
            name: Euclidean Precision
          - type: euclidean_recall
            value: 0.8813559322033898
            name: Euclidean Recall
          - type: euclidean_ap
            value: 0.6720591998758361
            name: Euclidean Ap
          - type: max_accuracy
            value: 0.6484375
            name: Max Accuracy
          - type: max_accuracy_threshold
            value: 618.8643798828125
            name: Max Accuracy Threshold
          - type: max_f1
            value: 0.6688102893890675
            name: Max F1
          - type: max_f1_threshold
            value: 552.0260009765625
            name: Max F1 Threshold
          - type: max_precision
            value: 0.538860103626943
            name: Max Precision
          - type: max_recall
            value: 0.8898305084745762
            name: Max Recall
          - type: max_ap
            value: 0.6724653688821339
            name: Max Ap
---

SentenceTransformer based on microsoft/deberta-v3-small

This is a sentence-transformers model finetuned from microsoft/deberta-v3-small. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: microsoft/deberta-v3-small
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
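
These properties can be verified directly on the loaded model. A minimal sanity-check sketch (assuming the checkpoint loads with the stock library; the repository name is the one used in the Usage section below):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("bobox/DeBERTa3-s-CustomPoolin-toytest4-step1-checkpoints-tmp")
print(model.max_seq_length)                  # 512
embedding = model.encode(["a quick check"])  # encode one sentence
print(embedding.shape)                       # (1, 768)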

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model 
  (1): AdvancedWeightedPooling(
    (alpha_dropout_layer): Dropout(p=0.05, inplace=False)
    (gate_dropout_layer): Dropout(p=0.0, inplace=False)
    (linear_cls_Qpj): Linear(in_features=768, out_features=768, bias=True)
    (linear_attnOut): Linear(in_features=768, out_features=768, bias=True)
    (mha): MultiheadAttention(
      (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
    )
    (layernorm_output): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (layernorm_weightedPooing): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (layernorm_attnOut): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("bobox/DeBERTa3-s-CustomPoolin-toytest4-step1-checkpoints-tmp")
# Run inference
sentences = [
    'what aspects of your environment may relate to the epidemic of obesity',
    'Genetic changes in human populations occur too slowly to be responsible for the obesity epidemic. Nevertheless, the variation in how people respond to the environment that promotes physical inactivity and intake of high-calorie foods suggests that genes do play a role in the development of obesity.',
    'When chemicals in solution react, the proper way of writing the chemical formulas of the dissolved ionic compounds is in terms of the dissociated ions, not the complete ionic formula. A complete ionic equation is a chemical equation in which the dissolved ionic compounds are written as separated ions. Solubility rules are very useful in determining which ionic compounds are dissolved and which are not. For example, when NaCl(aq) reacts with AgNO3(aq) in a double-replacement reaction to precipitate AgCl(s) and form NaNO3(aq), the complete ionic equation includes NaCl, AgNO3, and NaNO3 written as separated ions:.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Semantic Similarity

  • Dataset: sts-test
  • Evaluated with EmbeddingSimilarityEvaluator

Metric Value
pearson_cosine 0.3775
spearman_cosine 0.4057
pearson_manhattan 0.3862
spearman_manhattan 0.4059
pearson_euclidean 0.3865
spearman_euclidean 0.4057
pearson_dot 0.3775
spearman_dot 0.4056
pearson_max 0.3865
spearman_max 0.4059
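
These figures can be reproduced with the library's STS evaluator. A minimal sketch (the sentence pair and gold score below are placeholders; the values above were computed on the sts-test split):

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("bobox/DeBERTa3-s-CustomPoolin-toytest4-step1-checkpoints-tmp")

evaluator = EmbeddingSimilarityEvaluator(
    sentences1=["A man is playing a guitar."],   # placeholder pair
    sentences2=["Someone plays an instrument."],
    scores=[0.8],                                # gold similarity in [0, 1]
    name="sts-test",
)
print(evaluator(model))  # Pearson/Spearman for cosine, Manhattan, Euclidean, dot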

Binary Classification

  • Dataset: allNLI-dev
  • Evaluated with BinaryClassificationEvaluator

Metric Value
cosine_accuracy 0.6758
cosine_accuracy_threshold 0.9428
cosine_f1 0.5225
cosine_f1_threshold 0.8047
cosine_precision 0.3796
cosine_recall 0.8382
cosine_ap 0.4369
dot_accuracy 0.6758
dot_accuracy_threshold 724.108
dot_f1 0.5225
dot_f1_threshold 618.075
dot_precision 0.3796
dot_recall 0.8382
dot_ap 0.4368
manhattan_accuracy 0.6777
manhattan_accuracy_threshold 223.6764
manhattan_f1 0.524
manhattan_f1_threshold 372.314
manhattan_precision 0.3848
manhattan_recall 0.8208
manhattan_ap 0.4389
euclidean_accuracy 0.6758
euclidean_accuracy_threshold 9.3773
euclidean_f1 0.5225
euclidean_f1_threshold 17.321
euclidean_precision 0.3796
euclidean_recall 0.8382
euclidean_ap 0.4369
max_accuracy 0.6777
max_accuracy_threshold 724.108
max_f1 0.524
max_f1_threshold 618.075
max_precision 0.3848
max_recall 0.8382
max_ap 0.4389
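
These metrics (and the Qnli-dev table below) come from the library's pair-classification evaluator, which sweeps a decision threshold for each similarity function. A minimal sketch with a placeholder pair:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import BinaryClassificationEvaluator

model = SentenceTransformer("bobox/DeBERTa3-s-CustomPoolin-toytest4-step1-checkpoints-tmp")

evaluator = BinaryClassificationEvaluator(
    sentences1=["A soccer game with multiple males playing."],  # placeholder pair
    sentences2=["Some men are playing a sport."],
    labels=[1],  # 1 = matching pair, 0 = non-matching
    name="allNLI-dev",
)
print(evaluator(model))  # accuracy, F1, precision, recall, AP per similarity function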

Binary Classification

  • Dataset: Qnli-dev
  • Evaluated with BinaryClassificationEvaluator

Metric Value
cosine_accuracy 0.6465
cosine_accuracy_threshold 0.8057
cosine_f1 0.6688
cosine_f1_threshold 0.7187
cosine_precision 0.5389
cosine_recall 0.8814
cosine_ap 0.6721
dot_accuracy 0.6465
dot_accuracy_threshold 618.8644
dot_f1 0.6688
dot_f1_threshold 552.026
dot_precision 0.5389
dot_recall 0.8814
dot_ap 0.6721
manhattan_accuracy 0.6484
manhattan_accuracy_threshold 386.5891
manhattan_f1 0.6646
manhattan_f1_threshold 462.6091
manhattan_precision 0.5303
manhattan_recall 0.8898
manhattan_ap 0.6725
euclidean_accuracy 0.6465
euclidean_accuracy_threshold 17.2753
euclidean_f1 0.6688
euclidean_f1_threshold 20.7871
euclidean_precision 0.5389
euclidean_recall 0.8814
euclidean_ap 0.6721
max_accuracy 0.6484
max_accuracy_threshold 618.8644
max_f1 0.6688
max_f1_threshold 552.026
max_precision 0.5389
max_recall 0.8898
max_ap 0.6725

Training Details

Training Dataset

Unnamed Dataset

  • Size: 32,500 training samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    • sentence1: string; min 4 tokens, mean 29.39 tokens, max 323 tokens
    • sentence2: string; min 2 tokens, mean 54.42 tokens, max 423 tokens
  • Samples:
    • sentence1: In which London road is Harrod’s department store?
      sentence2: Harrods, Brompton Road, London
    • sentence1: e. in solids the atoms are closely locked in position and can only vibrate, in liquids the atoms and molecules are more loosely connected and can collide with and move past one another, while in gases the atoms or molecules are free to move independently, colliding frequently.
      sentence2: Within a substance, atoms that collide frequently and move independently of one another are most likely in a gas
    • sentence1: Joe Cole was unable to join West Bromwich Albion .
      sentence2: On 16th October Joe Cole took a long hard look at himself realising that he would never get the opportunity to join West Bromwich Albion and joined Coventry City instead.
  • Loss: GISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.025}
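
Equivalently, the loss can be constructed in code. A minimal sketch (the guide checkpoint here is an assumption; any strong embedding model matching the printed guide config, a BERT encoder with CLS pooling plus normalization, would fit, and the actual run paired the loss with the custom pooling stack shown under Full Model Architecture):

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import GISTEmbedLoss

model = SentenceTransformer("microsoft/deberta-v3-small")  # base model being fine-tuned
guide = SentenceTransformer("BAAI/bge-base-en-v1.5")       # assumed guide; see note above
loss = GISTEmbedLoss(model, guide=guide, temperature=0.025)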
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 256
  • lr_scheduler_type: cosine_with_min_lr
  • lr_scheduler_kwargs: {'num_cycles': 0.5, 'min_lr': 3.3333333333333337e-06}
  • warmup_ratio: 0.33
  • save_safetensors: False
  • fp16: True
  • push_to_hub: True
  • hub_model_id: bobox/DeBERTa3-s-CustomPoolin-toytest4-step1-checkpoints-tmp
  • hub_strategy: all_checkpoints
  • batch_sampler: no_duplicates
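
These values map directly onto SentenceTransformerTrainingArguments. A minimal sketch (output_dir is a placeholder):

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder
    eval_strategy="steps",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=256,
    lr_scheduler_type="cosine_with_min_lr",
    lr_scheduler_kwargs={"num_cycles": 0.5, "min_lr": 3.3333333333333337e-06},
    warmup_ratio=0.33,
    save_safetensors=False,
    fp16=True,
    push_to_hub=True,
    hub_model_id="bobox/DeBERTa3-s-CustomPoolin-toytest4-step1-checkpoints-tmp",
    hub_strategy="all_checkpoints",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)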

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 256
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: cosine_with_min_lr
  • lr_scheduler_kwargs: {'num_cycles': 0.5, 'min_lr': 3.3333333333333337e-06}
  • warmup_ratio: 0.33
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: bobox/DeBERTa3-s-CustomPoolin-toytest4-step1-checkpoints-tmp
  • hub_strategy: all_checkpoints
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss sts-test_spearman_cosine allNLI-dev_max_ap Qnli-dev_max_ap
0.0010 1 6.0688 - - -
0.0020 2 7.5576 - - -
0.0030 3 4.6849 - - -
0.0039 4 5.4503 - - -
0.0049 5 5.6057 - - -
0.0059 6 6.3049 - - -
0.0069 7 6.8336 - - -
0.0079 8 5.0777 - - -
0.0089 9 4.8358 - - -
0.0098 10 4.641 - - -
0.0108 11 4.828 - - -
0.0118 12 5.2269 - - -
0.0128 13 5.6772 - - -
0.0138 14 5.1422 - - -
0.0148 15 6.2469 - - -
0.0157 16 4.6802 - - -
0.0167 17 4.5492 - - -
0.0177 18 4.8062 - - -
0.0187 19 7.5141 - - -
0.0197 20 5.5202 - - -
0.0207 21 6.5025 - - -
0.0217 22 7.318 - - -
0.0226 23 4.6458 - - -
0.0236 24 4.6191 - - -
0.0246 25 4.3159 - - -
0.0256 26 6.3677 - - -
0.0266 27 5.6052 - - -
0.0276 28 4.196 - - -
0.0285 29 4.4802 - - -
0.0295 30 4.9193 - - -
0.0305 31 4.0996 - - -
0.0315 32 5.6307 - - -
0.0325 33 4.5745 - - -
0.0335 34 4.4514 - - -
0.0344 35 4.0617 - - -
0.0354 36 5.0298 - - -
0.0364 37 3.9815 - - -
0.0374 38 4.0871 - - -
0.0384 39 4.2378 - - -
0.0394 40 3.8226 - - -
0.0404 41 4.3519 - - -
0.0413 42 3.6345 - - -
0.0423 43 5.0829 - - -
0.0433 44 4.6701 - - -
0.0443 45 4.1371 - - -
0.0453 46 4.2418 - - -
0.0463 47 4.4766 - - -
0.0472 48 4.4797 - - -
0.0482 49 3.8471 - - -
0.0492 50 4.3194 - - -
0.0502 51 3.9426 - - -
0.0512 52 3.5333 - - -
0.0522 53 4.2426 - - -
0.0531 54 3.9816 - - -
0.0541 55 3.663 - - -
0.0551 56 3.9057 - - -
0.0561 57 4.0345 - - -
0.0571 58 3.5233 - - -
0.0581 59 3.7999 - - -
0.0591 60 3.1885 - - -
0.0600 61 3.6013 - - -
0.0610 62 3.392 - - -
0.0620 63 3.3814 - - -
0.0630 64 4.0428 - - -
0.0640 65 3.7825 - - -
0.0650 66 3.4181 - - -
0.0659 67 3.7793 - - -
0.0669 68 3.8344 - - -
0.0679 69 3.2165 - - -
0.0689 70 3.3811 - - -
0.0699 71 3.5984 - - -
0.0709 72 3.8583 - - -
0.0719 73 3.296 - - -
0.0728 74 2.7661 - - -
0.0738 75 2.9805 - - -
0.0748 76 2.566 - - -
0.0758 77 3.258 - - -
0.0768 78 3.3804 - - -
0.0778 79 2.8828 - - -
0.0787 80 3.1077 - - -
0.0797 81 2.9441 - - -
0.0807 82 2.9465 - - -
0.0817 83 2.7088 - - -
0.0827 84 2.9215 - - -
0.0837 85 3.4698 - - -
0.0846 86 2.2414 - - -
0.0856 87 3.1601 - - -
0.0866 88 2.7714 - - -
0.0876 89 3.0311 - - -
0.0886 90 3.0336 - - -
0.0896 91 1.9358 - - -
0.0906 92 2.6031 - - -
0.0915 93 2.7515 - - -
0.0925 94 2.8496 - - -
0.0935 95 1.8015 - - -
0.0945 96 2.8138 - - -
0.0955 97 2.0597 - - -
0.0965 98 2.1053 - - -
0.0974 99 2.6785 - - -
0.0984 100 2.588 - - -
0.0994 101 2.0099 - - -
0.1004 102 2.7947 - - -
0.1014 103 2.3274 - - -
0.1024 104 2.2545 - - -
0.1033 105 2.4575 - - -
0.1043 106 2.4413 - - -
0.1053 107 2.3185 - - -
0.1063 108 2.1577 - - -
0.1073 109 2.1278 - - -
0.1083 110 2.0967 - - -
0.1093 111 2.6142 - - -
0.1102 112 1.8553 - - -
0.1112 113 2.1523 - - -
0.1122 114 2.1726 - - -
0.1132 115 1.8564 - - -
0.1142 116 1.8413 - - -
0.1152 117 2.0441 - - -
0.1161 118 2.2159 - - -
0.1171 119 2.6779 - - -
0.1181 120 2.2976 - - -
0.1191 121 1.9407 - - -
0.1201 122 1.9019 - - -
0.1211 123 2.2149 - - -
0.1220 124 1.6823 - - -
0.1230 125 1.8402 - - -
0.1240 126 1.6914 - - -
0.125 127 2.1626 - - -
0.1260 128 1.6414 - - -
0.1270 129 2.2043 - - -
0.1280 130 1.9987 - - -
0.1289 131 1.8868 - - -
0.1299 132 1.8262 - - -
0.1309 133 2.0404 - - -
0.1319 134 1.9134 - - -
0.1329 135 2.3725 - - -
0.1339 136 1.4127 - - -
0.1348 137 1.6876 - - -
0.1358 138 1.8376 - - -
0.1368 139 1.6992 - - -
0.1378 140 1.5032 - - -
0.1388 141 2.0334 - - -
0.1398 142 2.3581 - - -
0.1407 143 1.4236 - - -
0.1417 144 2.202 - - -
0.1427 145 1.7654 - - -
0.1437 146 1.5748 - - -
0.1447 147 1.7996 - - -
0.1457 148 1.7517 - - -
0.1467 149 1.8933 - - -
0.1476 150 1.2836 - - -
0.1486 151 1.7145 - - -
0.1496 152 1.6499 - - -
0.1506 153 1.8273 0.4057 0.4389 0.6725
0.1516 154 2.2859 - - -
0.1526 155 1.0833 - - -
0.1535 156 1.6829 - - -
0.1545 157 2.1464 - - -
0.1555 158 1.745 - - -
0.1565 159 1.7319 - - -
0.1575 160 1.6968 - - -
0.1585 161 1.7401 - - -
0.1594 162 1.729 - - -
0.1604 163 2.0782 - - -
0.1614 164 2.6545 - - -
0.1624 165 1.4045 - - -
0.1634 166 1.2937 - - -
0.1644 167 1.1171 - - -
0.1654 168 1.3537 - - -
0.1663 169 1.7028 - - -
0.1673 170 1.4143 - - -
0.1683 171 1.8648 - - -
0.1693 172 1.6768 - - -
0.1703 173 1.9528 - - -
0.1713 174 1.1718 - - -
0.1722 175 1.8176 - - -
0.1732 176 0.8439 - - -
0.1742 177 1.5092 - - -
0.1752 178 1.1947 - - -
0.1762 179 1.6395 - - -
0.1772 180 1.4394 - - -
0.1781 181 1.7548 - - -
0.1791 182 1.1181 - - -
0.1801 183 1.0271 - - -
0.1811 184 2.3108 - - -
0.1821 185 2.1242 - - -
0.1831 186 1.9822 - - -
0.1841 187 2.3605 - - -
0.1850 188 1.5251 - - -
0.1860 189 1.2351 - - -
0.1870 190 1.5859 - - -
0.1880 191 1.8056 - - -
0.1890 192 1.349 - - -
0.1900 193 0.893 - - -
0.1909 194 1.5122 - - -
0.1919 195 1.3875 - - -
0.1929 196 1.29 - - -
0.1939 197 2.2931 - - -
0.1949 198 1.2663 - - -
0.1959 199 1.9712 - - -
0.1969 200 2.3307 - - -
0.1978 201 1.6544 - - -
0.1988 202 1.638 - - -
0.1998 203 1.3412 - - -
0.2008 204 1.4454 - - -
0.2018 205 1.5437 - - -
0.2028 206 1.4921 - - -
0.2037 207 1.4298 - - -
0.2047 208 1.6174 - - -
0.2057 209 1.4137 - - -
0.2067 210 1.5652 - - -
0.2077 211 1.1631 - - -
0.2087 212 1.2351 - - -
0.2096 213 1.7537 - - -
0.2106 214 1.3186 - - -
0.2116 215 1.2258 - - -
0.2126 216 0.7695 - - -
0.2136 217 1.2775 - - -
0.2146 218 1.6795 - - -
0.2156 219 1.2862 - - -
0.2165 220 1.1723 - - -
0.2175 221 1.3322 - - -
0.2185 222 1.7564 - - -
0.2195 223 1.1071 - - -
0.2205 224 1.2011 - - -
0.2215 225 1.2303 - - -
0.2224 226 1.212 - - -
0.2234 227 1.0117 - - -
0.2244 228 1.1907 - - -
0.2254 229 2.1293 - - -
0.2264 230 1.3063 - - -
0.2274 231 1.2841 - - -
0.2283 232 1.3778 - - -
0.2293 233 1.2242 - - -
0.2303 234 0.9227 - - -
0.2313 235 1.2221 - - -
0.2323 236 2.1041 - - -
0.2333 237 1.3341 - - -
0.2343 238 1.0876 - - -
0.2352 239 1.3328 - - -
0.2362 240 1.2958 - - -
0.2372 241 1.1522 - - -
0.2382 242 1.7942 - - -
0.2392 243 1.1325 - - -
0.2402 244 1.6466 - - -
0.2411 245 1.4608 - - -
0.2421 246 0.6375 - - -
0.2431 247 2.0177 - - -
0.2441 248 1.2069 - - -
0.2451 249 0.7639 - - -
0.2461 250 1.3465 - - -
0.2470 251 1.064 - - -
0.2480 252 1.3757 - - -
0.2490 253 1.612 - - -
0.25 254 0.7917 - - -
0.2510 255 1.5515 - - -
0.2520 256 0.799 - - -
0.2530 257 0.9882 - - -
0.2539 258 1.1814 - - -
0.2549 259 0.6394 - - -
0.2559 260 1.4756 - - -
0.2569 261 0.5338 - - -
0.2579 262 0.9779 - - -
0.2589 263 1.5307 - - -
0.2598 264 1.1213 - - -
0.2608 265 0.9482 - - -
0.2618 266 0.9599 - - -
0.2628 267 1.4455 - - -
0.2638 268 1.6496 - - -
0.2648 269 0.7402 - - -
0.2657 270 0.7835 - - -
0.2667 271 0.7821 - - -
0.2677 272 1.5422 - - -
0.2687 273 1.0995 - - -
0.2697 274 1.378 - - -
0.2707 275 1.3562 - - -
0.2717 276 0.7376 - - -
0.2726 277 1.1678 - - -
0.2736 278 1.2989 - - -
0.2746 279 1.9559 - - -
0.2756 280 1.1237 - - -
0.2766 281 0.952 - - -
0.2776 282 1.6629 - - -
0.2785 283 1.871 - - -
0.2795 284 1.5946 - - -
0.2805 285 1.4456 - - -
0.2815 286 1.4085 - - -
0.2825 287 1.1394 - - -
0.2835 288 1.0315 - - -
0.2844 289 1.488 - - -
0.2854 290 1.4006 - - -
0.2864 291 0.9237 - - -
0.2874 292 1.163 - - -
0.2884 293 1.7037 - - -
0.2894 294 0.8715 - - -
0.2904 295 1.2101 - - -
0.2913 296 1.1179 - - -
0.2923 297 1.3986 - - -
0.2933 298 1.7068 - - -
0.2943 299 0.8695 - - -
0.2953 300 1.3778 - - -
0.2963 301 1.2834 - - -
0.2972 302 0.8123 - - -
0.2982 303 1.6521 - - -
0.2992 304 1.1064 - - -
0.3002 305 0.9578 - - -

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.2.1
  • Transformers: 4.44.2
  • PyTorch: 2.5.0+cu121
  • Accelerate: 0.34.2
  • Datasets: 3.0.2
  • Tokenizers: 0.19.1
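
To reproduce this environment, the versions above can be pinned at install time (a sketch; choose the PyTorch 2.5.0 build matching your CUDA setup separately):

pip install sentence-transformers==3.2.1 transformers==4.44.2 accelerate==0.34.2 datasets==3.0.2 tokenizers==0.19.1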

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

GISTEmbedLoss

@misc{solatorio2024gistembed,
    title={GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning},
    author={Aivin V. Solatorio},
    year={2024},
    eprint={2402.16829},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}