---
base_model: microsoft/deberta-v3-small
library_name: sentence-transformers
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
  - cosine_accuracy
  - cosine_accuracy_threshold
  - cosine_f1
  - cosine_f1_threshold
  - cosine_precision
  - cosine_recall
  - cosine_ap
  - dot_accuracy
  - dot_accuracy_threshold
  - dot_f1
  - dot_f1_threshold
  - dot_precision
  - dot_recall
  - dot_ap
  - manhattan_accuracy
  - manhattan_accuracy_threshold
  - manhattan_f1
  - manhattan_f1_threshold
  - manhattan_precision
  - manhattan_recall
  - manhattan_ap
  - euclidean_accuracy
  - euclidean_accuracy_threshold
  - euclidean_f1
  - euclidean_f1_threshold
  - euclidean_precision
  - euclidean_recall
  - euclidean_ap
  - max_accuracy
  - max_accuracy_threshold
  - max_f1
  - max_f1_threshold
  - max_precision
  - max_recall
  - max_ap
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:32500
  - loss:GISTEmbedLoss
widget:
  - source_sentence: Fish hatch into larvae that are different from the adult form of species.
    sentences:
      - Fish hatch into larvae that are different from the adult form of?
      - amphibians hatch from eggs
      - >-
        A solenoid or coil wrapped around iron or certain other metals can form
        a(n) electromagnet.
  - source_sentence: >-
      About 200 countries and territories have reported coronavirus cases in
      2020 .
    sentences:
      - >-
        All-Time Olympic Games Medal Tally Analysis Home > Events > Olympics >
        Summer > Medal Tally > All-Time All-Time Olympic Games Medal Tally
        (Summer Olympics) Which country is the most successful at he Olympic
        Games? Here are the top ranked countries in terms of total medals won
        when all of the summer Games are considered (including the 2016 Rio
        Games). There are two tables presented, the first just lists the top
        countries based on the total medals won, the second table factors in how
        many Olympic Games the country appeared, averaging the total number of
        medals per Olympiad. A victory in a team sport is counted as one medal.
        The USA Has Won the Most Medals The US have clearly won the most gold
        medals and the most medals overall, more than doubling the next ranked
        country (these figures include medals won in Rio 2016). Second placed
        USSR had fewer appearances at the Olympics, and actually won more medals
        on average (see the 2nd table). The top 10 includes one country no
        longer in existence (the Soviet Union), so their medal totals will
        obviously not increase, however China is expected to continue a rapid
        rise up the ranks. With the addition of the 2016 data, China has moved
        up from 11th (in 2008) to 9th (2012) to 7th (2016). The country which
        has attended the most games without a medal is Monaco (20 Olympic
        Games), the country which has won the most medals without winning a gold
        medal is Malaysia (0 gold, 7 silver, 4 bronze). rank
      - >-
        An example of a reproductive behavior is salmon returning to their
        birthplace to lay their eggs
      - >-
        more than 664,000 cases of COVID-19 have been reported in over 190
        countries and territories , resulting in approximately 30,800 deaths .
  - source_sentence: >-
      The wave on a guitar string is transverse. the sound wave rattles a sheet
      of paper in a direction that shows the sound wave is what?
    sentences:
      - A Honda motorcycle parked in a grass driveway
      - >-
        In Panama tipping is a question of rewarding good service rather than an
        obligation. Restaurant bills don't include gratuities; adding 10% is
        customary. Bellhops and maids expect tips only in more expensive hotels,
        and $1–$2 per bag is the norm. You should also give a tip of up to $10
        per day to tour guides.
      - >-
        Figure 16.33 The wave on a guitar string is transverse. The sound wave
        rattles a sheet of paper in a direction that shows the sound wave is
        longitudinal.
  - source_sentence: The thermal production of a stove is generically used for
    sentences:
      - >-
        In total , 28 US victims were killed , while Viet Cong losses were
        killed 345 and a further 192 estimated killed .
      - a stove generates heat for cooking usually
      - >-
        A teenager has been charged over an incident in which a four-year-old
        girl was hurt when she was hit in the face by a brick thrown through a
        van window.
  - source_sentence: can sweet potatoes cause itching?
    sentences:
      - >-
        People with a true potato allergy may react immediately after touching,
        peeling, or eating potatoes. Symptoms may vary from person to person,
        but typical symptoms of a potato allergy include: rhinitis, including
        itchy or stinging eyes, a runny or stuffy nose, and sneezing.
      - riding a bike does not cause pollution
      - >-
        Dilation occurs when cell walls relax.. An aneurysm is a dilation, or
        bubble, that occurs in the wall of an artery. 
         an artery can be relaxed by dilation
model-index:
  - name: SentenceTransformer based on microsoft/deberta-v3-small
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test
          type: sts-test
        metrics:
          - type: pearson_cosine
            value: 0.2749904272806095
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.31159390381099095
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.2923996087310511
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.3095556181083969
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.2934483033082174
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.3115817314678925
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.27496363262371837
            name: Pearson Dot
          - type: spearman_dot
            value: 0.31138581044552094
            name: Spearman Dot
          - type: pearson_max
            value: 0.2934483033082174
            name: Pearson Max
          - type: spearman_max
            value: 0.31159390381099095
            name: Spearman Max
      - task:
          type: binary-classification
          name: Binary Classification
        dataset:
          name: allNLI dev
          type: allNLI-dev
        metrics:
          - type: cosine_accuracy
            value: 0.67578125
            name: Cosine Accuracy
          - type: cosine_accuracy_threshold
            value: 0.9452645182609558
            name: Cosine Accuracy Threshold
          - type: cosine_f1
            value: 0.512
            name: Cosine F1
          - type: cosine_f1_threshold
            value: 0.8565204739570618
            name: Cosine F1 Threshold
          - type: cosine_precision
            value: 0.39143730886850153
            name: Cosine Precision
          - type: cosine_recall
            value: 0.7398843930635838
            name: Cosine Recall
          - type: cosine_ap
            value: 0.4264736612515921
            name: Cosine Ap
          - type: dot_accuracy
            value: 0.67578125
            name: Dot Accuracy
          - type: dot_accuracy_threshold
            value: 726.30615234375
            name: Dot Accuracy Threshold
          - type: dot_f1
            value: 0.512
            name: Dot F1
          - type: dot_f1_threshold
            value: 658.1103515625
            name: Dot F1 Threshold
          - type: dot_precision
            value: 0.39143730886850153
            name: Dot Precision
          - type: dot_recall
            value: 0.7398843930635838
            name: Dot Recall
          - type: dot_ap
            value: 0.42647535250956575
            name: Dot Ap
          - type: manhattan_accuracy
            value: 0.67578125
            name: Manhattan Accuracy
          - type: manhattan_accuracy_threshold
            value: 201.49061584472656
            name: Manhattan Accuracy Threshold
          - type: manhattan_f1
            value: 0.5107692307692308
            name: Manhattan F1
          - type: manhattan_f1_threshold
            value: 417.52728271484375
            name: Manhattan F1 Threshold
          - type: manhattan_precision
            value: 0.3480083857442348
            name: Manhattan Precision
          - type: manhattan_recall
            value: 0.9595375722543352
            name: Manhattan Recall
          - type: manhattan_ap
            value: 0.4252213828672732
            name: Manhattan Ap
          - type: euclidean_accuracy
            value: 0.67578125
            name: Euclidean Accuracy
          - type: euclidean_accuracy_threshold
            value: 9.171283721923828
            name: Euclidean Accuracy Threshold
          - type: euclidean_f1
            value: 0.512
            name: Euclidean F1
          - type: euclidean_f1_threshold
            value: 14.84876823425293
            name: Euclidean F1 Threshold
          - type: euclidean_precision
            value: 0.39143730886850153
            name: Euclidean Precision
          - type: euclidean_recall
            value: 0.7398843930635838
            name: Euclidean Recall
          - type: euclidean_ap
            value: 0.4264736612515921
            name: Euclidean Ap
          - type: max_accuracy
            value: 0.67578125
            name: Max Accuracy
          - type: max_accuracy_threshold
            value: 726.30615234375
            name: Max Accuracy Threshold
          - type: max_f1
            value: 0.512
            name: Max F1
          - type: max_f1_threshold
            value: 658.1103515625
            name: Max F1 Threshold
          - type: max_precision
            value: 0.39143730886850153
            name: Max Precision
          - type: max_recall
            value: 0.9595375722543352
            name: Max Recall
          - type: max_ap
            value: 0.42647535250956575
            name: Max Ap
      - task:
          type: binary-classification
          name: Binary Classification
        dataset:
          name: Qnli dev
          type: Qnli-dev
        metrics:
          - type: cosine_accuracy
            value: 0.634765625
            name: Cosine Accuracy
          - type: cosine_accuracy_threshold
            value: 0.8508153557777405
            name: Cosine Accuracy Threshold
          - type: cosine_f1
            value: 0.6505636070853462
            name: Cosine F1
          - type: cosine_f1_threshold
            value: 0.7770615816116333
            name: Cosine F1 Threshold
          - type: cosine_precision
            value: 0.5246753246753246
            name: Cosine Precision
          - type: cosine_recall
            value: 0.8559322033898306
            name: Cosine Recall
          - type: cosine_ap
            value: 0.6461335447626624
            name: Cosine Ap
          - type: dot_accuracy
            value: 0.634765625
            name: Dot Accuracy
          - type: dot_accuracy_threshold
            value: 653.7443237304688
            name: Dot Accuracy Threshold
          - type: dot_f1
            value: 0.6505636070853462
            name: Dot F1
          - type: dot_f1_threshold
            value: 597.0731811523438
            name: Dot F1 Threshold
          - type: dot_precision
            value: 0.5246753246753246
            name: Dot Precision
          - type: dot_recall
            value: 0.8559322033898306
            name: Dot Recall
          - type: dot_ap
            value: 0.6461682282377894
            name: Dot Ap
          - type: manhattan_accuracy
            value: 0.6328125
            name: Manhattan Accuracy
          - type: manhattan_accuracy_threshold
            value: 331.46282958984375
            name: Manhattan Accuracy Threshold
          - type: manhattan_f1
            value: 0.6501650165016502
            name: Manhattan F1
          - type: manhattan_f1_threshold
            value: 404.6050109863281
            name: Manhattan F1 Threshold
          - type: manhattan_precision
            value: 0.5324324324324324
            name: Manhattan Precision
          - type: manhattan_recall
            value: 0.8347457627118644
            name: Manhattan Recall
          - type: manhattan_ap
            value: 0.6431949026371255
            name: Manhattan Ap
          - type: euclidean_accuracy
            value: 0.634765625
            name: Euclidean Accuracy
          - type: euclidean_accuracy_threshold
            value: 15.141305923461914
            name: Euclidean Accuracy Threshold
          - type: euclidean_f1
            value: 0.6505636070853462
            name: Euclidean F1
          - type: euclidean_f1_threshold
            value: 18.50943946838379
            name: Euclidean F1 Threshold
          - type: euclidean_precision
            value: 0.5246753246753246
            name: Euclidean Precision
          - type: euclidean_recall
            value: 0.8559322033898306
            name: Euclidean Recall
          - type: euclidean_ap
            value: 0.6461382925406688
            name: Euclidean Ap
          - type: max_accuracy
            value: 0.634765625
            name: Max Accuracy
          - type: max_accuracy_threshold
            value: 653.7443237304688
            name: Max Accuracy Threshold
          - type: max_f1
            value: 0.6505636070853462
            name: Max F1
          - type: max_f1_threshold
            value: 597.0731811523438
            name: Max F1 Threshold
          - type: max_precision
            value: 0.5324324324324324
            name: Max Precision
          - type: max_recall
            value: 0.8559322033898306
            name: Max Recall
          - type: max_ap
            value: 0.6461682282377894
            name: Max Ap
---

SentenceTransformer based on microsoft/deberta-v3-small

This is a sentence-transformers model fine-tuned from microsoft/deberta-v3-small. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: microsoft/deberta-v3-small
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
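
These properties can be read straight off a loaded model. A quick check, assuming the model is loaded as in the Usage section below:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("bobox/DeBERTa3-s-CustomPoolin-toytest3-step1-checkpoints-tmp")
print(model.get_max_seq_length())                # 512
print(model.get_sentence_embedding_dimension())  # 768
print(model.similarity_fn_name)                  # cosine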

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model 
  (1): AdvancedWeightedPooling(
    (alpha_dropout_layer): Dropout(p=0.01, inplace=False)
    (gate_dropout_layer): Dropout(p=0.05, inplace=False)
    (linear_cls_pj): Linear(in_features=768, out_features=768, bias=True)
    (linear_cls_Qpj): Linear(in_features=768, out_features=768, bias=True)
    (linear_mean_pj): Linear(in_features=768, out_features=768, bias=True)
    (linear_attnOut): Linear(in_features=768, out_features=768, bias=True)
    (mha): MultiheadAttention(
      (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
    )
    (layernorm_output): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (layernorm_weightedPooing): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (layernorm_pjCls): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (layernorm_pjMean): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (layernorm_attnOut): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
)
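
The pooling head above is custom, and its exact forward pass is not documented in this card. Purely as a hedged sketch of how a few of the printed submodules could compose (hypothetical head count and wiring, not the author's implementation):

import torch
import torch.nn as nn

class PoolingSketch(nn.Module):
    # Hypothetical wiring of some of the submodules printed above; the real
    # AdvancedWeightedPooling forward pass may differ substantially.
    def __init__(self, dim=768, num_heads=8):  # num_heads is an assumption
        super().__init__()
        self.linear_cls_Qpj = nn.Linear(dim, dim)  # project CLS into a query
        self.mha = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.linear_attnOut = nn.Linear(dim, dim)
        self.layernorm_attnOut = nn.LayerNorm(dim)
        self.layernorm_output = nn.LayerNorm(dim)

    def forward(self, token_embeddings, attention_mask):
        # The projected CLS embedding attends over all token embeddings
        query = self.linear_cls_Qpj(token_embeddings[:, :1, :])
        attn_out, _ = self.mha(
            query, token_embeddings, token_embeddings,
            key_padding_mask=~attention_mask.bool(),
        )
        pooled = self.layernorm_attnOut(self.linear_attnOut(attn_out.squeeze(1)))
        return self.layernorm_output(pooled)  # one 768-d vector per sentence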

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("bobox/DeBERTa3-s-CustomPoolin-toytest3-step1-checkpoints-tmp")
# Run inference
sentences = [
    'can sweet potatoes cause itching?',
    'People with a true potato allergy may react immediately after touching, peeling, or eating potatoes. Symptoms may vary from person to person, but typical symptoms of a potato allergy include: rhinitis, including itchy or stinging eyes, a runny or stuffy nose, and sneezing.',
    'riding a bike does not cause pollution',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Semantic Similarity

  • Dataset: sts-test

Metric Value
pearson_cosine 0.275
spearman_cosine 0.3116
pearson_manhattan 0.2924
spearman_manhattan 0.3096
pearson_euclidean 0.2934
spearman_euclidean 0.3116
pearson_dot 0.275
spearman_dot 0.3114
pearson_max 0.2934
spearman_max 0.3116
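
These correlations follow the output format of sentence-transformers' EmbeddingSimilarityEvaluator. A minimal sketch of computing them on your own STS-style pairs (the sentences and gold scores below are placeholders):

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("bobox/DeBERTa3-s-CustomPoolin-toytest3-step1-checkpoints-tmp")

# Placeholder pairs with gold similarity scores in [0, 1]
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=["A man is eating food.", "A plane is taking off."],
    sentences2=["A man is eating a meal.", "A bird is flying."],
    scores=[0.9, 0.1],
    name="sts-test",
)
print(evaluator(model))  # dict with keys like "sts-test_pearson_cosine"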

Binary Classification

  • Dataset: allNLI-dev

Metric Value
cosine_accuracy 0.6758
cosine_accuracy_threshold 0.9453
cosine_f1 0.512
cosine_f1_threshold 0.8565
cosine_precision 0.3914
cosine_recall 0.7399
cosine_ap 0.4265
dot_accuracy 0.6758
dot_accuracy_threshold 726.3062
dot_f1 0.512
dot_f1_threshold 658.1104
dot_precision 0.3914
dot_recall 0.7399
dot_ap 0.4265
manhattan_accuracy 0.6758
manhattan_accuracy_threshold 201.4906
manhattan_f1 0.5108
manhattan_f1_threshold 417.5273
manhattan_precision 0.348
manhattan_recall 0.9595
manhattan_ap 0.4252
euclidean_accuracy 0.6758
euclidean_accuracy_threshold 9.1713
euclidean_f1 0.512
euclidean_f1_threshold 14.8488
euclidean_precision 0.3914
euclidean_recall 0.7399
euclidean_ap 0.4265
max_accuracy 0.6758
max_accuracy_threshold 726.3062
max_f1 0.512
max_f1_threshold 658.1104
max_precision 0.3914
max_recall 0.9595
max_ap 0.4265

Binary Classification

  • Dataset: Qnli-dev

Metric Value
cosine_accuracy 0.6348
cosine_accuracy_threshold 0.8508
cosine_f1 0.6506
cosine_f1_threshold 0.7771
cosine_precision 0.5247
cosine_recall 0.8559
cosine_ap 0.6461
dot_accuracy 0.6348
dot_accuracy_threshold 653.7443
dot_f1 0.6506
dot_f1_threshold 597.0732
dot_precision 0.5247
dot_recall 0.8559
dot_ap 0.6462
manhattan_accuracy 0.6328
manhattan_accuracy_threshold 331.4628
manhattan_f1 0.6502
manhattan_f1_threshold 404.605
manhattan_precision 0.5324
manhattan_recall 0.8347
manhattan_ap 0.6432
euclidean_accuracy 0.6348
euclidean_accuracy_threshold 15.1413
euclidean_f1 0.6506
euclidean_f1_threshold 18.5094
euclidean_precision 0.5247
euclidean_recall 0.8559
euclidean_ap 0.6461
max_accuracy 0.6348
max_accuracy_threshold 653.7443
max_f1 0.6506
max_f1_threshold 597.0732
max_precision 0.5324
max_recall 0.8559
max_ap 0.6462
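
Both binary-classification tables follow the output format of sentence-transformers' BinaryClassificationEvaluator, which sweeps decision thresholds over the cosine, dot, Manhattan, and Euclidean scores. A minimal sketch with placeholder pairs:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import BinaryClassificationEvaluator

model = SentenceTransformer("bobox/DeBERTa3-s-CustomPoolin-toytest3-step1-checkpoints-tmp")

# Placeholder pairs with binary labels (1 = related, 0 = unrelated)
evaluator = BinaryClassificationEvaluator(
    sentences1=["can sweet potatoes cause itching?", "can sweet potatoes cause itching?"],
    sentences2=["Typical potato-allergy symptoms include itching.", "riding a bike does not cause pollution"],
    labels=[1, 0],
    name="allNLI-dev",
)
print(evaluator(model))  # dict with keys like "allNLI-dev_cosine_ap"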

Training Details

Training Dataset

Unnamed Dataset

  • Size: 32,500 training samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    - sentence1: string; min: 4 tokens, mean: 29.6 tokens, max: 369 tokens
    - sentence2: string; min: 2 tokens, mean: 58.01 tokens, max: 437 tokens
  • Samples:
    - sentence1: The song ‘Fashion for His Love’ by Lady Gaga is a tribute to which late fashion designer?
      sentence2: Fashion Of His Love by Lady Gaga Songfacts Fashion Of His Love by Lady Gaga Songfacts Songfacts Gaga explained in a tweet that this track from her Born This Way Special Edition album is about the late Alexander McQueen. The fashion designer committed suicide by hanging on February 11, 2010 and Gaga was deeply affected by the tragic death of McQueen, who was a close personal friend. That same month, she performed at the 2010 Brit Awards wearing one of his couture creations and she also paid tribute to her late friend by setting the date on the prison security cameras in her Telephone video as the same day that McQueen's body was discovered in his London home.
    - sentence1: e. in solids the atoms are closely locked in position and can only vibrate, in liquids the atoms and molecules are more loosely connected and can collide with and move past one another, while in gases the atoms or molecules are free to move independently, colliding frequently.
      sentence2: Within a substance, atoms that collide frequently and move independently of one another are most likely in a gas
    - sentence1: Helen Lederer is an English comedian .
      sentence2: Helen Lederer ( born 24 September 1954 ) is an English : //www.scotsman.com/news/now-or-never-1-1396369 comedian , writer and actress who emerged as part of the alternative comedy boom at the beginning of the 1980s .
  • Loss: GISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.025}
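
In code, this corresponds to wrapping the model in GISTEmbedLoss together with a guide encoder. A sketch under stated assumptions: the guide checkpoint name below is a guess consistent with the printed CLS-pooled BertModel + Normalize stack, and the trained model's custom pooling is simplified away:

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import GISTEmbedLoss

# Base model before fine-tuning; the card's custom pooling head is omitted here
model = SentenceTransformer("microsoft/deberta-v3-small")
# Guide checkpoint is an assumption, matching the printed guide architecture
guide = SentenceTransformer("avsolatorio/GIST-Embedding-v0")

loss = GISTEmbedLoss(model, guide, temperature=0.025)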
    

Evaluation Dataset

Unnamed Dataset

  • Size: 1,664 evaluation samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    - sentence1: string; min: 4 tokens, mean: 29.01 tokens, max: 367 tokens
    - sentence2: string; min: 2 tokens, mean: 56.14 tokens, max: 389 tokens
  • Samples:
    - sentence1: What planet did the voyager 1 spacecraft visit in 1980?
      sentence2: The Voyager 1 spacecraft visited Saturn in 1980. Voyager 2 followed in 1981. These probes sent back detailed pictures of Saturn, its rings, and some of its moons ( Figure below ). From the Voyager data, we learned what Saturn’s rings are made of. They are particles of water and ice with a little bit of dust. There are several gaps in the rings. These gaps were cleared out by moons within the rings. Gravity attracts dust and gas to the moon from the ring. This leaves a gap in the rings. Other gaps in the rings are caused by the competing forces of Saturn and its moons outside the rings.
    - sentence1: Diffusion Diffusion is a process where atoms or molecules move from areas of high concentration to areas of low concentration.
      sentence2: Diffusion is the process in which a substance naturally moves from an area of higher to lower concentration.
    - sentence1: Who had an 80s No 1 with Don't You Want Me?
      sentence2: The Human League - Don't You Want Me - YouTube The Human League - Don't You Want Me Want to watch this again later? Sign in to add this video to a playlist. Need to report the video? Sign in to report inappropriate content. Rating is available when the video has been rented. This feature is not available right now. Please try again later. Uploaded on Feb 27, 2009 Music video by The Human League performing Don't You Want Me (2003 Digital Remaster). Category
  • Loss: GISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.025}
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 256
  • lr_scheduler_type: cosine_with_min_lr
  • lr_scheduler_kwargs: {'num_cycles': 0.5, 'min_lr': 3.3333333333333337e-06}
  • warmup_ratio: 0.33
  • save_safetensors: False
  • fp16: True
  • push_to_hub: True
  • hub_model_id: bobox/DeBERTa3-s-CustomPoolin-toytest3-step1-checkpoints-tmp
  • hub_strategy: all_checkpoints
  • batch_sampler: no_duplicates
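
These values map directly onto SentenceTransformerTrainingArguments (sentence-transformers 3.x); a sketch, with output_dir as a placeholder:

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder path
    eval_strategy="steps",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=256,
    lr_scheduler_type="cosine_with_min_lr",
    lr_scheduler_kwargs={"num_cycles": 0.5, "min_lr": 3.3333333333333337e-06},
    warmup_ratio=0.33,
    save_safetensors=False,
    fp16=True,
    push_to_hub=True,
    hub_model_id="bobox/DeBERTa3-s-CustomPoolin-toytest3-step1-checkpoints-tmp",
    hub_strategy="all_checkpoints",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)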

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 256
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: cosine_with_min_lr
  • lr_scheduler_kwargs: {'num_cycles': 0.5, 'min_lr': 3.3333333333333337e-06}
  • warmup_ratio: 0.33
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: bobox/DeBERTa3-s-CustomPoolin-toytest3-step1-checkpoints-tmp
  • hub_strategy: all_checkpoints
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Click to expand
Epoch Step Training Loss Validation Loss sts-test_spearman_cosine allNLI-dev_max_ap Qnli-dev_max_ap
0.0010 1 10.4072 - - - -
0.0020 2 11.0865 - - - -
0.0030 3 9.5114 - - - -
0.0039 4 9.9584 - - - -
0.0049 5 10.068 - - - -
0.0059 6 11.0224 - - - -
0.0069 7 9.7703 - - - -
0.0079 8 10.5005 - - - -
0.0089 9 10.1987 - - - -
0.0098 10 10.0277 - - - -
0.0108 11 10.6965 - - - -
0.0118 12 10.0609 - - - -
0.0128 13 11.6214 - - - -
0.0138 14 9.4053 - - - -
0.0148 15 10.4014 - - - -
0.0157 16 10.4119 - - - -
0.0167 17 9.4658 - - - -
0.0177 18 9.2169 - - - -
0.0187 19 11.2337 - - - -
0.0197 20 11.0572 - - - -
0.0207 21 11.0452 - - - -
0.0217 22 10.31 - - - -
0.0226 23 9.1395 - - - -
0.0236 24 8.4201 - - - -
0.0246 25 8.6036 - - - -
0.0256 26 11.7579 - - - -
0.0266 27 10.1307 - - - -
0.0276 28 9.2915 - - - -
0.0285 29 9.0208 - - - -
0.0295 30 8.6867 - - - -
0.0305 31 8.0925 - - - -
0.0315 32 8.6617 - - - -
0.0325 33 8.3374 - - - -
0.0335 34 7.8566 - - - -
0.0344 35 9.0698 - - - -
0.0354 36 7.7727 - - - -
0.0364 37 7.6128 - - - -
0.0374 38 7.8762 - - - -
0.0384 39 7.5191 - - - -
0.0394 40 7.5638 - - - -
0.0404 41 7.1878 - - - -
0.0413 42 6.8878 - - - -
0.0423 43 7.5775 - - - -
0.0433 44 7.1076 - - - -
0.0443 45 6.5589 - - - -
0.0453 46 7.4456 - - - -
0.0463 47 6.8233 - - - -
0.0472 48 6.7633 - - - -
0.0482 49 6.6024 - - - -
0.0492 50 6.2778 - - - -
0.0502 51 6.1026 - - - -
0.0512 52 6.632 - - - -
0.0522 53 6.6962 - - - -
0.0531 54 5.8514 - - - -
0.0541 55 5.9951 - - - -
0.0551 56 5.4554 - - - -
0.0561 57 6.0147 - - - -
0.0571 58 5.215 - - - -
0.0581 59 6.4525 - - - -
0.0591 60 5.4048 - - - -
0.0600 61 5.0424 - - - -
0.0610 62 6.2646 - - - -
0.0620 63 5.0847 - - - -
0.0630 64 5.4415 - - - -
0.0640 65 5.2469 - - - -
0.0650 66 5.1378 - - - -
0.0659 67 5.1636 - - - -
0.0669 68 5.5596 - - - -
0.0679 69 4.9508 - - - -
0.0689 70 5.2355 - - - -
0.0699 71 4.7359 - - - -
0.0709 72 4.8947 - - - -
0.0719 73 4.6269 - - - -
0.0728 74 4.6072 - - - -
0.0738 75 4.9125 - - - -
0.0748 76 4.5856 - - - -
0.0758 77 4.7879 - - - -
0.0768 78 4.5348 - - - -
0.0778 79 4.3554 - - - -
0.0787 80 4.2984 - - - -
0.0797 81 4.5505 - - - -
0.0807 82 4.5325 - - - -
0.0817 83 4.2725 - - - -
0.0827 84 4.3054 - - - -
0.0837 85 4.5536 - - - -
0.0846 86 4.0265 - - - -
0.0856 87 4.7453 - - - -
0.0866 88 4.071 - - - -
0.0876 89 4.1582 - - - -
0.0886 90 4.1131 - - - -
0.0896 91 3.6582 - - - -
0.0906 92 4.143 - - - -
0.0915 93 4.2273 - - - -
0.0925 94 3.9321 - - - -
0.0935 95 4.2061 - - - -
0.0945 96 4.1042 - - - -
0.0955 97 3.9513 - - - -
0.0965 98 3.8627 - - - -
0.0974 99 4.3613 - - - -
0.0984 100 3.8513 - - - -
0.0994 101 3.5866 - - - -
0.1004 102 3.5239 - - - -
0.1014 103 3.5921 - - - -
0.1024 104 3.5962 - - - -
0.1033 105 4.0001 - - - -
0.1043 106 4.1374 - - - -
0.1053 107 3.9049 - - - -
0.1063 108 3.2511 - - - -
0.1073 109 3.2479 - - - -
0.1083 110 3.6414 - - - -
0.1093 111 3.6429 - - - -
0.1102 112 3.423 - - - -
0.1112 113 3.4967 - - - -
0.1122 114 3.7649 - - - -
0.1132 115 3.2845 - - - -
0.1142 116 3.356 - - - -
0.1152 117 3.2086 - - - -
0.1161 118 3.5561 - - - -
0.1171 119 3.7353 - - - -
0.1181 120 3.403 - - - -
0.1191 121 3.1009 - - - -
0.1201 122 3.2139 - - - -
0.1211 123 3.3339 - - - -
0.1220 124 2.9464 - - - -
0.1230 125 3.3366 - - - -
0.1240 126 3.0618 - - - -
0.125 127 3.0092 - - - -
0.1260 128 2.7152 - - - -
0.1270 129 2.9423 - - - -
0.1280 130 2.6569 - - - -
0.1289 131 2.8469 - - - -
0.1299 132 2.9089 - - - -
0.1309 133 2.5809 - - - -
0.1319 134 2.6987 - - - -
0.1329 135 3.2518 - - - -
0.1339 136 2.9145 - - - -
0.1348 137 2.4809 - - - -
0.1358 138 2.8264 - - - -
0.1368 139 2.5724 - - - -
0.1378 140 2.6949 - - - -
0.1388 141 2.6925 - - - -
0.1398 142 2.9311 - - - -
0.1407 143 2.5667 - - - -
0.1417 144 3.2471 - - - -
0.1427 145 2.2441 - - - -
0.1437 146 2.75 - - - -
0.1447 147 2.9669 - - - -
0.1457 148 2.736 - - - -
0.1467 149 3.104 - - - -
0.1476 150 2.2175 - - - -
0.1486 151 2.7415 - - - -
0.1496 152 1.8707 - - - -
0.1506 153 2.5961 2.2653 0.3116 0.4265 0.6462
0.1516 154 3.1149 - - - -
0.1526 155 2.2976 - - - -
0.1535 156 2.4436 - - - -
0.1545 157 2.8826 - - - -
0.1555 158 2.3664 - - - -
0.1565 159 2.2485 - - - -
0.1575 160 2.5167 - - - -
0.1585 161 1.7183 - - - -
0.1594 162 2.1003 - - - -
0.1604 163 2.5785 - - - -
0.1614 164 2.8789 - - - -
0.1624 165 2.3425 - - - -
0.1634 166 2.0966 - - - -
0.1644 167 2.1126 - - - -
0.1654 168 2.1824 - - - -
0.1663 169 2.2009 - - - -
0.1673 170 2.3796 - - - -
0.1683 171 2.3096 - - - -
0.1693 172 2.7897 - - - -
0.1703 173 2.2097 - - - -
0.1713 174 1.7508 - - - -
0.1722 175 2.353 - - - -
0.1732 176 2.4276 - - - -
0.1742 177 2.1016 - - - -
0.1752 178 1.8461 - - - -
0.1762 179 1.8104 - - - -
0.1772 180 2.6023 - - - -
0.1781 181 2.5261 - - - -
0.1791 182 2.1053 - - - -
0.1801 183 1.9712 - - - -
0.1811 184 2.4693 - - - -
0.1821 185 2.1119 - - - -
0.1831 186 2.4797 - - - -
0.1841 187 2.1587 - - - -
0.1850 188 1.9578 - - - -
0.1860 189 2.1368 - - - -
0.1870 190 2.4212 - - - -
0.1880 191 1.9591 - - - -
0.1890 192 1.5816 - - - -
0.1900 193 1.4029 - - - -
0.1909 194 1.9385 - - - -
0.1919 195 1.5596 - - - -
0.1929 196 1.6663 - - - -
0.1939 197 2.0026 - - - -
0.1949 198 2.0046 - - - -
0.1959 199 1.5016 - - - -
0.1969 200 2.184 - - - -
0.1978 201 2.3442 - - - -
0.1988 202 2.6981 - - - -
0.1998 203 2.5481 - - - -
0.2008 204 2.9798 - - - -
0.2018 205 2.287 - - - -
0.2028 206 1.9393 - - - -
0.2037 207 2.892 - - - -
0.2047 208 2.26 - - - -
0.2057 209 2.5911 - - - -
0.2067 210 2.1239 - - - -
0.2077 211 2.0683 - - - -
0.2087 212 1.768 - - - -
0.2096 213 2.5468 - - - -
0.2106 214 1.8956 - - - -
0.2116 215 2.044 - - - -
0.2126 216 1.5721 - - - -
0.2136 217 1.6278 - - - -
0.2146 218 1.7754 - - - -
0.2156 219 1.8594 - - - -
0.2165 220 1.8309 - - - -
0.2175 221 2.0619 - - - -
0.2185 222 2.3335 - - - -
0.2195 223 2.023 - - - -
0.2205 224 2.1975 - - - -
0.2215 225 1.9228 - - - -
0.2224 226 2.3565 - - - -
0.2234 227 1.896 - - - -
0.2244 228 2.0912 - - - -
0.2254 229 2.7703 - - - -
0.2264 230 1.6988 - - - -
0.2274 231 2.0406 - - - -
0.2283 232 1.9288 - - - -
0.2293 233 2.0457 - - - -
0.2303 234 1.7061 - - - -
0.2313 235 1.6244 - - - -
0.2323 236 2.0241 - - - -
0.2333 237 1.567 - - - -
0.2343 238 1.8084 - - - -
0.2352 239 2.4363 - - - -
0.2362 240 1.7532 - - - -
0.2372 241 2.0797 - - - -
0.2382 242 1.9562 - - - -
0.2392 243 1.6751 - - - -
0.2402 244 2.0265 - - - -
0.2411 245 1.6065 - - - -
0.2421 246 1.7439 - - - -
0.2431 247 2.0237 - - - -
0.2441 248 1.6128 - - - -
0.2451 249 1.6581 - - - -
0.2461 250 2.1538 - - - -
0.2470 251 2.049 - - - -
0.2480 252 1.2573 - - - -
0.2490 253 1.5619 - - - -
0.25 254 1.2611 - - - -
0.2510 255 1.3443 - - - -
0.2520 256 1.3436 - - - -
0.2530 257 2.8117 - - - -
0.2539 258 1.7563 - - - -
0.2549 259 1.3148 - - - -
0.2559 260 2.0278 - - - -
0.2569 261 1.2403 - - - -
0.2579 262 1.588 - - - -
0.2589 263 2.0071 - - - -
0.2598 264 1.5312 - - - -
0.2608 265 1.8641 - - - -
0.2618 266 1.2933 - - - -
0.2628 267 1.6262 - - - -
0.2638 268 1.721 - - - -
0.2648 269 1.4713 - - - -
0.2657 270 1.4625 - - - -
0.2667 271 1.7254 - - - -
0.2677 272 1.5108 - - - -
0.2687 273 2.1126 - - - -
0.2697 274 1.3967 - - - -
0.2707 275 1.7067 - - - -
0.2717 276 1.4847 - - - -
0.2726 277 1.6515 - - - -
0.2736 278 0.9367 - - - -
0.2746 279 2.0267 - - - -
0.2756 280 1.5023 - - - -
0.2766 281 1.1248 - - - -
0.2776 282 1.6224 - - - -
0.2785 283 1.7969 - - - -
0.2795 284 2.2498 - - - -
0.2805 285 1.7477 - - - -
0.2815 286 1.6261 - - - -
0.2825 287 2.0911 - - - -
0.2835 288 1.9519 - - - -
0.2844 289 1.3132 - - - -
0.2854 290 2.3292 - - - -
0.2864 291 1.3781 - - - -
0.2874 292 1.5753 - - - -
0.2884 293 1.4158 - - - -
0.2894 294 2.1661 - - - -
0.2904 295 1.4928 - - - -
0.2913 296 2.2825 - - - -
0.2923 297 1.7261 - - - -
0.2933 298 1.8635 - - - -
0.2943 299 0.974 - - - -
0.2953 300 1.53 - - - -
0.2963 301 1.5985 - - - -
0.2972 302 1.2169 - - - -
0.2982 303 1.771 - - - -
0.2992 304 1.4506 - - - -
0.3002 305 1.9496 - - - -

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.2.1
  • Transformers: 4.44.2
  • PyTorch: 2.5.0+cu121
  • Accelerate: 0.34.2
  • Datasets: 3.0.2
  • Tokenizers: 0.19.1
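
To approximate this environment, the listed versions can be pinned directly (a sketch; the exact PyTorch CUDA build may differ):

pip install sentence-transformers==3.2.1 transformers==4.44.2 torch==2.5.0 accelerate==0.34.2 datasets==3.0.2 tokenizers==0.19.1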

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

GISTEmbedLoss

@misc{solatorio2024gistembed,
    title={GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning},
    author={Aivin V. Solatorio},
    year={2024},
    eprint={2402.16829},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}