---
base_model: microsoft/deberta-v3-small
library_name: sentence-transformers
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
  - cosine_accuracy
  - cosine_accuracy_threshold
  - cosine_f1
  - cosine_f1_threshold
  - cosine_precision
  - cosine_recall
  - cosine_ap
  - dot_accuracy
  - dot_accuracy_threshold
  - dot_f1
  - dot_f1_threshold
  - dot_precision
  - dot_recall
  - dot_ap
  - manhattan_accuracy
  - manhattan_accuracy_threshold
  - manhattan_f1
  - manhattan_f1_threshold
  - manhattan_precision
  - manhattan_recall
  - manhattan_ap
  - euclidean_accuracy
  - euclidean_accuracy_threshold
  - euclidean_f1
  - euclidean_f1_threshold
  - euclidean_precision
  - euclidean_recall
  - euclidean_ap
  - max_accuracy
  - max_accuracy_threshold
  - max_f1
  - max_f1_threshold
  - max_precision
  - max_recall
  - max_ap
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:32500
  - loss:GISTEmbedLoss
widget:
  - source_sentence: Fish hatch into larvae that are different from the adult form of species.
    sentences:
      - Fish hatch into larvae that are different from the adult form of?
      - amphibians hatch from eggs
      - >-
        A solenoid or coil wrapped around iron or certain other metals can form
        a(n) electromagnet.
  - source_sentence: >-
      About 200 countries and territories have reported coronavirus cases in
      2020 .
    sentences:
      - >-
        All-Time Olympic Games Medal Tally Analysis Home > Events > Olympics >
        Summer > Medal Tally > All-Time All-Time Olympic Games Medal Tally
        (Summer Olympics) Which country is the most successful at he Olympic
        Games? Here are the top ranked countries in terms of total medals won
        when all of the summer Games are considered (including the 2016 Rio
        Games). There are two tables presented, the first just lists the top
        countries based on the total medals won, the second table factors in how
        many Olympic Games the country appeared, averaging the total number of
        medals per Olympiad. A victory in a team sport is counted as one medal.
        The USA Has Won the Most Medals The US have clearly won the most gold
        medals and the most medals overall, more than doubling the next ranked
        country (these figures include medals won in Rio 2016). Second placed
        USSR had fewer appearances at the Olympics, and actually won more medals
        on average (see the 2nd table). The top 10 includes one country no
        longer in existence (the Soviet Union), so their medal totals will
        obviously not increase, however China is expected to continue a rapid
        rise up the ranks. With the addition of the 2016 data, China has moved
        up from 11th (in 2008) to 9th (2012) to 7th (2016). The country which
        has attended the most games without a medal is Monaco (20 Olympic
        Games), the country which has won the most medals without winning a gold
        medal is Malaysia (0 gold, 7 silver, 4 bronze). rank
      - >-
        An example of a reproductive behavior is salmon returning to their
        birthplace to lay their eggs
      - >-
        more than 664,000 cases of COVID-19 have been reported in over 190
        countries and territories , resulting in approximately 30,800 deaths .
  - source_sentence: >-
      The wave on a guitar string is transverse. the sound wave rattles a sheet
      of paper in a direction that shows the sound wave is what?
    sentences:
      - A Honda motorcycle parked in a grass driveway
      - >-
        In Panama tipping is a question of rewarding good service rather than an
        obligation. Restaurant bills don't include gratuities; adding 10% is
        customary. Bellhops and maids expect tips only in more expensive hotels,
        and $1–$2 per bag is the norm. You should also give a tip of up to $10
        per day to tour guides.
      - >-
        Figure 16.33 The wave on a guitar string is transverse. The sound wave
        rattles a sheet of paper in a direction that shows the sound wave is
        longitudinal.
  - source_sentence: The thermal production of a stove is generically used for
    sentences:
      - >-
        In total , 28 US victims were killed , while Viet Cong losses were
        killed 345 and a further 192 estimated killed .
      - a stove generates heat for cooking usually
      - >-
        A teenager has been charged over an incident in which a four-year-old
        girl was hurt when she was hit in the face by a brick thrown through a
        van window.
  - source_sentence: can sweet potatoes cause itching?
    sentences:
      - >-
        People with a true potato allergy may react immediately after touching,
        peeling, or eating potatoes. Symptoms may vary from person to person,
        but typical symptoms of a potato allergy include: rhinitis, including
        itchy or stinging eyes, a runny or stuffy nose, and sneezing.
      - riding a bike does not cause pollution
      - >-
        Dilation occurs when cell walls relax.. An aneurysm is a dilation, or
        bubble, that occurs in the wall of an artery. 
         an artery can be relaxed by dilation
model-index:
  - name: SentenceTransformer based on microsoft/deberta-v3-small
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test
          type: sts-test
        metrics:
          - type: pearson_cosine
            value: 0.2749904272806095
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.31159390381099095
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.2923996087310511
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.3095556181083969
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.2934483033082174
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.3115817314678925
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.27496363262371837
            name: Pearson Dot
          - type: spearman_dot
            value: 0.31138581044552094
            name: Spearman Dot
          - type: pearson_max
            value: 0.2934483033082174
            name: Pearson Max
          - type: spearman_max
            value: 0.31159390381099095
            name: Spearman Max
      - task:
          type: binary-classification
          name: Binary Classification
        dataset:
          name: allNLI dev
          type: allNLI-dev
        metrics:
          - type: cosine_accuracy
            value: 0.67578125
            name: Cosine Accuracy
          - type: cosine_accuracy_threshold
            value: 0.9452645182609558
            name: Cosine Accuracy Threshold
          - type: cosine_f1
            value: 0.512
            name: Cosine F1
          - type: cosine_f1_threshold
            value: 0.8565204739570618
            name: Cosine F1 Threshold
          - type: cosine_precision
            value: 0.39143730886850153
            name: Cosine Precision
          - type: cosine_recall
            value: 0.7398843930635838
            name: Cosine Recall
          - type: cosine_ap
            value: 0.4264736612515921
            name: Cosine Ap
          - type: dot_accuracy
            value: 0.67578125
            name: Dot Accuracy
          - type: dot_accuracy_threshold
            value: 726.30615234375
            name: Dot Accuracy Threshold
          - type: dot_f1
            value: 0.512
            name: Dot F1
          - type: dot_f1_threshold
            value: 658.1103515625
            name: Dot F1 Threshold
          - type: dot_precision
            value: 0.39143730886850153
            name: Dot Precision
          - type: dot_recall
            value: 0.7398843930635838
            name: Dot Recall
          - type: dot_ap
            value: 0.42647535250956575
            name: Dot Ap
          - type: manhattan_accuracy
            value: 0.67578125
            name: Manhattan Accuracy
          - type: manhattan_accuracy_threshold
            value: 201.49061584472656
            name: Manhattan Accuracy Threshold
          - type: manhattan_f1
            value: 0.5107692307692308
            name: Manhattan F1
          - type: manhattan_f1_threshold
            value: 417.52728271484375
            name: Manhattan F1 Threshold
          - type: manhattan_precision
            value: 0.3480083857442348
            name: Manhattan Precision
          - type: manhattan_recall
            value: 0.9595375722543352
            name: Manhattan Recall
          - type: manhattan_ap
            value: 0.4252213828672732
            name: Manhattan Ap
          - type: euclidean_accuracy
            value: 0.67578125
            name: Euclidean Accuracy
          - type: euclidean_accuracy_threshold
            value: 9.171283721923828
            name: Euclidean Accuracy Threshold
          - type: euclidean_f1
            value: 0.512
            name: Euclidean F1
          - type: euclidean_f1_threshold
            value: 14.84876823425293
            name: Euclidean F1 Threshold
          - type: euclidean_precision
            value: 0.39143730886850153
            name: Euclidean Precision
          - type: euclidean_recall
            value: 0.7398843930635838
            name: Euclidean Recall
          - type: euclidean_ap
            value: 0.4264736612515921
            name: Euclidean Ap
          - type: max_accuracy
            value: 0.67578125
            name: Max Accuracy
          - type: max_accuracy_threshold
            value: 726.30615234375
            name: Max Accuracy Threshold
          - type: max_f1
            value: 0.512
            name: Max F1
          - type: max_f1_threshold
            value: 658.1103515625
            name: Max F1 Threshold
          - type: max_precision
            value: 0.39143730886850153
            name: Max Precision
          - type: max_recall
            value: 0.9595375722543352
            name: Max Recall
          - type: max_ap
            value: 0.42647535250956575
            name: Max Ap
      - task:
          type: binary-classification
          name: Binary Classification
        dataset:
          name: Qnli dev
          type: Qnli-dev
        metrics:
          - type: cosine_accuracy
            value: 0.634765625
            name: Cosine Accuracy
          - type: cosine_accuracy_threshold
            value: 0.8508153557777405
            name: Cosine Accuracy Threshold
          - type: cosine_f1
            value: 0.6505636070853462
            name: Cosine F1
          - type: cosine_f1_threshold
            value: 0.7770615816116333
            name: Cosine F1 Threshold
          - type: cosine_precision
            value: 0.5246753246753246
            name: Cosine Precision
          - type: cosine_recall
            value: 0.8559322033898306
            name: Cosine Recall
          - type: cosine_ap
            value: 0.6461335447626624
            name: Cosine Ap
          - type: dot_accuracy
            value: 0.634765625
            name: Dot Accuracy
          - type: dot_accuracy_threshold
            value: 653.7443237304688
            name: Dot Accuracy Threshold
          - type: dot_f1
            value: 0.6505636070853462
            name: Dot F1
          - type: dot_f1_threshold
            value: 597.0731811523438
            name: Dot F1 Threshold
          - type: dot_precision
            value: 0.5246753246753246
            name: Dot Precision
          - type: dot_recall
            value: 0.8559322033898306
            name: Dot Recall
          - type: dot_ap
            value: 0.6461682282377894
            name: Dot Ap
          - type: manhattan_accuracy
            value: 0.6328125
            name: Manhattan Accuracy
          - type: manhattan_accuracy_threshold
            value: 331.46282958984375
            name: Manhattan Accuracy Threshold
          - type: manhattan_f1
            value: 0.6501650165016502
            name: Manhattan F1
          - type: manhattan_f1_threshold
            value: 404.6050109863281
            name: Manhattan F1 Threshold
          - type: manhattan_precision
            value: 0.5324324324324324
            name: Manhattan Precision
          - type: manhattan_recall
            value: 0.8347457627118644
            name: Manhattan Recall
          - type: manhattan_ap
            value: 0.6431949026371255
            name: Manhattan Ap
          - type: euclidean_accuracy
            value: 0.634765625
            name: Euclidean Accuracy
          - type: euclidean_accuracy_threshold
            value: 15.141305923461914
            name: Euclidean Accuracy Threshold
          - type: euclidean_f1
            value: 0.6505636070853462
            name: Euclidean F1
          - type: euclidean_f1_threshold
            value: 18.50943946838379
            name: Euclidean F1 Threshold
          - type: euclidean_precision
            value: 0.5246753246753246
            name: Euclidean Precision
          - type: euclidean_recall
            value: 0.8559322033898306
            name: Euclidean Recall
          - type: euclidean_ap
            value: 0.6461382925406688
            name: Euclidean Ap
          - type: max_accuracy
            value: 0.634765625
            name: Max Accuracy
          - type: max_accuracy_threshold
            value: 653.7443237304688
            name: Max Accuracy Threshold
          - type: max_f1
            value: 0.6505636070853462
            name: Max F1
          - type: max_f1_threshold
            value: 597.0731811523438
            name: Max F1 Threshold
          - type: max_precision
            value: 0.5324324324324324
            name: Max Precision
          - type: max_recall
            value: 0.8559322033898306
            name: Max Recall
          - type: max_ap
            value: 0.6461682282377894
            name: Max Ap
---

SentenceTransformer based on microsoft/deberta-v3-small

This is a sentence-transformers model fine-tuned from microsoft/deberta-v3-small. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: microsoft/deberta-v3-small
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
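
These properties can be read straight off a loaded model. A quick check, assuming the model is loaded as in the Usage section below:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("bobox/DeBERTa3-s-CustomPoolin-toytest3-step1-checkpoints-tmp")
print(model.get_max_seq_length())                # 512
print(model.get_sentence_embedding_dimension())  # 768
print(model.similarity_fn_name)                  # cosine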

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model 
  (1): AdvancedWeightedPooling(
    (alpha_dropout_layer): Dropout(p=0.01, inplace=False)
    (gate_dropout_layer): Dropout(p=0.05, inplace=False)
    (linear_cls_pj): Linear(in_features=768, out_features=768, bias=True)
    (linear_cls_Qpj): Linear(in_features=768, out_features=768, bias=True)
    (linear_mean_pj): Linear(in_features=768, out_features=768, bias=True)
    (linear_attnOut): Linear(in_features=768, out_features=768, bias=True)
    (mha): MultiheadAttention(
      (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
    )
    (layernorm_output): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (layernorm_weightedPooing): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (layernorm_pjCls): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (layernorm_pjMean): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (layernorm_attnOut): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
)
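
The pooling head above is custom, and its exact forward pass is not documented in this card. Purely as a hedged sketch of how a few of the printed submodules could compose (hypothetical head count and wiring, not the author's implementation):

import torch
import torch.nn as nn

class PoolingSketch(nn.Module):
    # Hypothetical wiring of some of the submodules printed above; the real
    # AdvancedWeightedPooling forward pass may differ substantially.
    def __init__(self, dim=768, num_heads=8):  # num_heads is an assumption
        super().__init__()
        self.linear_cls_Qpj = nn.Linear(dim, dim)  # project CLS into a query
        self.mha = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.linear_attnOut = nn.Linear(dim, dim)
        self.layernorm_attnOut = nn.LayerNorm(dim)
        self.layernorm_output = nn.LayerNorm(dim)

    def forward(self, token_embeddings, attention_mask):
        # The projected CLS embedding attends over all token embeddings
        query = self.linear_cls_Qpj(token_embeddings[:, :1, :])
        attn_out, _ = self.mha(
            query, token_embeddings, token_embeddings,
            key_padding_mask=~attention_mask.bool(),
        )
        pooled = self.layernorm_attnOut(self.linear_attnOut(attn_out.squeeze(1)))
        return self.layernorm_output(pooled)  # one 768-d vector per sentence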

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("bobox/DeBERTa3-s-CustomPoolin-toytest3-step1-checkpoints-tmp")
# Run inference
sentences = [
    'can sweet potatoes cause itching?',
    'People with a true potato allergy may react immediately after touching, peeling, or eating potatoes. Symptoms may vary from person to person, but typical symptoms of a potato allergy include: rhinitis, including itchy or stinging eyes, a runny or stuffy nose, and sneezing.',
    'riding a bike does not cause pollution',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Semantic Similarity

  • Dataset: sts-test

Metric Value
pearson_cosine 0.275
spearman_cosine 0.3116
pearson_manhattan 0.2924
spearman_manhattan 0.3096
pearson_euclidean 0.2934
spearman_euclidean 0.3116
pearson_dot 0.275
spearman_dot 0.3114
pearson_max 0.2934
spearman_max 0.3116
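
These correlations follow the output format of sentence-transformers' EmbeddingSimilarityEvaluator. A minimal sketch of computing them on your own STS-style pairs (the sentences and gold scores below are placeholders):

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("bobox/DeBERTa3-s-CustomPoolin-toytest3-step1-checkpoints-tmp")

# Placeholder pairs with gold similarity scores in [0, 1]
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=["A man is eating food.", "A plane is taking off."],
    sentences2=["A man is eating a meal.", "A bird is flying."],
    scores=[0.9, 0.1],
    name="sts-test",
)
print(evaluator(model))  # dict with keys like "sts-test_pearson_cosine"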

Binary Classification

  • Dataset: allNLI-dev

Metric Value
cosine_accuracy 0.6758
cosine_accuracy_threshold 0.9453
cosine_f1 0.512
cosine_f1_threshold 0.8565
cosine_precision 0.3914
cosine_recall 0.7399
cosine_ap 0.4265
dot_accuracy 0.6758
dot_accuracy_threshold 726.3062
dot_f1 0.512
dot_f1_threshold 658.1104
dot_precision 0.3914
dot_recall 0.7399
dot_ap 0.4265
manhattan_accuracy 0.6758
manhattan_accuracy_threshold 201.4906
manhattan_f1 0.5108
manhattan_f1_threshold 417.5273
manhattan_precision 0.348
manhattan_recall 0.9595
manhattan_ap 0.4252
euclidean_accuracy 0.6758
euclidean_accuracy_threshold 9.1713
euclidean_f1 0.512
euclidean_f1_threshold 14.8488
euclidean_precision 0.3914
euclidean_recall 0.7399
euclidean_ap 0.4265
max_accuracy 0.6758
max_accuracy_threshold 726.3062
max_f1 0.512
max_f1_threshold 658.1104
max_precision 0.3914
max_recall 0.9595
max_ap 0.4265

Binary Classification

  • Dataset: Qnli-dev

Metric Value
cosine_accuracy 0.6348
cosine_accuracy_threshold 0.8508
cosine_f1 0.6506
cosine_f1_threshold 0.7771
cosine_precision 0.5247
cosine_recall 0.8559
cosine_ap 0.6461
dot_accuracy 0.6348
dot_accuracy_threshold 653.7443
dot_f1 0.6506
dot_f1_threshold 597.0732
dot_precision 0.5247
dot_recall 0.8559
dot_ap 0.6462
manhattan_accuracy 0.6328
manhattan_accuracy_threshold 331.4628
manhattan_f1 0.6502
manhattan_f1_threshold 404.605
manhattan_precision 0.5324
manhattan_recall 0.8347
manhattan_ap 0.6432
euclidean_accuracy 0.6348
euclidean_accuracy_threshold 15.1413
euclidean_f1 0.6506
euclidean_f1_threshold 18.5094
euclidean_precision 0.5247
euclidean_recall 0.8559
euclidean_ap 0.6461
max_accuracy 0.6348
max_accuracy_threshold 653.7443
max_f1 0.6506
max_f1_threshold 597.0732
max_precision 0.5324
max_recall 0.8559
max_ap 0.6462
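
Both binary-classification tables follow the output format of sentence-transformers' BinaryClassificationEvaluator, which sweeps decision thresholds over the cosine, dot, Manhattan, and Euclidean scores. A minimal sketch with placeholder pairs:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import BinaryClassificationEvaluator

model = SentenceTransformer("bobox/DeBERTa3-s-CustomPoolin-toytest3-step1-checkpoints-tmp")

# Placeholder pairs with binary labels (1 = related, 0 = unrelated)
evaluator = BinaryClassificationEvaluator(
    sentences1=["can sweet potatoes cause itching?", "can sweet potatoes cause itching?"],
    sentences2=["Typical potato-allergy symptoms include itching.", "riding a bike does not cause pollution"],
    labels=[1, 0],
    name="allNLI-dev",
)
print(evaluator(model))  # dict with keys like "allNLI-dev_cosine_ap"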

Training Details

Training Dataset

Unnamed Dataset

  • Size: 32,500 training samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    - sentence1: string; min: 4 tokens, mean: 29.6 tokens, max: 369 tokens
    - sentence2: string; min: 2 tokens, mean: 58.01 tokens, max: 437 tokens
  • Samples:
    - sentence1: The song ‘Fashion for His Love’ by Lady Gaga is a tribute to which late fashion designer?
      sentence2: Fashion Of His Love by Lady Gaga Songfacts Fashion Of His Love by Lady Gaga Songfacts Songfacts Gaga explained in a tweet that this track from her Born This Way Special Edition album is about the late Alexander McQueen. The fashion designer committed suicide by hanging on February 11, 2010 and Gaga was deeply affected by the tragic death of McQueen, who was a close personal friend. That same month, she performed at the 2010 Brit Awards wearing one of his couture creations and she also paid tribute to her late friend by setting the date on the prison security cameras in her Telephone video as the same day that McQueen's body was discovered in his London home.
    - sentence1: e. in solids the atoms are closely locked in position and can only vibrate, in liquids the atoms and molecules are more loosely connected and can collide with and move past one another, while in gases the atoms or molecules are free to move independently, colliding frequently.
      sentence2: Within a substance, atoms that collide frequently and move independently of one another are most likely in a gas
    - sentence1: Helen Lederer is an English comedian .
      sentence2: Helen Lederer ( born 24 September 1954 ) is an English : //www.scotsman.com/news/now-or-never-1-1396369 comedian , writer and actress who emerged as part of the alternative comedy boom at the beginning of the 1980s .
  • Loss: GISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.025}
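
In code, this corresponds to wrapping the model in GISTEmbedLoss together with a guide encoder. A sketch under stated assumptions: the guide checkpoint name below is a guess consistent with the printed CLS-pooled BertModel + Normalize stack, and the trained model's custom pooling is simplified away:

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import GISTEmbedLoss

# Base model before fine-tuning; the card's custom pooling head is omitted here
model = SentenceTransformer("microsoft/deberta-v3-small")
# Guide checkpoint is an assumption, matching the printed guide architecture
guide = SentenceTransformer("avsolatorio/GIST-Embedding-v0")

loss = GISTEmbedLoss(model, guide, temperature=0.025)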
    

Evaluation Dataset

Unnamed Dataset

  • Size: 1,664 evaluation samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    - sentence1: string; min: 4 tokens, mean: 29.01 tokens, max: 367 tokens
    - sentence2: string; min: 2 tokens, mean: 56.14 tokens, max: 389 tokens
  • Samples:
    - sentence1: What planet did the voyager 1 spacecraft visit in 1980?
      sentence2: The Voyager 1 spacecraft visited Saturn in 1980. Voyager 2 followed in 1981. These probes sent back detailed pictures of Saturn, its rings, and some of its moons ( Figure below ). From the Voyager data, we learned what Saturn’s rings are made of. They are particles of water and ice with a little bit of dust. There are several gaps in the rings. These gaps were cleared out by moons within the rings. Gravity attracts dust and gas to the moon from the ring. This leaves a gap in the rings. Other gaps in the rings are caused by the competing forces of Saturn and its moons outside the rings.
    - sentence1: Diffusion Diffusion is a process where atoms or molecules move from areas of high concentration to areas of low concentration.
      sentence2: Diffusion is the process in which a substance naturally moves from an area of higher to lower concentration.
    - sentence1: Who had an 80s No 1 with Don't You Want Me?
      sentence2: The Human League - Don't You Want Me - YouTube The Human League - Don't You Want Me Want to watch this again later? Sign in to add this video to a playlist. Need to report the video? Sign in to report inappropriate content. Rating is available when the video has been rented. This feature is not available right now. Please try again later. Uploaded on Feb 27, 2009 Music video by The Human League performing Don't You Want Me (2003 Digital Remaster). Category
  • Loss: GISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.025}
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 256
  • lr_scheduler_type: cosine_with_min_lr
  • lr_scheduler_kwargs: {'num_cycles': 0.5, 'min_lr': 3.3333333333333337e-06}
  • warmup_ratio: 0.33
  • save_safetensors: False
  • fp16: True
  • push_to_hub: True
  • hub_model_id: bobox/DeBERTa3-s-CustomPoolin-toytest3-step1-checkpoints-tmp
  • hub_strategy: all_checkpoints
  • batch_sampler: no_duplicates
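
These values map directly onto SentenceTransformerTrainingArguments (sentence-transformers 3.x); a sketch, with output_dir as a placeholder:

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder path
    eval_strategy="steps",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=256,
    lr_scheduler_type="cosine_with_min_lr",
    lr_scheduler_kwargs={"num_cycles": 0.5, "min_lr": 3.3333333333333337e-06},
    warmup_ratio=0.33,
    save_safetensors=False,
    fp16=True,
    push_to_hub=True,
    hub_model_id="bobox/DeBERTa3-s-CustomPoolin-toytest3-step1-checkpoints-tmp",
    hub_strategy="all_checkpoints",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)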

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 256
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: cosine_with_min_lr
  • lr_scheduler_kwargs: {'num_cycles': 0.5, 'min_lr': 3.3333333333333337e-06}
  • warmup_ratio: 0.33
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: bobox/DeBERTa3-s-CustomPoolin-toytest3-step1-checkpoints-tmp
  • hub_strategy: all_checkpoints
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Click to expand
Epoch Step Training Loss Validation Loss sts-test_spearman_cosine allNLI-dev_max_ap Qnli-dev_max_ap
0.0010 1 10.4072 - - - -
0.0020 2 11.0865 - - - -
0.0030 3 9.5114 - - - -
0.0039 4 9.9584 - - - -
0.0049 5 10.068 - - - -
0.0059 6 11.0224 - - - -
0.0069 7 9.7703 - - - -
0.0079 8 10.5005 - - - -
0.0089 9 10.1987 - - - -
0.0098 10 10.0277 - - - -
0.0108 11 10.6965 - - - -
0.0118 12 10.0609 - - - -
0.0128 13 11.6214 - - - -
0.0138 14 9.4053 - - - -
0.0148 15 10.4014 - - - -
0.0157 16 10.4119 - - - -
0.0167 17 9.4658 - - - -
0.0177 18 9.2169 - - - -
0.0187 19 11.2337 - - - -
0.0197 20 11.0572 - - - -
0.0207 21 11.0452 - - - -
0.0217 22 10.31 - - - -
0.0226 23 9.1395 - - - -
0.0236 24 8.4201 - - - -
0.0246 25 8.6036 - - - -
0.0256 26 11.7579 - - - -
0.0266 27 10.1307 - - - -
0.0276 28 9.2915 - - - -
0.0285 29 9.0208 - - - -
0.0295 30 8.6867 - - - -
0.0305 31 8.0925 - - - -
0.0315 32 8.6617 - - - -
0.0325 33 8.3374 - - - -
0.0335 34 7.8566 - - - -
0.0344 35 9.0698 - - - -
0.0354 36 7.7727 - - - -
0.0364 37 7.6128 - - - -
0.0374 38 7.8762 - - - -
0.0384 39 7.5191 - - - -
0.0394 40 7.5638 - - - -
0.0404 41 7.1878 - - - -
0.0413 42 6.8878 - - - -
0.0423 43 7.5775 - - - -
0.0433 44 7.1076 - - - -
0.0443 45 6.5589 - - - -
0.0453 46 7.4456 - - - -
0.0463 47 6.8233 - - - -
0.0472 48 6.7633 - - - -
0.0482 49 6.6024 - - - -
0.0492 50 6.2778 - - - -
0.0502 51 6.1026 - - - -
0.0512 52 6.632 - - - -
0.0522 53 6.6962 - - - -
0.0531 54 5.8514 - - - -
0.0541 55 5.9951 - - - -
0.0551 56 5.4554 - - - -
0.0561 57 6.0147 - - - -
0.0571 58 5.215 - - - -
0.0581 59 6.4525 - - - -
0.0591 60 5.4048 - - - -
0.0600 61 5.0424 - - - -
0.0610 62 6.2646 - - - -
0.0620 63 5.0847 - - - -
0.0630 64 5.4415 - - - -
0.0640 65 5.2469 - - - -
0.0650 66 5.1378 - - - -
0.0659 67 5.1636 - - - -
0.0669 68 5.5596 - - - -
0.0679 69 4.9508 - - - -
0.0689 70 5.2355 - - - -
0.0699 71 4.7359 - - - -
0.0709 72 4.8947 - - - -
0.0719 73 4.6269 - - - -
0.0728 74 4.6072 - - - -
0.0738 75 4.9125 - - - -
0.0748 76 4.5856 - - - -
0.0758 77 4.7879 - - - -
0.0768 78 4.5348 - - - -
0.0778 79 4.3554 - - - -
0.0787 80 4.2984 - - - -
0.0797 81 4.5505 - - - -
0.0807 82 4.5325 - - - -
0.0817 83 4.2725 - - - -
0.0827 84 4.3054 - - - -
0.0837 85 4.5536 - - - -
0.0846 86 4.0265 - - - -
0.0856 87 4.7453 - - - -
0.0866 88 4.071 - - - -
0.0876 89 4.1582 - - - -
0.0886 90 4.1131 - - - -
0.0896 91 3.6582 - - - -
0.0906 92 4.143 - - - -
0.0915 93 4.2273 - - - -
0.0925 94 3.9321 - - - -
0.0935 95 4.2061 - - - -
0.0945 96 4.1042 - - - -
0.0955 97 3.9513 - - - -
0.0965 98 3.8627 - - - -
0.0974 99 4.3613 - - - -
0.0984 100 3.8513 - - - -
0.0994 101 3.5866 - - - -
0.1004 102 3.5239 - - - -
0.1014 103 3.5921 - - - -
0.1024 104 3.5962 - - - -
0.1033 105 4.0001 - - - -
0.1043 106 4.1374 - - - -
0.1053 107 3.9049 - - - -
0.1063 108 3.2511 - - - -
0.1073 109 3.2479 - - - -
0.1083 110 3.6414 - - - -
0.1093 111 3.6429 - - - -
0.1102 112 3.423 - - - -
0.1112 113 3.4967 - - - -
0.1122 114 3.7649 - - - -
0.1132 115 3.2845 - - - -
0.1142 116 3.356 - - - -
0.1152 117 3.2086 - - - -
0.1161 118 3.5561 - - - -
0.1171 119 3.7353 - - - -
0.1181 120 3.403 - - - -
0.1191 121 3.1009 - - - -
0.1201 122 3.2139 - - - -
0.1211 123 3.3339 - - - -
0.1220 124 2.9464 - - - -
0.1230 125 3.3366 - - - -
0.1240 126 3.0618 - - - -
0.125 127 3.0092 - - - -
0.1260 128 2.7152 - - - -
0.1270 129 2.9423 - - - -
0.1280 130 2.6569 - - - -
0.1289 131 2.8469 - - - -
0.1299 132 2.9089 - - - -
0.1309 133 2.5809 - - - -
0.1319 134 2.6987 - - - -
0.1329 135 3.2518 - - - -
0.1339 136 2.9145 - - - -
0.1348 137 2.4809 - - - -
0.1358 138 2.8264 - - - -
0.1368 139 2.5724 - - - -
0.1378 140 2.6949 - - - -
0.1388 141 2.6925 - - - -
0.1398 142 2.9311 - - - -
0.1407 143 2.5667 - - - -
0.1417 144 3.2471 - - - -
0.1427 145 2.2441 - - - -
0.1437 146 2.75 - - - -
0.1447 147 2.9669 - - - -
0.1457 148 2.736 - - - -
0.1467 149 3.104 - - - -
0.1476 150 2.2175 - - - -
0.1486 151 2.7415 - - - -
0.1496 152 1.8707 - - - -
0.1506 153 2.5961 2.2653 0.3116 0.4265 0.6462
0.1516 154 3.1149 - - - -
0.1526 155 2.2976 - - - -
0.1535 156 2.4436 - - - -
0.1545 157 2.8826 - - - -
0.1555 158 2.3664 - - - -
0.1565 159 2.2485 - - - -
0.1575 160 2.5167 - - - -
0.1585 161 1.7183 - - - -
0.1594 162 2.1003 - - - -
0.1604 163 2.5785 - - - -
0.1614 164 2.8789 - - - -
0.1624 165 2.3425 - - - -
0.1634 166 2.0966 - - - -
0.1644 167 2.1126 - - - -
0.1654 168 2.1824 - - - -
0.1663 169 2.2009 - - - -
0.1673 170 2.3796 - - - -
0.1683 171 2.3096 - - - -
0.1693 172 2.7897 - - - -
0.1703 173 2.2097 - - - -
0.1713 174 1.7508 - - - -
0.1722 175 2.353 - - - -
0.1732 176 2.4276 - - - -
0.1742 177 2.1016 - - - -
0.1752 178 1.8461 - - - -
0.1762 179 1.8104 - - - -
0.1772 180 2.6023 - - - -
0.1781 181 2.5261 - - - -
0.1791 182 2.1053 - - - -
0.1801 183 1.9712 - - - -
0.1811 184 2.4693 - - - -
0.1821 185 2.1119 - - - -
0.1831 186 2.4797 - - - -
0.1841 187 2.1587 - - - -
0.1850 188 1.9578 - - - -
0.1860 189 2.1368 - - - -
0.1870 190 2.4212 - - - -
0.1880 191 1.9591 - - - -
0.1890 192 1.5816 - - - -
0.1900 193 1.4029 - - - -
0.1909 194 1.9385 - - - -
0.1919 195 1.5596 - - - -
0.1929 196 1.6663 - - - -
0.1939 197 2.0026 - - - -
0.1949 198 2.0046 - - - -
0.1959 199 1.5016 - - - -
0.1969 200 2.184 - - - -
0.1978 201 2.3442 - - - -
0.1988 202 2.6981 - - - -
0.1998 203 2.5481 - - - -
0.2008 204 2.9798 - - - -
0.2018 205 2.287 - - - -
0.2028 206 1.9393 - - - -
0.2037 207 2.892 - - - -
0.2047 208 2.26 - - - -
0.2057 209 2.5911 - - - -
0.2067 210 2.1239 - - - -
0.2077 211 2.0683 - - - -
0.2087 212 1.768 - - - -
0.2096 213 2.5468 - - - -
0.2106 214 1.8956 - - - -
0.2116 215 2.044 - - - -
0.2126 216 1.5721 - - - -
0.2136 217 1.6278 - - - -
0.2146 218 1.7754 - - - -
0.2156 219 1.8594 - - - -
0.2165 220 1.8309 - - - -
0.2175 221 2.0619 - - - -
0.2185 222 2.3335 - - - -
0.2195 223 2.023 - - - -
0.2205 224 2.1975 - - - -
0.2215 225 1.9228 - - - -
0.2224 226 2.3565 - - - -
0.2234 227 1.896 - - - -
0.2244 228 2.0912 - - - -
0.2254 229 2.7703 - - - -
0.2264 230 1.6988 - - - -
0.2274 231 2.0406 - - - -
0.2283 232 1.9288 - - - -
0.2293 233 2.0457 - - - -
0.2303 234 1.7061 - - - -
0.2313 235 1.6244 - - - -
0.2323 236 2.0241 - - - -
0.2333 237 1.567 - - - -
0.2343 238 1.8084 - - - -
0.2352 239 2.4363 - - - -
0.2362 240 1.7532 - - - -
0.2372 241 2.0797 - - - -
0.2382 242 1.9562 - - - -
0.2392 243 1.6751 - - - -
0.2402 244 2.0265 - - - -
0.2411 245 1.6065 - - - -
0.2421 246 1.7439 - - - -
0.2431 247 2.0237 - - - -
0.2441 248 1.6128 - - - -
0.2451 249 1.6581 - - - -
0.2461 250 2.1538 - - - -
0.2470 251 2.049 - - - -
0.2480 252 1.2573 - - - -
0.2490 253 1.5619 - - - -
0.25 254 1.2611 - - - -
0.2510 255 1.3443 - - - -
0.2520 256 1.3436 - - - -
0.2530 257 2.8117 - - - -
0.2539 258 1.7563 - - - -
0.2549 259 1.3148 - - - -
0.2559 260 2.0278 - - - -
0.2569 261 1.2403 - - - -
0.2579 262 1.588 - - - -
0.2589 263 2.0071 - - - -
0.2598 264 1.5312 - - - -
0.2608 265 1.8641 - - - -
0.2618 266 1.2933 - - - -
0.2628 267 1.6262 - - - -
0.2638 268 1.721 - - - -
0.2648 269 1.4713 - - - -
0.2657 270 1.4625 - - - -
0.2667 271 1.7254 - - - -
0.2677 272 1.5108 - - - -
0.2687 273 2.1126 - - - -
0.2697 274 1.3967 - - - -
0.2707 275 1.7067 - - - -
0.2717 276 1.4847 - - - -
0.2726 277 1.6515 - - - -
0.2736 278 0.9367 - - - -
0.2746 279 2.0267 - - - -
0.2756 280 1.5023 - - - -
0.2766 281 1.1248 - - - -
0.2776 282 1.6224 - - - -
0.2785 283 1.7969 - - - -
0.2795 284 2.2498 - - - -
0.2805 285 1.7477 - - - -
0.2815 286 1.6261 - - - -
0.2825 287 2.0911 - - - -
0.2835 288 1.9519 - - - -
0.2844 289 1.3132 - - - -
0.2854 290 2.3292 - - - -
0.2864 291 1.3781 - - - -
0.2874 292 1.5753 - - - -
0.2884 293 1.4158 - - - -
0.2894 294 2.1661 - - - -
0.2904 295 1.4928 - - - -
0.2913 296 2.2825 - - - -
0.2923 297 1.7261 - - - -
0.2933 298 1.8635 - - - -
0.2943 299 0.974 - - - -
0.2953 300 1.53 - - - -
0.2963 301 1.5985 - - - -
0.2972 302 1.2169 - - - -
0.2982 303 1.771 - - - -
0.2992 304 1.4506 - - - -
0.3002 305 1.9496 - - - -

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.2.1
  • Transformers: 4.44.2
  • PyTorch: 2.5.0+cu121
  • Accelerate: 0.34.2
  • Datasets: 3.0.2
  • Tokenizers: 0.19.1
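
To approximate this environment, the listed versions can be pinned directly (a sketch; the exact PyTorch CUDA build may differ):

pip install sentence-transformers==3.2.1 transformers==4.44.2 torch==2.5.0 accelerate==0.34.2 datasets==3.0.2 tokenizers==0.19.1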

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

GISTEmbedLoss

@misc{solatorio2024gistembed,
    title={GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning},
    author={Aivin V. Solatorio},
    year={2024},
    eprint={2402.16829},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}