---
base_model: microsoft/deberta-v3-small
library_name: sentence-transformers
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
  - cosine_accuracy
  - cosine_accuracy_threshold
  - cosine_f1
  - cosine_f1_threshold
  - cosine_precision
  - cosine_recall
  - cosine_ap
  - dot_accuracy
  - dot_accuracy_threshold
  - dot_f1
  - dot_f1_threshold
  - dot_precision
  - dot_recall
  - dot_ap
  - manhattan_accuracy
  - manhattan_accuracy_threshold
  - manhattan_f1
  - manhattan_f1_threshold
  - manhattan_precision
  - manhattan_recall
  - manhattan_ap
  - euclidean_accuracy
  - euclidean_accuracy_threshold
  - euclidean_f1
  - euclidean_f1_threshold
  - euclidean_precision
  - euclidean_recall
  - euclidean_ap
  - max_accuracy
  - max_accuracy_threshold
  - max_f1
  - max_f1_threshold
  - max_precision
  - max_recall
  - max_ap
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:32500
  - loss:GISTEmbedLoss
widget:
  - source_sentence: phase changes do not change
    sentences:
      - >-
        The major Atlantic slave trading nations, ordered by trade volume, were
        the Portuguese, the British, the Spanish, the French, the Dutch, and the
        Danish. Several had established outposts on the African coast where they
        purchased slaves from local African leaders.
      - >-
        phase changes do not change mass. Particles have mass, but mass is
        energy. 
         phase changes do not change  energy
      - >-
        According to the U.S. Census Bureau , the county is a total area of ,
        which has land and ( 0.2 % ) is water .
  - source_sentence: what jobs can you get with a bachelor degree in anthropology?
    sentences:
      - >-
        To determine the atomic weight of an element, you should add up protons
        and neutrons.
      - >-
        ['Paleontologist*', 'Archaeologist*', 'University Professor*', 'Market
        Research Analyst*', 'Primatologist.', 'Forensic Scientist*', 'Medical
        Anthropologist.', 'Museum Technician.']
      - >-
        The wingspan flies , the moth comes depending on the location from July
        to August .
  - source_sentence: Identify different forms of energy (e.g., light, sound, heat).
    sentences:
      - >-
        `` Irreplaceable '' '' remained on the chart for thirty weeks , and was
        certified double-platinum by the Recording Industry Association of
        America ( RIAA ) , denoting sales of two million downloads , and had
        sold over 3,139,000 paid digital downloads in the US as of October 2012
        , according to Nielsen SoundScan . ''
      - >-
        On Rotten Tomatoes , the film has a rating of 63 % , based on 87 reviews
        , with an average rating of 5.9/10 .
      - Heat, light, and sound are all different forms of energy.
  - source_sentence: what is so small it can only be seen with an electron microscope?
    sentences:
      - >-
        Viruses are so small that they can be seen only with an electron
        microscope.. Where most viruses are DNA, HIV is an RNA virus. 
         HIV is so small it can only be seen with an electron microscope
      - >-
        The development of modern lasers has opened many doors to both research
        and applications. A laser beam was used to measure the distance from the
        Earth to the moon. Lasers are important components of CD players. As the
        image above illustrates, lasers can provide precise focusing of beams to
        selectively destroy cancer cells in patients. The ability of a laser to
        focus precisely is due to high-quality crystals that help give rise to
        the laser beam. A variety of techniques are used to manufacture pure
        crystals for use in lasers.
      - >-
        Discussion for (a) This value is the net work done on the package. The
        person actually does more work than this, because friction opposes the
        motion. Friction does negative work and removes some of the energy the
        person expends and converts it to thermal energy. The net work equals
        the sum of the work done by each individual force. Strategy and Concept
        for (b) The forces acting on the package are gravity, the normal force,
        the force of friction, and the applied force. The normal force and force
        of gravity are each perpendicular to the displacement, and therefore do
        no work. Solution for (b) The applied force does work.
  - source_sentence: what aspects of your environment may relate to the epidemic of obesity
    sentences:
      - >-
        Jan Kromkamp ( born August 17 , 1980 in Makkinga , Netherlands ) is a
        Dutch footballer .
      - >-
        When chemicals in solution react, the proper way of writing the chemical
        formulas of the dissolved ionic compounds is in terms of the dissociated
        ions, not the complete ionic formula. A complete ionic equation is a
        chemical equation in which the dissolved ionic compounds are written as
        separated ions. Solubility rules are very useful in determining which
        ionic compounds are dissolved and which are not. For example, when
        NaCl(aq) reacts with AgNO3(aq) in a double-replacement reaction to
        precipitate AgCl(s) and form NaNO3(aq), the complete ionic equation
        includes NaCl, AgNO3, and NaNO3 written as separated ions:.
      - >-
        Genetic changes in human populations occur too slowly to be responsible
        for the obesity epidemic. Nevertheless, the variation in how people
        respond to the environment that promotes physical inactivity and intake
        of high-calorie foods suggests that genes do play a role in the
        development of obesity.
model-index:
  - name: SentenceTransformer based on microsoft/deberta-v3-small
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test
          type: sts-test
        metrics:
          - type: pearson_cosine
            value: 0.3774946012125992
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.4056589966976888
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.3861982631744407
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.4059364545183154
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.38652243004790016
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.4056589966976888
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.3774648453085433
            name: Pearson Dot
          - type: spearman_dot
            value: 0.40563469676275316
            name: Spearman Dot
          - type: pearson_max
            value: 0.38652243004790016
            name: Pearson Max
          - type: spearman_max
            value: 0.4059364545183154
            name: Spearman Max
      - task:
          type: binary-classification
          name: Binary Classification
        dataset:
          name: allNLI dev
          type: allNLI-dev
        metrics:
          - type: cosine_accuracy
            value: 0.67578125
            name: Cosine Accuracy
          - type: cosine_accuracy_threshold
            value: 0.9427558183670044
            name: Cosine Accuracy Threshold
          - type: cosine_f1
            value: 0.5225225225225225
            name: Cosine F1
          - type: cosine_f1_threshold
            value: 0.8046966791152954
            name: Cosine F1 Threshold
          - type: cosine_precision
            value: 0.3795811518324607
            name: Cosine Precision
          - type: cosine_recall
            value: 0.838150289017341
            name: Cosine Recall
          - type: cosine_ap
            value: 0.4368751759846574
            name: Cosine Ap
          - type: dot_accuracy
            value: 0.67578125
            name: Dot Accuracy
          - type: dot_accuracy_threshold
            value: 724.1080322265625
            name: Dot Accuracy Threshold
          - type: dot_f1
            value: 0.5225225225225225
            name: Dot F1
          - type: dot_f1_threshold
            value: 618.074951171875
            name: Dot F1 Threshold
          - type: dot_precision
            value: 0.3795811518324607
            name: Dot Precision
          - type: dot_recall
            value: 0.838150289017341
            name: Dot Recall
          - type: dot_ap
            value: 0.436842886797982
            name: Dot Ap
          - type: manhattan_accuracy
            value: 0.677734375
            name: Manhattan Accuracy
          - type: manhattan_accuracy_threshold
            value: 223.6764373779297
            name: Manhattan Accuracy Threshold
          - type: manhattan_f1
            value: 0.5239852398523985
            name: Manhattan F1
          - type: manhattan_f1_threshold
            value: 372.31396484375
            name: Manhattan F1 Threshold
          - type: manhattan_precision
            value: 0.38482384823848237
            name: Manhattan Precision
          - type: manhattan_recall
            value: 0.8208092485549133
            name: Manhattan Recall
          - type: manhattan_ap
            value: 0.43892484929307635
            name: Manhattan Ap
          - type: euclidean_accuracy
            value: 0.67578125
            name: Euclidean Accuracy
          - type: euclidean_accuracy_threshold
            value: 9.377331733703613
            name: Euclidean Accuracy Threshold
          - type: euclidean_f1
            value: 0.5225225225225225
            name: Euclidean F1
          - type: euclidean_f1_threshold
            value: 17.321048736572266
            name: Euclidean F1 Threshold
          - type: euclidean_precision
            value: 0.3795811518324607
            name: Euclidean Precision
          - type: euclidean_recall
            value: 0.838150289017341
            name: Euclidean Recall
          - type: euclidean_ap
            value: 0.4368602200677977
            name: Euclidean Ap
          - type: max_accuracy
            value: 0.677734375
            name: Max Accuracy
          - type: max_accuracy_threshold
            value: 724.1080322265625
            name: Max Accuracy Threshold
          - type: max_f1
            value: 0.5239852398523985
            name: Max F1
          - type: max_f1_threshold
            value: 618.074951171875
            name: Max F1 Threshold
          - type: max_precision
            value: 0.38482384823848237
            name: Max Precision
          - type: max_recall
            value: 0.838150289017341
            name: Max Recall
          - type: max_ap
            value: 0.43892484929307635
            name: Max Ap
      - task:
          type: binary-classification
          name: Binary Classification
        dataset:
          name: Qnli dev
          type: Qnli-dev
        metrics:
          - type: cosine_accuracy
            value: 0.646484375
            name: Cosine Accuracy
          - type: cosine_accuracy_threshold
            value: 0.8057259321212769
            name: Cosine Accuracy Threshold
          - type: cosine_f1
            value: 0.6688102893890675
            name: Cosine F1
          - type: cosine_f1_threshold
            value: 0.7187118530273438
            name: Cosine F1 Threshold
          - type: cosine_precision
            value: 0.538860103626943
            name: Cosine Precision
          - type: cosine_recall
            value: 0.8813559322033898
            name: Cosine Recall
          - type: cosine_ap
            value: 0.6720663622193426
            name: Cosine Ap
          - type: dot_accuracy
            value: 0.646484375
            name: Dot Accuracy
          - type: dot_accuracy_threshold
            value: 618.8643798828125
            name: Dot Accuracy Threshold
          - type: dot_f1
            value: 0.6688102893890675
            name: Dot F1
          - type: dot_f1_threshold
            value: 552.0260009765625
            name: Dot F1 Threshold
          - type: dot_precision
            value: 0.538860103626943
            name: Dot Precision
          - type: dot_recall
            value: 0.8813559322033898
            name: Dot Recall
          - type: dot_ap
            value: 0.672083506527328
            name: Dot Ap
          - type: manhattan_accuracy
            value: 0.6484375
            name: Manhattan Accuracy
          - type: manhattan_accuracy_threshold
            value: 386.58905029296875
            name: Manhattan Accuracy Threshold
          - type: manhattan_f1
            value: 0.6645569620253164
            name: Manhattan F1
          - type: manhattan_f1_threshold
            value: 462.609130859375
            name: Manhattan F1 Threshold
          - type: manhattan_precision
            value: 0.5303030303030303
            name: Manhattan Precision
          - type: manhattan_recall
            value: 0.8898305084745762
            name: Manhattan Recall
          - type: manhattan_ap
            value: 0.6724653688821339
            name: Manhattan Ap
          - type: euclidean_accuracy
            value: 0.646484375
            name: Euclidean Accuracy
          - type: euclidean_accuracy_threshold
            value: 17.27533721923828
            name: Euclidean Accuracy Threshold
          - type: euclidean_f1
            value: 0.6688102893890675
            name: Euclidean F1
          - type: euclidean_f1_threshold
            value: 20.787063598632812
            name: Euclidean F1 Threshold
          - type: euclidean_precision
            value: 0.538860103626943
            name: Euclidean Precision
          - type: euclidean_recall
            value: 0.8813559322033898
            name: Euclidean Recall
          - type: euclidean_ap
            value: 0.6720591998758361
            name: Euclidean Ap
          - type: max_accuracy
            value: 0.6484375
            name: Max Accuracy
          - type: max_accuracy_threshold
            value: 618.8643798828125
            name: Max Accuracy Threshold
          - type: max_f1
            value: 0.6688102893890675
            name: Max F1
          - type: max_f1_threshold
            value: 552.0260009765625
            name: Max F1 Threshold
          - type: max_precision
            value: 0.538860103626943
            name: Max Precision
          - type: max_recall
            value: 0.8898305084745762
            name: Max Recall
          - type: max_ap
            value: 0.6724653688821339
            name: Max Ap
---

SentenceTransformer based on microsoft/deberta-v3-small

This is a sentence-transformers model finetuned from microsoft/deberta-v3-small. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: microsoft/deberta-v3-small
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
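
These properties can be verified directly on the loaded model. A minimal sanity-check sketch (assuming the checkpoint loads with the stock library; the repository name is the one used in the Usage section below):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("bobox/DeBERTa3-s-CustomPoolin-toytest4-step1-checkpoints-tmp")
print(model.max_seq_length)                  # 512
embedding = model.encode(["a quick check"])  # encode one sentence
print(embedding.shape)                       # (1, 768)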

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model 
  (1): AdvancedWeightedPooling(
    (alpha_dropout_layer): Dropout(p=0.05, inplace=False)
    (gate_dropout_layer): Dropout(p=0.0, inplace=False)
    (linear_cls_Qpj): Linear(in_features=768, out_features=768, bias=True)
    (linear_attnOut): Linear(in_features=768, out_features=768, bias=True)
    (mha): MultiheadAttention(
      (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
    )
    (layernorm_output): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (layernorm_weightedPooing): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (layernorm_attnOut): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("bobox/DeBERTa3-s-CustomPoolin-toytest4-step1-checkpoints-tmp")
# Run inference
sentences = [
    'what aspects of your environment may relate to the epidemic of obesity',
    'Genetic changes in human populations occur too slowly to be responsible for the obesity epidemic. Nevertheless, the variation in how people respond to the environment that promotes physical inactivity and intake of high-calorie foods suggests that genes do play a role in the development of obesity.',
    'When chemicals in solution react, the proper way of writing the chemical formulas of the dissolved ionic compounds is in terms of the dissociated ions, not the complete ionic formula. A complete ionic equation is a chemical equation in which the dissolved ionic compounds are written as separated ions. Solubility rules are very useful in determining which ionic compounds are dissolved and which are not. For example, when NaCl(aq) reacts with AgNO3(aq) in a double-replacement reaction to precipitate AgCl(s) and form NaNO3(aq), the complete ionic equation includes NaCl, AgNO3, and NaNO3 written as separated ions:.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Semantic Similarity

  • Dataset: sts-test
  • Evaluated with EmbeddingSimilarityEvaluator

Metric Value
pearson_cosine 0.3775
spearman_cosine 0.4057
pearson_manhattan 0.3862
spearman_manhattan 0.4059
pearson_euclidean 0.3865
spearman_euclidean 0.4057
pearson_dot 0.3775
spearman_dot 0.4056
pearson_max 0.3865
spearman_max 0.4059
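
These figures can be reproduced with the library's STS evaluator. A minimal sketch (the sentence pair and gold score below are placeholders; the values above were computed on the sts-test split):

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("bobox/DeBERTa3-s-CustomPoolin-toytest4-step1-checkpoints-tmp")

evaluator = EmbeddingSimilarityEvaluator(
    sentences1=["A man is playing a guitar."],   # placeholder pair
    sentences2=["Someone plays an instrument."],
    scores=[0.8],                                # gold similarity in [0, 1]
    name="sts-test",
)
print(evaluator(model))  # Pearson/Spearman for cosine, Manhattan, Euclidean, dot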

Binary Classification

  • Dataset: allNLI-dev
  • Evaluated with BinaryClassificationEvaluator

Metric Value
cosine_accuracy 0.6758
cosine_accuracy_threshold 0.9428
cosine_f1 0.5225
cosine_f1_threshold 0.8047
cosine_precision 0.3796
cosine_recall 0.8382
cosine_ap 0.4369
dot_accuracy 0.6758
dot_accuracy_threshold 724.108
dot_f1 0.5225
dot_f1_threshold 618.075
dot_precision 0.3796
dot_recall 0.8382
dot_ap 0.4368
manhattan_accuracy 0.6777
manhattan_accuracy_threshold 223.6764
manhattan_f1 0.524
manhattan_f1_threshold 372.314
manhattan_precision 0.3848
manhattan_recall 0.8208
manhattan_ap 0.4389
euclidean_accuracy 0.6758
euclidean_accuracy_threshold 9.3773
euclidean_f1 0.5225
euclidean_f1_threshold 17.321
euclidean_precision 0.3796
euclidean_recall 0.8382
euclidean_ap 0.4369
max_accuracy 0.6777
max_accuracy_threshold 724.108
max_f1 0.524
max_f1_threshold 618.075
max_precision 0.3848
max_recall 0.8382
max_ap 0.4389
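
These metrics (and the Qnli-dev table below) come from the library's pair-classification evaluator, which sweeps a decision threshold for each similarity function. A minimal sketch with a placeholder pair:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import BinaryClassificationEvaluator

model = SentenceTransformer("bobox/DeBERTa3-s-CustomPoolin-toytest4-step1-checkpoints-tmp")

evaluator = BinaryClassificationEvaluator(
    sentences1=["A soccer game with multiple males playing."],  # placeholder pair
    sentences2=["Some men are playing a sport."],
    labels=[1],  # 1 = matching pair, 0 = non-matching
    name="allNLI-dev",
)
print(evaluator(model))  # accuracy, F1, precision, recall, AP per similarity function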

Binary Classification

  • Dataset: Qnli-dev
  • Evaluated with BinaryClassificationEvaluator

Metric Value
cosine_accuracy 0.6465
cosine_accuracy_threshold 0.8057
cosine_f1 0.6688
cosine_f1_threshold 0.7187
cosine_precision 0.5389
cosine_recall 0.8814
cosine_ap 0.6721
dot_accuracy 0.6465
dot_accuracy_threshold 618.8644
dot_f1 0.6688
dot_f1_threshold 552.026
dot_precision 0.5389
dot_recall 0.8814
dot_ap 0.6721
manhattan_accuracy 0.6484
manhattan_accuracy_threshold 386.5891
manhattan_f1 0.6646
manhattan_f1_threshold 462.6091
manhattan_precision 0.5303
manhattan_recall 0.8898
manhattan_ap 0.6725
euclidean_accuracy 0.6465
euclidean_accuracy_threshold 17.2753
euclidean_f1 0.6688
euclidean_f1_threshold 20.7871
euclidean_precision 0.5389
euclidean_recall 0.8814
euclidean_ap 0.6721
max_accuracy 0.6484
max_accuracy_threshold 618.8644
max_f1 0.6688
max_f1_threshold 552.026
max_precision 0.5389
max_recall 0.8898
max_ap 0.6725

Training Details

Training Dataset

Unnamed Dataset

  • Size: 32,500 training samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    • sentence1: string; min 4 tokens, mean 29.39 tokens, max 323 tokens
    • sentence2: string; min 2 tokens, mean 54.42 tokens, max 423 tokens
  • Samples:
    • sentence1: In which London road is Harrod’s department store?
      sentence2: Harrods, Brompton Road, London
    • sentence1: e. in solids the atoms are closely locked in position and can only vibrate, in liquids the atoms and molecules are more loosely connected and can collide with and move past one another, while in gases the atoms or molecules are free to move independently, colliding frequently.
      sentence2: Within a substance, atoms that collide frequently and move independently of one another are most likely in a gas
    • sentence1: Joe Cole was unable to join West Bromwich Albion .
      sentence2: On 16th October Joe Cole took a long hard look at himself realising that he would never get the opportunity to join West Bromwich Albion and joined Coventry City instead.
  • Loss: GISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.025}
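
Equivalently, the loss can be constructed in code. A minimal sketch (the guide checkpoint here is an assumption; any strong embedding model matching the printed guide config, a BERT encoder with CLS pooling plus normalization, would fit, and the actual run paired the loss with the custom pooling stack shown under Full Model Architecture):

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import GISTEmbedLoss

model = SentenceTransformer("microsoft/deberta-v3-small")  # base model being fine-tuned
guide = SentenceTransformer("BAAI/bge-base-en-v1.5")       # assumed guide; see note above
loss = GISTEmbedLoss(model, guide=guide, temperature=0.025)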
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 256
  • lr_scheduler_type: cosine_with_min_lr
  • lr_scheduler_kwargs: {'num_cycles': 0.5, 'min_lr': 3.3333333333333337e-06}
  • warmup_ratio: 0.33
  • save_safetensors: False
  • fp16: True
  • push_to_hub: True
  • hub_model_id: bobox/DeBERTa3-s-CustomPoolin-toytest4-step1-checkpoints-tmp
  • hub_strategy: all_checkpoints
  • batch_sampler: no_duplicates
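
These values map directly onto SentenceTransformerTrainingArguments. A minimal sketch (output_dir is a placeholder):

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder
    eval_strategy="steps",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=256,
    lr_scheduler_type="cosine_with_min_lr",
    lr_scheduler_kwargs={"num_cycles": 0.5, "min_lr": 3.3333333333333337e-06},
    warmup_ratio=0.33,
    save_safetensors=False,
    fp16=True,
    push_to_hub=True,
    hub_model_id="bobox/DeBERTa3-s-CustomPoolin-toytest4-step1-checkpoints-tmp",
    hub_strategy="all_checkpoints",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)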

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 256
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: cosine_with_min_lr
  • lr_scheduler_kwargs: {'num_cycles': 0.5, 'min_lr': 3.3333333333333337e-06}
  • warmup_ratio: 0.33
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: bobox/DeBERTa3-s-CustomPoolin-toytest4-step1-checkpoints-tmp
  • hub_strategy: all_checkpoints
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss sts-test_spearman_cosine allNLI-dev_max_ap Qnli-dev_max_ap
0.0010 1 6.0688 - - -
0.0020 2 7.5576 - - -
0.0030 3 4.6849 - - -
0.0039 4 5.4503 - - -
0.0049 5 5.6057 - - -
0.0059 6 6.3049 - - -
0.0069 7 6.8336 - - -
0.0079 8 5.0777 - - -
0.0089 9 4.8358 - - -
0.0098 10 4.641 - - -
0.0108 11 4.828 - - -
0.0118 12 5.2269 - - -
0.0128 13 5.6772 - - -
0.0138 14 5.1422 - - -
0.0148 15 6.2469 - - -
0.0157 16 4.6802 - - -
0.0167 17 4.5492 - - -
0.0177 18 4.8062 - - -
0.0187 19 7.5141 - - -
0.0197 20 5.5202 - - -
0.0207 21 6.5025 - - -
0.0217 22 7.318 - - -
0.0226 23 4.6458 - - -
0.0236 24 4.6191 - - -
0.0246 25 4.3159 - - -
0.0256 26 6.3677 - - -
0.0266 27 5.6052 - - -
0.0276 28 4.196 - - -
0.0285 29 4.4802 - - -
0.0295 30 4.9193 - - -
0.0305 31 4.0996 - - -
0.0315 32 5.6307 - - -
0.0325 33 4.5745 - - -
0.0335 34 4.4514 - - -
0.0344 35 4.0617 - - -
0.0354 36 5.0298 - - -
0.0364 37 3.9815 - - -
0.0374 38 4.0871 - - -
0.0384 39 4.2378 - - -
0.0394 40 3.8226 - - -
0.0404 41 4.3519 - - -
0.0413 42 3.6345 - - -
0.0423 43 5.0829 - - -
0.0433 44 4.6701 - - -
0.0443 45 4.1371 - - -
0.0453 46 4.2418 - - -
0.0463 47 4.4766 - - -
0.0472 48 4.4797 - - -
0.0482 49 3.8471 - - -
0.0492 50 4.3194 - - -
0.0502 51 3.9426 - - -
0.0512 52 3.5333 - - -
0.0522 53 4.2426 - - -
0.0531 54 3.9816 - - -
0.0541 55 3.663 - - -
0.0551 56 3.9057 - - -
0.0561 57 4.0345 - - -
0.0571 58 3.5233 - - -
0.0581 59 3.7999 - - -
0.0591 60 3.1885 - - -
0.0600 61 3.6013 - - -
0.0610 62 3.392 - - -
0.0620 63 3.3814 - - -
0.0630 64 4.0428 - - -
0.0640 65 3.7825 - - -
0.0650 66 3.4181 - - -
0.0659 67 3.7793 - - -
0.0669 68 3.8344 - - -
0.0679 69 3.2165 - - -
0.0689 70 3.3811 - - -
0.0699 71 3.5984 - - -
0.0709 72 3.8583 - - -
0.0719 73 3.296 - - -
0.0728 74 2.7661 - - -
0.0738 75 2.9805 - - -
0.0748 76 2.566 - - -
0.0758 77 3.258 - - -
0.0768 78 3.3804 - - -
0.0778 79 2.8828 - - -
0.0787 80 3.1077 - - -
0.0797 81 2.9441 - - -
0.0807 82 2.9465 - - -
0.0817 83 2.7088 - - -
0.0827 84 2.9215 - - -
0.0837 85 3.4698 - - -
0.0846 86 2.2414 - - -
0.0856 87 3.1601 - - -
0.0866 88 2.7714 - - -
0.0876 89 3.0311 - - -
0.0886 90 3.0336 - - -
0.0896 91 1.9358 - - -
0.0906 92 2.6031 - - -
0.0915 93 2.7515 - - -
0.0925 94 2.8496 - - -
0.0935 95 1.8015 - - -
0.0945 96 2.8138 - - -
0.0955 97 2.0597 - - -
0.0965 98 2.1053 - - -
0.0974 99 2.6785 - - -
0.0984 100 2.588 - - -
0.0994 101 2.0099 - - -
0.1004 102 2.7947 - - -
0.1014 103 2.3274 - - -
0.1024 104 2.2545 - - -
0.1033 105 2.4575 - - -
0.1043 106 2.4413 - - -
0.1053 107 2.3185 - - -
0.1063 108 2.1577 - - -
0.1073 109 2.1278 - - -
0.1083 110 2.0967 - - -
0.1093 111 2.6142 - - -
0.1102 112 1.8553 - - -
0.1112 113 2.1523 - - -
0.1122 114 2.1726 - - -
0.1132 115 1.8564 - - -
0.1142 116 1.8413 - - -
0.1152 117 2.0441 - - -
0.1161 118 2.2159 - - -
0.1171 119 2.6779 - - -
0.1181 120 2.2976 - - -
0.1191 121 1.9407 - - -
0.1201 122 1.9019 - - -
0.1211 123 2.2149 - - -
0.1220 124 1.6823 - - -
0.1230 125 1.8402 - - -
0.1240 126 1.6914 - - -
0.125 127 2.1626 - - -
0.1260 128 1.6414 - - -
0.1270 129 2.2043 - - -
0.1280 130 1.9987 - - -
0.1289 131 1.8868 - - -
0.1299 132 1.8262 - - -
0.1309 133 2.0404 - - -
0.1319 134 1.9134 - - -
0.1329 135 2.3725 - - -
0.1339 136 1.4127 - - -
0.1348 137 1.6876 - - -
0.1358 138 1.8376 - - -
0.1368 139 1.6992 - - -
0.1378 140 1.5032 - - -
0.1388 141 2.0334 - - -
0.1398 142 2.3581 - - -
0.1407 143 1.4236 - - -
0.1417 144 2.202 - - -
0.1427 145 1.7654 - - -
0.1437 146 1.5748 - - -
0.1447 147 1.7996 - - -
0.1457 148 1.7517 - - -
0.1467 149 1.8933 - - -
0.1476 150 1.2836 - - -
0.1486 151 1.7145 - - -
0.1496 152 1.6499 - - -
0.1506 153 1.8273 0.4057 0.4389 0.6725
0.1516 154 2.2859 - - -
0.1526 155 1.0833 - - -
0.1535 156 1.6829 - - -
0.1545 157 2.1464 - - -
0.1555 158 1.745 - - -
0.1565 159 1.7319 - - -
0.1575 160 1.6968 - - -
0.1585 161 1.7401 - - -
0.1594 162 1.729 - - -
0.1604 163 2.0782 - - -
0.1614 164 2.6545 - - -
0.1624 165 1.4045 - - -
0.1634 166 1.2937 - - -
0.1644 167 1.1171 - - -
0.1654 168 1.3537 - - -
0.1663 169 1.7028 - - -
0.1673 170 1.4143 - - -
0.1683 171 1.8648 - - -
0.1693 172 1.6768 - - -
0.1703 173 1.9528 - - -
0.1713 174 1.1718 - - -
0.1722 175 1.8176 - - -
0.1732 176 0.8439 - - -
0.1742 177 1.5092 - - -
0.1752 178 1.1947 - - -
0.1762 179 1.6395 - - -
0.1772 180 1.4394 - - -
0.1781 181 1.7548 - - -
0.1791 182 1.1181 - - -
0.1801 183 1.0271 - - -
0.1811 184 2.3108 - - -
0.1821 185 2.1242 - - -
0.1831 186 1.9822 - - -
0.1841 187 2.3605 - - -
0.1850 188 1.5251 - - -
0.1860 189 1.2351 - - -
0.1870 190 1.5859 - - -
0.1880 191 1.8056 - - -
0.1890 192 1.349 - - -
0.1900 193 0.893 - - -
0.1909 194 1.5122 - - -
0.1919 195 1.3875 - - -
0.1929 196 1.29 - - -
0.1939 197 2.2931 - - -
0.1949 198 1.2663 - - -
0.1959 199 1.9712 - - -
0.1969 200 2.3307 - - -
0.1978 201 1.6544 - - -
0.1988 202 1.638 - - -
0.1998 203 1.3412 - - -
0.2008 204 1.4454 - - -
0.2018 205 1.5437 - - -
0.2028 206 1.4921 - - -
0.2037 207 1.4298 - - -
0.2047 208 1.6174 - - -
0.2057 209 1.4137 - - -
0.2067 210 1.5652 - - -
0.2077 211 1.1631 - - -
0.2087 212 1.2351 - - -
0.2096 213 1.7537 - - -
0.2106 214 1.3186 - - -
0.2116 215 1.2258 - - -
0.2126 216 0.7695 - - -
0.2136 217 1.2775 - - -
0.2146 218 1.6795 - - -
0.2156 219 1.2862 - - -
0.2165 220 1.1723 - - -
0.2175 221 1.3322 - - -
0.2185 222 1.7564 - - -
0.2195 223 1.1071 - - -
0.2205 224 1.2011 - - -
0.2215 225 1.2303 - - -
0.2224 226 1.212 - - -
0.2234 227 1.0117 - - -
0.2244 228 1.1907 - - -
0.2254 229 2.1293 - - -
0.2264 230 1.3063 - - -
0.2274 231 1.2841 - - -
0.2283 232 1.3778 - - -
0.2293 233 1.2242 - - -
0.2303 234 0.9227 - - -
0.2313 235 1.2221 - - -
0.2323 236 2.1041 - - -
0.2333 237 1.3341 - - -
0.2343 238 1.0876 - - -
0.2352 239 1.3328 - - -
0.2362 240 1.2958 - - -
0.2372 241 1.1522 - - -
0.2382 242 1.7942 - - -
0.2392 243 1.1325 - - -
0.2402 244 1.6466 - - -
0.2411 245 1.4608 - - -
0.2421 246 0.6375 - - -
0.2431 247 2.0177 - - -
0.2441 248 1.2069 - - -
0.2451 249 0.7639 - - -
0.2461 250 1.3465 - - -
0.2470 251 1.064 - - -
0.2480 252 1.3757 - - -
0.2490 253 1.612 - - -
0.25 254 0.7917 - - -
0.2510 255 1.5515 - - -
0.2520 256 0.799 - - -
0.2530 257 0.9882 - - -
0.2539 258 1.1814 - - -
0.2549 259 0.6394 - - -
0.2559 260 1.4756 - - -
0.2569 261 0.5338 - - -
0.2579 262 0.9779 - - -
0.2589 263 1.5307 - - -
0.2598 264 1.1213 - - -
0.2608 265 0.9482 - - -
0.2618 266 0.9599 - - -
0.2628 267 1.4455 - - -
0.2638 268 1.6496 - - -
0.2648 269 0.7402 - - -
0.2657 270 0.7835 - - -
0.2667 271 0.7821 - - -
0.2677 272 1.5422 - - -
0.2687 273 1.0995 - - -
0.2697 274 1.378 - - -
0.2707 275 1.3562 - - -
0.2717 276 0.7376 - - -
0.2726 277 1.1678 - - -
0.2736 278 1.2989 - - -
0.2746 279 1.9559 - - -
0.2756 280 1.1237 - - -
0.2766 281 0.952 - - -
0.2776 282 1.6629 - - -
0.2785 283 1.871 - - -
0.2795 284 1.5946 - - -
0.2805 285 1.4456 - - -
0.2815 286 1.4085 - - -
0.2825 287 1.1394 - - -
0.2835 288 1.0315 - - -
0.2844 289 1.488 - - -
0.2854 290 1.4006 - - -
0.2864 291 0.9237 - - -
0.2874 292 1.163 - - -
0.2884 293 1.7037 - - -
0.2894 294 0.8715 - - -
0.2904 295 1.2101 - - -
0.2913 296 1.1179 - - -
0.2923 297 1.3986 - - -
0.2933 298 1.7068 - - -
0.2943 299 0.8695 - - -
0.2953 300 1.3778 - - -
0.2963 301 1.2834 - - -
0.2972 302 0.8123 - - -
0.2982 303 1.6521 - - -
0.2992 304 1.1064 - - -
0.3002 305 0.9578 - - -

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.2.1
  • Transformers: 4.44.2
  • PyTorch: 2.5.0+cu121
  • Accelerate: 0.34.2
  • Datasets: 3.0.2
  • Tokenizers: 0.19.1
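
To reproduce this environment, the versions above can be pinned at install time (a sketch; choose the PyTorch 2.5.0 build matching your CUDA setup separately):

pip install sentence-transformers==3.2.1 transformers==4.44.2 accelerate==0.34.2 datasets==3.0.2 tokenizers==0.19.1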

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

GISTEmbedLoss

@misc{solatorio2024gistembed,
    title={GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning},
    author={Aivin V. Solatorio},
    year={2024},
    eprint={2402.16829},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}