base_model: microsoft/deberta-v3-small
library_name: sentence-transformers
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
  - cosine_accuracy
  - cosine_accuracy_threshold
  - cosine_f1
  - cosine_f1_threshold
  - cosine_precision
  - cosine_recall
  - cosine_ap
  - dot_accuracy
  - dot_accuracy_threshold
  - dot_f1
  - dot_f1_threshold
  - dot_precision
  - dot_recall
  - dot_ap
  - manhattan_accuracy
  - manhattan_accuracy_threshold
  - manhattan_f1
  - manhattan_f1_threshold
  - manhattan_precision
  - manhattan_recall
  - manhattan_ap
  - euclidean_accuracy
  - euclidean_accuracy_threshold
  - euclidean_f1
  - euclidean_f1_threshold
  - euclidean_precision
  - euclidean_recall
  - euclidean_ap
  - max_accuracy
  - max_accuracy_threshold
  - max_f1
  - max_f1_threshold
  - max_precision
  - max_recall
  - max_ap
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:32500
  - loss:GISTEmbedLoss
widget:
  - source_sentence: A picture of a white gas range with figurines above.
    sentences:
      - A nerdy woman brushing her teeth with a friend nearby.
      - a white stove turned off with a digital clock
      - >-
        The plasma membrane also contains other molecules, primarily other
        lipids and proteins. The green molecules in Figure above , for example,
        are the lipid cholesterol. Molecules of cholesterol help the plasma
        membrane keep its shape. Many of the proteins in the plasma membrane
        assist other substances in crossing the membrane.
  - source_sentence: who makes the kentucky derby garland of roses
    sentences:
      - >-
        Accrington strengthened their position in the play-off places with a
        hard-fought win over struggling Dagenham.
      - >-
        tidal energy can be used to produce electricity. Ocean thermal is energy
        derived from waves and also from tidal waves. 
         Ocean thermal energy can be used to produce electricity.
      - >-
        Kentucky Derby Trophy The Kroger Company has been the official florist
        of the Kentucky Derby since 1987. After taking over the duties from the
        Kingsley Walker florist, Kroger began constructing the prestigious
        garland in one of its local stores for the public to view on Derby Eve.
        The preservation of the garland and crowds of spectators watching its
        construction are a testament to the prestige and mystique of the Garland
        of Roses.
  - source_sentence: what is the difference between a general sense and a special sense?
    sentences:
      - >-
        Ian Curtis ( of Touching from a distance) Ian Kevin Curtis was an
        English musician and singer-songwriter. He is best known as the lead
        singer and lyricist of the post-punk band Joy Division. Joy Division
        released its debut album, Unknown Pleasures, in 1979 and recorded its
        follow-up, Closer, in 1980. Curtis, who suffered from epilepsy and
        depression, committed suicide on 18 May 1980, on the eve of Joy
        Division's first North American tour, resulting in the band's
        dissolution and the subsequent formation of New Order. Curtis was known
        for his baritone voice, dance style, and songwriting filled with imagery
        of desolation, emptiness and alienation. In 1995, Curtis's widow Deborah
        published Touching from a Distance: Ian Curtis and Joy Division, a
        biography of the singer. His life and death Ian Kevin Curtis was an
        English musician and singer-songwriter. He is best known as the lead
        singer and lyricist of the post-punk band Joy Division. Joy Division
        released its debut album, Unknown Pleasures, in 1979 and recorded its
        follow-up, Closer, in 1980. Curtis, who suffered from epilepsy and
        depression, committed suicide on 18 May 1980, on the eve of Joy
        Division's first North American tour, resulting in the band's
        dissolution and the subsequent formation of New Order. Curtis was known
        for his baritone voice, dance style, and songwriting filled with imagery
        of desolation, emptiness and alienation. In 1995, Curtis's widow Deborah
        published Touching from a Distance: Ian Curtis and Joy Division, a
        biography of the singer. His life and death have been dramatised in the
        films 24 Hour Party People (2002) and Control (2007). ...more
      - >-
        The human body has two basic types of senses, called special senses and
        general senses. Special senses have specialized sense organs that gather
        sensory information and change it into nerve impulses. ... General
        senses, in contrast, are all associated with the sense of touch. They
        lack special sense organs.
      - >-
        Captain Hook Barrie states in the novel that "Hook was not his true
        name. To reveal who he really was would even at this date set the
        country in a blaze", and relates that Peter Pan began their rivalry by
        feeding the pirate's hand to the crocodile. He is said to be
        "Blackbeard's bo'sun" and "the only man of whom Barbecue was afraid".[5]
        (In Robert Louis Stevenson's Treasure Island, one of the names Long John
        Silver goes by is Barbecue.)[6]
  - source_sentence: >-
      Retzius was born in Stockholm , son of the anatomist Anders Jahan Retzius
      ( and grandson of the naturalist and chemist Anders Retzius ) .
    sentences:
      - >-
        Retzius was born in Stockholm , the son of anatomist Anders Jahan
        Retzius ( and grandson of the naturalist and chemist Anders Retzius ) .
      - >-
        As of 14 March , over 156,000 cases of COVID-19 have been reported in
        around 140 countries and territories ; more than 5,800 people have died
        from the disease and around 75,000 have recovered .
      - A person sitting on a stool on the street.
  - source_sentence: who was the first person who made the violin
    sentences:
      - >-
        Alice in Chains Alice in Chains is an American rock band from Seattle,
        Washington, formed in 1987 by guitarist and vocalist Jerry Cantrell and
        drummer Sean Kinney,[1] who recruited bassist Mike Starr[1] and lead
        vocalist Layne Staley.[1][2][3] Starr was replaced by Mike Inez in
        1993.[4] After Staley's death in 2002, William DuVall joined in 2006 as
        co-lead vocalist and rhythm guitarist. The band took its name from
        Staley's previous group, the glam metal band Alice N' Chains.[5][2]
      - as distance from an object decreases , that object will appear larger
      - >-
        Violin The first makers of violins probably borrowed from various
        developments of the Byzantine lira. These included the rebec;[13] the
        Arabic rebab; the vielle (also known as the fidel or viuola); and the
        lira da braccio[11][14] The violin in its present form emerged in early
        16th-century northern Italy. The earliest pictures of violins, albeit
        with three strings, are seen in northern Italy around 1530, at around
        the same time as the words "violino" and "vyollon" are seen in Italian
        and French documents. One of the earliest explicit descriptions of the
        instrument, including its tuning, is from the Epitome musical by Jambe
        de Fer, published in Lyon in 1556.[15] By this time, the violin had
        already begun to spread throughout Europe.
model-index:
  - name: SentenceTransformer based on microsoft/deberta-v3-small
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test
          type: sts-test
        metrics:
          - type: pearson_cosine
            value: 0.1561600438268545
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.22356441354815124
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.2216924674035587
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.24997065610359018
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.1908690981304929
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.22363767136304896
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.15588248423807516
            name: Pearson Dot
          - type: spearman_dot
            value: 0.22337189362164545
            name: Spearman Dot
          - type: pearson_max
            value: 0.2216924674035587
            name: Pearson Max
          - type: spearman_max
            value: 0.24997065610359018
            name: Spearman Max
      - task:
          type: binary-classification
          name: Binary Classification
        dataset:
          name: allNLI dev
          type: allNLI-dev
        metrics:
          - type: cosine_accuracy
            value: 0.666015625
            name: Cosine Accuracy
          - type: cosine_accuracy_threshold
            value: 0.9797871112823486
            name: Cosine Accuracy Threshold
          - type: cosine_f1
            value: 0.504258943781942
            name: Cosine F1
          - type: cosine_f1_threshold
            value: 0.8929213285446167
            name: Cosine F1 Threshold
          - type: cosine_precision
            value: 0.357487922705314
            name: Cosine Precision
          - type: cosine_recall
            value: 0.8554913294797688
            name: Cosine Recall
          - type: cosine_ap
            value: 0.4008449937025217
            name: Cosine Ap
          - type: dot_accuracy
            value: 0.666015625
            name: Dot Accuracy
          - type: dot_accuracy_threshold
            value: 752.6634521484375
            name: Dot Accuracy Threshold
          - type: dot_f1
            value: 0.504258943781942
            name: Dot F1
          - type: dot_f1_threshold
            value: 685.9220581054688
            name: Dot F1 Threshold
          - type: dot_precision
            value: 0.357487922705314
            name: Dot Precision
          - type: dot_recall
            value: 0.8554913294797688
            name: Dot Recall
          - type: dot_ap
            value: 0.40071344979441287
            name: Dot Ap
          - type: manhattan_accuracy
            value: 0.66796875
            name: Manhattan Accuracy
          - type: manhattan_accuracy_threshold
            value: 144.52613830566406
            name: Manhattan Accuracy Threshold
          - type: manhattan_f1
            value: 0.5075987841945289
            name: Manhattan F1
          - type: manhattan_f1_threshold
            value: 267.046875
            name: Manhattan F1 Threshold
          - type: manhattan_precision
            value: 0.3443298969072165
            name: Manhattan Precision
          - type: manhattan_recall
            value: 0.9653179190751445
            name: Manhattan Recall
          - type: manhattan_ap
            value: 0.4008700157620745
            name: Manhattan Ap
          - type: euclidean_accuracy
            value: 0.666015625
            name: Euclidean Accuracy
          - type: euclidean_accuracy_threshold
            value: 5.572628974914551
            name: Euclidean Accuracy Threshold
          - type: euclidean_f1
            value: 0.504258943781942
            name: Euclidean F1
          - type: euclidean_f1_threshold
            value: 12.826179504394531
            name: Euclidean F1 Threshold
          - type: euclidean_precision
            value: 0.357487922705314
            name: Euclidean Precision
          - type: euclidean_recall
            value: 0.8554913294797688
            name: Euclidean Recall
          - type: euclidean_ap
            value: 0.40083962142052487
            name: Euclidean Ap
          - type: max_accuracy
            value: 0.66796875
            name: Max Accuracy
          - type: max_accuracy_threshold
            value: 752.6634521484375
            name: Max Accuracy Threshold
          - type: max_f1
            value: 0.5075987841945289
            name: Max F1
          - type: max_f1_threshold
            value: 685.9220581054688
            name: Max F1 Threshold
          - type: max_precision
            value: 0.357487922705314
            name: Max Precision
          - type: max_recall
            value: 0.9653179190751445
            name: Max Recall
          - type: max_ap
            value: 0.4008700157620745
            name: Max Ap
      - task:
          type: binary-classification
          name: Binary Classification
        dataset:
          name: Qnli dev
          type: Qnli-dev
        metrics:
          - type: cosine_accuracy
            value: 0.591796875
            name: Cosine Accuracy
          - type: cosine_accuracy_threshold
            value: 0.9479926824569702
            name: Cosine Accuracy Threshold
          - type: cosine_f1
            value: 0.6291834002677376
            name: Cosine F1
          - type: cosine_f1_threshold
            value: 0.7761930823326111
            name: Cosine F1 Threshold
          - type: cosine_precision
            value: 0.4598825831702544
            name: Cosine Precision
          - type: cosine_recall
            value: 0.9957627118644068
            name: Cosine Recall
          - type: cosine_ap
            value: 0.5658036772817674
            name: Cosine Ap
          - type: dot_accuracy
            value: 0.59375
            name: Dot Accuracy
          - type: dot_accuracy_threshold
            value: 724.091064453125
            name: Dot Accuracy Threshold
          - type: dot_f1
            value: 0.6291834002677376
            name: Dot F1
          - type: dot_f1_threshold
            value: 596.2498779296875
            name: Dot F1 Threshold
          - type: dot_precision
            value: 0.4598825831702544
            name: Dot Precision
          - type: dot_recall
            value: 0.9957627118644068
            name: Dot Recall
          - type: dot_ap
            value: 0.5657459555147606
            name: Dot Ap
          - type: manhattan_accuracy
            value: 0.6171875
            name: Manhattan Accuracy
          - type: manhattan_accuracy_threshold
            value: 202.07958984375
            name: Manhattan Accuracy Threshold
          - type: manhattan_f1
            value: 0.6291834002677376
            name: Manhattan F1
          - type: manhattan_f1_threshold
            value: 307.9236145019531
            name: Manhattan F1 Threshold
          - type: manhattan_precision
            value: 0.4598825831702544
            name: Manhattan Precision
          - type: manhattan_recall
            value: 0.9957627118644068
            name: Manhattan Recall
          - type: manhattan_ap
            value: 0.5891966424964378
            name: Manhattan Ap
          - type: euclidean_accuracy
            value: 0.591796875
            name: Euclidean Accuracy
          - type: euclidean_accuracy_threshold
            value: 8.938886642456055
            name: Euclidean Accuracy Threshold
          - type: euclidean_f1
            value: 0.6291834002677376
            name: Euclidean F1
          - type: euclidean_f1_threshold
            value: 18.542938232421875
            name: Euclidean F1 Threshold
          - type: euclidean_precision
            value: 0.4598825831702544
            name: Euclidean Precision
          - type: euclidean_recall
            value: 0.9957627118644068
            name: Euclidean Recall
          - type: euclidean_ap
            value: 0.5658036772817674
            name: Euclidean Ap
          - type: max_accuracy
            value: 0.6171875
            name: Max Accuracy
          - type: max_accuracy_threshold
            value: 724.091064453125
            name: Max Accuracy Threshold
          - type: max_f1
            value: 0.6291834002677376
            name: Max F1
          - type: max_f1_threshold
            value: 596.2498779296875
            name: Max F1 Threshold
          - type: max_precision
            value: 0.4598825831702544
            name: Max Precision
          - type: max_recall
            value: 0.9957627118644068
            name: Max Recall
          - type: max_ap
            value: 0.5891966424964378
            name: Max Ap

SentenceTransformer based on microsoft/deberta-v3-small

This is a sentence-transformers model fine-tuned from microsoft/deberta-v3-small. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: microsoft/deberta-v3-small
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model 
  (1): AdvancedWeightedPooling(
    (linear_cls_pj): Linear(in_features=768, out_features=768, bias=True)
    (linear_cls_Qpj): Linear(in_features=768, out_features=768, bias=True)
    (linear_mean_pj): Linear(in_features=768, out_features=768, bias=True)
    (linear_attnOut): Linear(in_features=768, out_features=768, bias=True)
    (mha): MultiheadAttention(
      (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
    )
    (layernorm_output): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (layernorm_weightedPooing): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (layernorm_pjCls): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (layernorm_pjMean): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (layernorm_attnOut): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
)
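
The printed architecture lists only layer shapes, so a rough parameter count for the pooling head can be worked out by hand. The sketch below does that arithmetic. It assumes that the MultiheadAttention module uses PyTorch's default packed input projection (a 768 → 3×768 weight plus bias), which the printout does not show; the helper names are made up for illustration.

```python
def linear_params(in_features, out_features, bias=True):
    """Weights plus optional bias of a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)

def layernorm_params(dim):
    """Scale and shift vectors of a LayerNorm with elementwise_affine=True."""
    return 2 * dim

hidden = 768

# linear_cls_pj, linear_cls_Qpj, linear_mean_pj, linear_attnOut
projections = 4 * linear_params(hidden, hidden)

# ASSUMPTION: packed q/k/v input projection (not visible in the printout)
# plus the out_proj layer that is printed.
mha = linear_params(hidden, 3 * hidden) + linear_params(hidden, hidden)

# Five LayerNorm((768,)) modules are listed.
layernorms = 5 * layernorm_params(hidden)

pooling_total = projections + mha + layernorms
print(f"pooling head parameters (under the stated assumption): {pooling_total:,}")
```

Under that assumption the pooling head adds roughly 4.7 million parameters on top of the DeBERTa encoder.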

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("bobox/DeBERTa3-s-CustomPoolin-toytest-step1-checkpoints-tmp")
# Run inference
sentences = [
    'who was the first person who made the violin',
    'Violin The first makers of violins probably borrowed from various developments of the Byzantine lira. These included the rebec;[13] the Arabic rebab; the vielle (also known as the fidel or viuola); and the lira da braccio[11][14] The violin in its present form emerged in early 16th-century northern Italy. The earliest pictures of violins, albeit with three strings, are seen in northern Italy around 1530, at around the same time as the words "violino" and "vyollon" are seen in Italian and French documents. One of the earliest explicit descriptions of the instrument, including its tuning, is from the Epitome musical by Jambe de Fer, published in Lyon in 1556.[15] By this time, the violin had already begun to spread throughout Europe.',
    "Alice in Chains Alice in Chains is an American rock band from Seattle, Washington, formed in 1987 by guitarist and vocalist Jerry Cantrell and drummer Sean Kinney,[1] who recruited bassist Mike Starr[1] and lead vocalist Layne Staley.[1][2][3] Starr was replaced by Mike Inez in 1993.[4] After Staley's death in 2002, William DuVall joined in 2006 as co-lead vocalist and rhythm guitarist. The band took its name from Staley's previous group, the glam metal band Alice N' Chains.[5][2]",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
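
As a rough illustration of what the similarity step computes, the sketch below builds a cosine-similarity matrix in plain Python. The toy 3-dimensional vectors and the helper names `cosine` and `similarity_matrix` are made up for illustration; the real embeddings are 768-dimensional and `model.similarity` works on tensors.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_matrix(vectors):
    """Pairwise cosine similarities; entry [i][j] compares vector i with vector j."""
    return [[cosine(u, v) for v in vectors] for u in vectors]

# Hypothetical stand-ins for the rows of `embeddings`
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 2.0],
]

sims = similarity_matrix(toy_embeddings)
for row in sims:
    print([round(x, 4) for x in row])
```

The diagonal is always 1 because every vector is compared with itself, and the matrix is symmetric.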

Evaluation

Metrics

Semantic Similarity

  • Dataset: sts-test

Metric Value
pearson_cosine 0.1562
spearman_cosine 0.2236
pearson_manhattan 0.2217
spearman_manhattan 0.25
pearson_euclidean 0.1909
spearman_euclidean 0.2236
pearson_dot 0.1559
spearman_dot 0.2234
pearson_max 0.2217
spearman_max 0.25
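
The table reports both Pearson and Spearman correlations. The sketch below shows how the two differ, using made-up gold scores and predicted similarities (not values from this model): Pearson measures linear agreement, while Spearman is Pearson applied to ranks and so only measures whether the ordering agrees.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: same ordering, but a non-linear relationship
gold = [0.0, 1.0, 2.0, 3.0, 4.0]
pred = [0.90, 0.91, 0.92, 0.93, 0.99]

print("pearson :", round(pearson(gold, pred), 4))
print("spearman:", round(spearman(gold, pred), 4))
```

In this toy case the ordering is perfect, so Spearman is 1 while Pearson is lower.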

Binary Classification

  • Dataset: allNLI-dev

Metric Value
cosine_accuracy 0.666
cosine_accuracy_threshold 0.9798
cosine_f1 0.5043
cosine_f1_threshold 0.8929
cosine_precision 0.3575
cosine_recall 0.8555
cosine_ap 0.4008
dot_accuracy 0.666
dot_accuracy_threshold 752.6635
dot_f1 0.5043
dot_f1_threshold 685.9221
dot_precision 0.3575
dot_recall 0.8555
dot_ap 0.4007
manhattan_accuracy 0.668
manhattan_accuracy_threshold 144.5261
manhattan_f1 0.5076
manhattan_f1_threshold 267.0469
manhattan_precision 0.3443
manhattan_recall 0.9653
manhattan_ap 0.4009
euclidean_accuracy 0.666
euclidean_accuracy_threshold 5.5726
euclidean_f1 0.5043
euclidean_f1_threshold 12.8262
euclidean_precision 0.3575
euclidean_recall 0.8555
euclidean_ap 0.4008
max_accuracy 0.668
max_accuracy_threshold 752.6635
max_f1 0.5076
max_f1_threshold 685.9221
max_precision 0.3575
max_recall 0.9653
max_ap 0.4009
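
Each `*_accuracy_threshold` above is the decision threshold that gave the best accuracy on the evaluation pairs. The sketch below shows one simple way such a threshold could be searched for. It is a simplification that tries each observed score as a candidate and may not match the evaluator's exact procedure; the scores and labels are invented.

```python
def accuracy_at(scores, labels, threshold):
    """Accuracy when a pair is predicted positive iff its score >= threshold."""
    correct = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(labels)

def best_accuracy_threshold(scores, labels):
    """Try every observed score as a threshold and keep the most accurate one."""
    candidates = sorted(set(scores))
    return max((accuracy_at(scores, labels, t), t) for t in candidates)

# Hypothetical cosine similarities and binary labels (1 = positive pair)
toy_scores = [0.99, 0.97, 0.95, 0.90, 0.85, 0.80]
toy_labels = [1, 1, 0, 1, 0, 0]

best_acc, best_t = best_accuracy_threshold(toy_scores, toy_labels)
print(f"best accuracy {best_acc:.4f} at threshold {best_t}")
```

The same sweep applies to dot-product scores. For Manhattan and Euclidean distances the comparison would be reversed, since smaller distances indicate more similar pairs.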

Binary Classification

  • Dataset: Qnli-dev

Metric Value
cosine_accuracy 0.5918
cosine_accuracy_threshold 0.948
cosine_f1 0.6292
cosine_f1_threshold 0.7762
cosine_precision 0.4599
cosine_recall 0.9958
cosine_ap 0.5658
dot_accuracy 0.5938
dot_accuracy_threshold 724.0911
dot_f1 0.6292
dot_f1_threshold 596.2499
dot_precision 0.4599
dot_recall 0.9958
dot_ap 0.5657
manhattan_accuracy 0.6172
manhattan_accuracy_threshold 202.0796
manhattan_f1 0.6292
manhattan_f1_threshold 307.9236
manhattan_precision 0.4599
manhattan_recall 0.9958
manhattan_ap 0.5892
euclidean_accuracy 0.5918
euclidean_accuracy_threshold 8.9389
euclidean_f1 0.6292
euclidean_f1_threshold 18.5429
euclidean_precision 0.4599
euclidean_recall 0.9958
euclidean_ap 0.5658
max_accuracy 0.6172
max_accuracy_threshold 724.0911
max_f1 0.6292
max_f1_threshold 596.2499
max_precision 0.4599
max_recall 0.9958
max_ap 0.5892
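
The F1 values in the two binary classification tables can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The short check below uses the rounded values from the tables, so agreement is only expected to about four decimal places.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1), copied from the tables above
reported = {
    "allNLI-dev cosine": (0.3575, 0.8555, 0.5043),
    "allNLI-dev manhattan": (0.3443, 0.9653, 0.5076),
    "Qnli-dev cosine": (0.4599, 0.9958, 0.6292),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: computed {f1_score(p, r):.4f}, reported {f1}")
```

Note that the `max_*` rows take the maximum of each metric independently across similarity functions, so `max_precision` and `max_recall` need not come from the same threshold.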

Training Details

Training Dataset

Unnamed Dataset

  • Size: 32,500 training samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (type: string): min 4 tokens, mean 29.3 tokens, max 343 tokens
    • sentence2 (type: string): min 2 tokens, mean 57.53 tokens, max 512 tokens
  • Samples:
    sentence1: A Slippery Dick is what type of creature?
    sentence2: The Slippery Dick (Juvenile) - Whats That Fish! Description Also known as Sand-reef Wrasses and Slippery Dick Wrasse. Found singly or in pairs or in groups constantly circling around reefs, sea grass beds and sandy areas. Colours highly variable especially between juvenile to adult. They feed on hard shell invertebrates. Length - 18cm Depth - 2-12m Widespread Western Atlantic & Caribbean Most reef fish seen by divers during the day are grazers, that cruise around just above the surface of the coral or snoop into crevices looking for algae, worms and small crustaceans. Wrasses have small protruding teeth and graze the bottom taking in a variety of snails, worms, crabs, shrimps and eggs. Any hard coats or thick shells are then ground down by their pharyngeal jaws and the delicacies inside digested. From juvenile to adult wrasses dramatically alter their colour and body shapes. Wrasses are always on the go during the day, but are the first to go to bed and the last to rise. Small wrasses dive below the sand to sleep and larger wrasses wedge themselves in crevasses. Related creatures Heads up! Many creatures change during their life. Juvenile fish become adults and some change shape or their colour. Some species change sex and others just get older. The following creature(s) are known relatives of the Slippery Dick (Juvenile). Click the image(s) to explore further or hover over to get a better view! Slippery Dick

    sentence1: e. in solids the atoms are closely locked in position and can only vibrate, in liquids the atoms and molecules are more loosely connected and can collide with and move past one another, while in gases the atoms or molecules are free to move independently, colliding frequently.
    sentence2: Within a substance, atoms that collide frequently and move independently of one another are most likely in a gas

    sentence1: In December 2015 , the film was ranked # 192 on IMDb .
    sentence2: As of December 2015 , it is the # 192 highest rated film on IMDb.
  • Loss: GISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.025}
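
To give a feel for what a guided in-batch contrastive loss with a temperature of 0.025 does, here is a deliberately simplified sketch in plain Python. It is not the library's implementation: it only uses anchor-to-positive similarities, takes precomputed similarity matrices as input, and the function name and toy numbers are invented. The idea it illustrates is that the guide model's similarities decide which in-batch negatives are discarded as likely false negatives before a softmax cross-entropy is taken over the remaining candidates.

```python
import math

def gist_style_loss(model_sim, guide_sim, temperature=0.025):
    """
    model_sim[i][j]: similarity of anchor i and positive j from the model being trained.
    guide_sim[i][j]: the same similarity according to the guide model.
    An in-batch negative j is dropped for anchor i when the guide rates it
    higher than the true pair (i, i).
    """
    n = len(model_sim)
    total = 0.0
    for i in range(n):
        guide_positive = guide_sim[i][i]
        kept = []
        for j in range(n):
            if j != i and guide_sim[i][j] > guide_positive:
                continue  # likely false negative according to the guide
            kept.append((j == i, model_sim[i][j] / temperature))
        # Numerically stable log-sum-exp over the kept logits
        top = max(value for _, value in kept)
        log_denominator = top + math.log(sum(math.exp(value - top) for _, value in kept))
        positive_logit = next(value for is_positive, value in kept if is_positive)
        total += log_denominator - positive_logit
    return total / n

# Hypothetical 2x2 similarity matrices
model_sim = [[0.9, 0.8], [0.2, 0.7]]
guide_without_masking = [[0.9, 0.1], [0.1, 0.9]]
guide_with_masking = [[0.8, 0.95], [0.1, 0.9]]  # masks pair (0, 1)

print(gist_style_loss(model_sim, guide_without_masking))
print(gist_style_loss(model_sim, guide_with_masking))
```

A low temperature such as 0.025 multiplies similarity differences by 40 before the softmax, so even small gaps between the positive and a hard negative produce large changes in the loss.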
    

Evaluation Dataset

Unnamed Dataset

  • Size: 1,664 evaluation samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (type: string): min 4 tokens, mean 28.74 tokens, max 330 tokens
    • sentence2 (type: string): min 2 tokens, mean 56.55 tokens, max 512 tokens
  • Samples:
    sentence1: What component of an organism, made up of many cells, in turn makes up an organ?

    sentence1: Diffusion Diffusion is a process where atoms or molecules move from areas of high concentration to areas of low concentration.
    sentence2: Diffusion is the process in which a substance naturally moves from an area of higher to lower concentration.

    sentence1: In the 1966 movie The Good, The Bad And The Ugly, Clint Eastwood played the Good" and Lee van Cleef played "the Bad", but who played "the Ugly"?
    sentence2: View All Photos (10) Movie Info In the last and the best installment of his so-called "Dollars" trilogy of Sergio Leone-directed "spaghetti westerns," Clint Eastwood reprised the role of a taciturn, enigmatic loner. Here he searches for a cache of stolen gold against rivals the Bad (Lee Van Cleef), a ruthless bounty hunter, and the Ugly (Eli Wallach), a Mexican bandit. Though dubbed "the Good," Eastwood's character is not much better than his opponents -- he is just smarter and shoots faster. The film's title reveals its ironic attitude toward the canonized heroes of the classical western. "The real West was the world of violence, fear, and brutal instincts," claimed Leone. "In pursuit of profit there is no such thing as good and evil, generosity or deviousness; everything depends on chance, and not the best wins but the luckiest." Immensely entertaining and beautifully shot in Techniscope by Tonino Delli Colli, the movie is a virtually definitive "spaghetti western," rivaled only by Leone's own Once Upon a Time in the West (1968). The main musical theme by Ennio Morricone hit #1 on the British pop charts. Originally released in Italy at 177 minutes, the movie was later cut for its international release. ~ Yuri German, Rovi Rating:
  • Loss: GISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.025}
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 256
  • lr_scheduler_type: cosine_with_min_lr
  • lr_scheduler_kwargs: {'num_cycles': 0.5, 'min_lr': 3.3333333333333337e-06}
  • warmup_ratio: 0.33
  • save_safetensors: False
  • fp16: True
  • push_to_hub: True
  • hub_model_id: bobox/DeBERTa3-s-CustomPoolin-toytest-step1-checkpoints-tmp
  • hub_strategy: all_checkpoints
  • batch_sampler: no_duplicates
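
The scheduler settings above describe a linear warmup followed by a half-cosine decay that bottoms out at `min_lr` instead of zero. The sketch below reproduces that shape in plain Python as an approximation, not the trainer's own code. The base learning rate of 5e-05 comes from the full hyperparameter list; the total of 3,048 optimizer steps is an assumption (3 epochs of about 1,016 steps each) and is not stated in the card.

```python
import math

def lr_at(step, total_steps, base_lr=5e-05, min_lr=3.3333333333333337e-06,
          warmup_ratio=0.33, num_cycles=0.5):
    """Learning rate at a given optimizer step: linear warmup, then cosine decay to min_lr."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress))
    min_rate = min_lr / base_lr
    # Rescale the cosine factor from [0, 1] to [min_rate, 1]
    return base_lr * (cosine * (1.0 - min_rate) + min_rate)

ASSUMED_TOTAL_STEPS = 3048  # assumption, see the note above

for step in (0, 81, 305, 1006, 2000, ASSUMED_TOTAL_STEPS):
    print(step, f"{lr_at(step, ASSUMED_TOTAL_STEPS):.3e}")
```

With a warmup ratio of 0.33, roughly the first third of training runs at a learning rate below the 5e-05 peak, which includes every step shown in the training logs.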

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 256
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: cosine_with_min_lr
  • lr_scheduler_kwargs: {'num_cycles': 0.5, 'min_lr': 3.3333333333333337e-06}
  • warmup_ratio: 0.33
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: bobox/DeBERTa3-s-CustomPoolin-toytest-step1-checkpoints-tmp
  • hub_strategy: all_checkpoints
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss sts-test_spearman_cosine allNLI-dev_max_ap Qnli-dev_max_ap
0.0010 1 4.9603 - - - -
0.0020 2 28.2529 - - - -
0.0030 3 27.6365 - - - -
0.0039 4 6.1387 - - - -
0.0049 5 5.5753 - - - -
0.0059 6 5.6951 - - - -
0.0069 7 6.3533 - - - -
0.0079 8 27.3848 - - - -
0.0089 9 3.8501 - - - -
0.0098 10 27.911 - - - -
0.0108 11 4.9042 - - - -
0.0118 12 6.8003 - - - -
0.0128 13 5.7317 - - - -
0.0138 14 20.261 - - - -
0.0148 15 27.9051 - - - -
0.0157 16 5.5959 - - - -
0.0167 17 5.8052 - - - -
0.0177 18 4.5088 - - - -
0.0187 19 7.3472 - - - -
0.0197 20 5.8668 - - - -
0.0207 21 6.4083 - - - -
0.0217 22 6.011 - - - -
0.0226 23 5.2394 - - - -
0.0236 24 4.2966 - - - -
0.0246 25 26.605 - - - -
0.0256 26 6.2067 - - - -
0.0266 27 6.0346 - - - -
0.0276 28 5.4676 - - - -
0.0285 29 6.4292 - - - -
0.0295 30 26.6452 - - - -
0.0305 31 18.8401 - - - -
0.0315 32 7.4531 - - - -
0.0325 33 4.8286 - - - -
0.0335 34 5.0078 - - - -
0.0344 35 5.4115 - - - -
0.0354 36 5.4196 - - - -
0.0364 37 4.5023 - - - -
0.0374 38 5.376 - - - -
0.0384 39 5.2303 - - - -
0.0394 40 5.6694 - - - -
0.0404 41 4.7825 - - - -
0.0413 42 4.6507 - - - -
0.0423 43 24.2072 - - - -
0.0433 44 4.9285 - - - -
0.0443 45 6.326 - - - -
0.0453 46 4.5724 - - - -
0.0463 47 4.754 - - - -
0.0472 48 5.5443 - - - -
0.0482 49 4.5764 - - - -
0.0492 50 5.1434 - - - -
0.0502 51 22.6991 - - - -
0.0512 52 5.4277 - - - -
0.0522 53 5.0178 - - - -
0.0531 54 4.8779 - - - -
0.0541 55 4.2884 - - - -
0.0551 56 16.0994 - - - -
0.0561 57 21.31 - - - -
0.0571 58 4.9721 - - - -
0.0581 59 5.143 - - - -
0.0591 60 3.5933 - - - -
0.0600 61 5.2559 - - - -
0.0610 62 4.0757 - - - -
0.0620 63 3.6612 - - - -
0.0630 64 4.7505 - - - -
0.0640 65 4.1979 - - - -
0.0650 66 3.9982 - - - -
0.0659 67 4.7065 - - - -
0.0669 68 5.3413 - - - -
0.0679 69 3.6964 - - - -
0.0689 70 17.8774 - - - -
0.0699 71 4.8154 - - - -
0.0709 72 4.8356 - - - -
0.0719 73 4.568 - - - -
0.0728 74 4.0898 - - - -
0.0738 75 3.4502 - - - -
0.0748 76 3.7733 - - - -
0.0758 77 4.5204 - - - -
0.0768 78 4.2526 - - - -
0.0778 79 4.4398 - - - -
0.0787 80 4.0988 - - - -
0.0797 81 3.9704 - - - -
0.0807 82 4.3343 - - - -
0.0817 83 4.2587 - - - -
0.0827 84 15.0149 - - - -
0.0837 85 14.6599 - - - -
0.0846 86 4.0623 - - - -
0.0856 87 3.7597 - - - -
0.0866 88 4.3433 - - - -
0.0876 89 4.0287 - - - -
0.0886 90 4.6257 - - - -
0.0896 91 13.4689 - - - -
0.0906 92 4.6583 - - - -
0.0915 93 4.2682 - - - -
0.0925 94 4.468 - - - -
0.0935 95 3.4333 - - - -
0.0945 96 12.7654 - - - -
0.0955 97 3.5577 - - - -
0.0965 98 12.5875 - - - -
0.0974 99 4.2206 - - - -
0.0984 100 3.5981 - - - -
0.0994 101 3.5575 - - - -
0.1004 102 4.0271 - - - -
0.1014 103 4.0803 - - - -
0.1024 104 4.0886 - - - -
0.1033 105 4.176 - - - -
0.1043 106 4.6653 - - - -
0.1053 107 4.3076 - - - -
0.1063 108 8.7282 - - - -
0.1073 109 3.4192 - - - -
0.1083 110 10.6027 - - - -
0.1093 111 4.0959 - - - -
0.1102 112 4.2785 - - - -
0.1112 113 3.9945 - - - -
0.1122 114 10.0652 - - - -
0.1132 115 3.8621 - - - -
0.1142 116 4.3975 - - - -
0.1152 117 9.7899 - - - -
0.1161 118 4.3812 - - - -
0.1171 119 3.8715 - - - -
0.1181 120 3.8327 - - - -
0.1191 121 3.5103 - - - -
0.1201 122 9.3158 - - - -
0.1211 123 3.7201 - - - -
0.1220 124 3.4311 - - - -
0.1230 125 3.7946 - - - -
0.1240 126 4.0456 - - - -
0.125 127 3.482 - - - -
0.1260 128 3.1901 - - - -
0.1270 129 3.414 - - - -
0.1280 130 3.4967 - - - -
0.1289 131 3.6594 - - - -
0.1299 132 8.066 - - - -
0.1309 133 3.7872 - - - -
0.1319 134 4.0023 - - - -
0.1329 135 3.7728 - - - -
0.1339 136 3.1893 - - - -
0.1348 137 3.3635 - - - -
0.1358 138 4.0195 - - - -
0.1368 139 4.1097 - - - -
0.1378 140 3.7903 - - - -
0.1388 141 3.5748 - - - -
0.1398 142 3.8104 - - - -
0.1407 143 8.0411 - - - -
0.1417 144 3.4819 - - - -
0.1427 145 3.452 - - - -
0.1437 146 3.5861 - - - -
0.1447 147 3.4324 - - - -
0.1457 148 3.521 - - - -
0.1467 149 3.8868 - - - -
0.1476 150 8.1191 - - - -
0.1486 151 3.6447 - - - -
0.1496 152 2.9436 - - - -
0.1506 153 8.1535 2.2032 0.2236 0.4009 0.5892
0.1516 154 3.9619 - - - -
0.1526 155 3.1301 - - - -
0.1535 156 3.0478 - - - -
0.1545 157 3.2986 - - - -
0.1555 158 3.2847 - - - -
0.1565 159 3.6599 - - - -
0.1575 160 3.2238 - - - -
0.1585 161 2.8897 - - - -
0.1594 162 3.9443 - - - -
0.1604 163 3.3733 - - - -
0.1614 164 3.7444 - - - -
0.1624 165 3.4813 - - - -
0.1634 166 2.6865 - - - -
0.1644 167 2.7587 - - - -
0.1654 168 3.3628 - - - -
0.1663 169 3.0035 - - - -
0.1673 170 10.1591 - - - -
0.1683 171 3.5366 - - - -
0.1693 172 8.4047 - - - -
0.1703 173 3.8643 - - - -
0.1713 174 3.3529 - - - -
0.1722 175 3.7143 - - - -
0.1732 176 3.3323 - - - -
0.1742 177 3.1206 - - - -
0.1752 178 3.1348 - - - -
0.1762 179 7.6011 - - - -
0.1772 180 3.7025 - - - -
0.1781 181 10.5662 - - - -
0.1791 182 8.966 - - - -
0.1801 183 9.426 - - - -
0.1811 184 3.0025 - - - -
0.1821 185 7.0984 - - - -
0.1831 186 7.3808 - - - -
0.1841 187 2.8657 - - - -
0.1850 188 6.5636 - - - -
0.1860 189 3.4702 - - - -
0.1870 190 5.9302 - - - -
0.1880 191 3.2406 - - - -
0.1890 192 3.4459 - - - -
0.1900 193 5.269 - - - -
0.1909 194 4.8605 - - - -
0.1919 195 2.9891 - - - -
0.1929 196 3.6681 - - - -
0.1939 197 3.1589 - - - -
0.1949 198 3.1835 - - - -
0.1959 199 3.7561 - - - -
0.1969 200 4.0891 - - - -
0.1978 201 3.563 - - - -
0.1988 202 3.7433 - - - -
0.1998 203 3.3813 - - - -
0.2008 204 5.2311 - - - -
0.2018 205 3.3494 - - - -
0.2028 206 3.3533 - - - -
0.2037 207 3.688 - - - -
0.2047 208 3.5342 - - - -
0.2057 209 4.9381 - - - -
0.2067 210 3.1839 - - - -
0.2077 211 3.0465 - - - -
0.2087 212 3.1232 - - - -
0.2096 213 4.6297 - - - -
0.2106 214 2.9834 - - - -
0.2116 215 4.2231 - - - -
0.2126 216 3.1458 - - - -
0.2136 217 3.2525 - - - -
0.2146 218 3.5971 - - - -
0.2156 219 3.5616 - - - -
0.2165 220 3.2378 - - - -
0.2175 221 2.9075 - - - -
0.2185 222 3.0391 - - - -
0.2195 223 3.5573 - - - -
0.2205 224 3.2092 - - - -
0.2215 225 3.2646 - - - -
0.2224 226 3.0886 - - - -
0.2234 227 3.5241 - - - -
0.2244 228 3.0111 - - - -
0.2254 229 3.707 - - - -
0.2264 230 5.3822 - - - -
0.2274 231 3.2646 - - - -
0.2283 232 2.7021 - - - -
0.2293 233 3.5131 - - - -
0.2303 234 3.103 - - - -
0.2313 235 2.9535 - - - -
0.2323 236 2.9631 - - - -
0.2333 237 2.8068 - - - -
0.2343 238 3.4251 - - - -
0.2352 239 2.8495 - - - -
0.2362 240 2.9972 - - - -
0.2372 241 3.3509 - - - -
0.2382 242 2.9234 - - - -
0.2392 243 2.4086 - - - -
0.2402 244 3.1282 - - - -
0.2411 245 2.3352 - - - -
0.2421 246 2.4706 - - - -
0.2431 247 3.5449 - - - -
0.2441 248 2.8963 - - - -
0.2451 249 2.773 - - - -
0.2461 250 2.355 - - - -
0.2470 251 2.656 - - - -
0.2480 252 2.6221 - - - -
0.2490 253 8.6739 - - - -
0.25 254 10.8242 - - - -
0.2510 255 2.3408 - - - -
0.2520 256 2.1221 - - - -
0.2530 257 3.295 - - - -
0.2539 258 2.5896 - - - -
0.2549 259 2.1215 - - - -
0.2559 260 9.4851 - - - -
0.2569 261 2.1982 - - - -
0.2579 262 3.0568 - - - -
0.2589 263 2.6269 - - - -
0.2598 264 2.4792 - - - -
0.2608 265 1.9445 - - - -
0.2618 266 2.4061 - - - -
0.2628 267 8.3116 - - - -
0.2638 268 8.0804 - - - -
0.2648 269 2.1674 - - - -
0.2657 270 7.1975 - - - -
0.2667 271 5.9104 - - - -
0.2677 272 2.498 - - - -
0.2687 273 2.5249 - - - -
0.2697 274 2.7152 - - - -
0.2707 275 2.7904 - - - -
0.2717 276 2.7745 - - - -
0.2726 277 2.9741 - - - -
0.2736 278 1.8215 - - - -
0.2746 279 4.6844 - - - -
0.2756 280 2.8613 - - - -
0.2766 281 2.7147 - - - -
0.2776 282 2.814 - - - -
0.2785 283 2.3569 - - - -
0.2795 284 2.672 - - - -
0.2805 285 3.2052 - - - -
0.2815 286 2.8056 - - - -
0.2825 287 2.6268 - - - -
0.2835 288 2.5641 - - - -
0.2844 289 2.4475 - - - -
0.2854 290 2.7377 - - - -
0.2864 291 2.3831 - - - -
0.2874 292 8.8069 - - - -
0.2884 293 2.186 - - - -
0.2894 294 2.3389 - - - -
0.2904 295 1.9744 - - - -
0.2913 296 2.4491 - - - -
0.2923 297 2.5668 - - - -
0.2933 298 2.1939 - - - -
0.2943 299 2.2832 - - - -
0.2953 300 2.7508 - - - -
0.2963 301 2.5206 - - - -
0.2972 302 2.3522 - - - -
0.2982 303 2.7186 - - - -
0.2992 304 2.1369 - - - -
0.3002 305 9.7972 - - - -
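
The training loss in this log is spiky: most steps sit in the low single digits while isolated steps jump far higher. A minimal sketch for reading such a log is shown below, using the first ten rows copied from the table. Two things here are assumptions rather than facts stated in the card: the value of 1016 steps per epoch is inferred from the Epoch column (step 127 is logged at epoch 0.125), and the "more than twice the sample median" rule for flagging a spike is an arbitrary illustrative threshold.

```python
from statistics import median

# First ten rows of the log above: epoch, step, training loss.
LOG_SAMPLE = """\
0.0010 1 4.9603
0.0020 2 28.2529
0.0030 3 27.6365
0.0039 4 6.1387
0.0049 5 5.5753
0.0059 6 5.6951
0.0069 7 6.3533
0.0079 8 27.3848
0.0089 9 3.8501
0.0098 10 27.911
"""

STEPS_PER_EPOCH = 1016  # assumption: inferred from the log, since 127 / 0.125 = 1016


def parse_log(text):
    """Turn whitespace-separated log rows into (epoch, step, loss) tuples."""
    rows = []
    for line in text.strip().splitlines():
        epoch, step, loss = line.split()[:3]
        rows.append((float(epoch), int(step), float(loss)))
    return rows


def epoch_of(step):
    """Fraction of an epoch completed after `step` optimizer steps."""
    return step / STEPS_PER_EPOCH


def spike_steps(rows, factor=2.0):
    """Return the steps whose loss exceeds `factor` times the sample median."""
    typical = median(loss for _, _, loss in rows)
    return [step for _, step, loss in rows if loss > factor * typical]


rows = parse_log(LOG_SAMPLE)

# The logged Epoch values agree with step / 1016 up to four-decimal rounding.
consistent = all(abs(epoch_of(step) - epoch) < 5.1e-5 for epoch, step, _ in rows)
print("epoch column consistent:", consistent)
print("spike steps:", spike_steps(rows))  # [2, 3, 8, 10]
```

The median is used as the reference level because it is barely affected by the spikes themselves, whereas a mean over the same rows would be pulled upward by them.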

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.2.1
  • Transformers: 4.44.2
  • PyTorch: 2.5.0+cu121
  • Accelerate: 0.34.2
  • Datasets: 3.0.2
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

GISTEmbedLoss

@misc{solatorio2024gistembed,
    title={GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning},
    author={Aivin V. Solatorio},
    year={2024},
    eprint={2402.16829},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}