metadata
base_model: sentence-transformers/paraphrase-MiniLM-L6-v2
datasets: []
language: []
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:75253
  - loss:CoSENTLoss
widget:
  - source_sentence: buenos aires general pueyrredon mar del plata calle 395
    sentences:
      - buenos aires lujan de cuyo mar del plata calle 395
      - buenos aires general pueyrredon mar del plata calle 499
      - buenos aires general pueyrredon calle 15
  - source_sentence: buenos aires bahia blanca chacabuco
    sentences:
      - jujuy ciudad autonoma buenos aires av eva peron
      - buenos aires caada de gomez cadetes
      - buenos aires bahia blanca migueletes
  - source_sentence: buenos aires bahia blanca curumalal
    sentences:
      - buenos aires punilla mar del plata corbeta uruguay
      - capital federal ciudad autonoma buenos aires av rey del bosque
      - buenos aires rio chico curumalal
  - source_sentence: buenos aires lomas de zamora sixto fernandez
    sentences:
      - buenos aires general pueyrredon santa rosa de calamuchita san lorenzo
      - buenos aires jose ingenieros sixto fernandez
      - buenos aires lomas de zamora florida luis viale
  - source_sentence: buenos aires moreno francisco alvarez paramaribo
    sentences:
      - mendoza general pueyrredon mar del plata calle 3 b
      - buenos aires moreno francisco alvarez bermejo
      - buenos aires ezeiza av 60

SentenceTransformer based on sentence-transformers/paraphrase-MiniLM-L6-v2

This is a sentence-transformers model fine-tuned from sentence-transformers/paraphrase-MiniLM-L6-v2. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
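
The Pooling module above has only pooling_mode_mean_tokens enabled, so the sentence vector is the average of the token vectors. The following is a minimal pure-Python sketch of that idea, not the library's implementation. The function name mean_pool and the toy 2-dimensional vectors are illustrative assumptions (the real model uses 384 dimensions), as is the use of an attention mask to skip padding tokens.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1; padding (mask 0) is ignored."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    count = max(count, 1)  # avoid division by zero for an all-padding input
    return [value / count for value in total]


# Toy example: two real tokens and one padding token.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)
```

The padding vector does not influence the result because its mask entry is 0.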

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("tomasravel/modelo_finetuneado24")
# Run inference
sentences = [
    'buenos aires moreno francisco alvarez paramaribo',
    'buenos aires moreno francisco alvarez bermejo',
    'mendoza general pueyrredon mar del plata calle 3 b',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
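
To illustrate what the 3 x 3 similarity matrix contains, here is a small pure-Python sketch that computes pairwise cosine similarity. It assumes the model uses cosine similarity for model.similarity, which is the library default as far as I know but is not stated in this card. The helper names and the toy 2-dimensional vectors are hypothetical stand-ins for the 384-dimensional embeddings.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise cosine similarities, one row and one column per input vector."""
    return [[cosine_similarity(a, b) for b in vectors] for a in vectors]


# Toy stand-ins for three sentence embeddings.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_similarities = similarity_matrix(toy_embeddings)
for row in toy_similarities:
    print([round(value, 4) for value in row])
```

Entry [i][j] is the similarity between sentence i and sentence j, so the matrix is symmetric and its diagonal is approximately 1.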

Training Details

Training Dataset

Unnamed Dataset

  • Size: 75,253 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence_0 (type: string): min 4 tokens, mean 13.46 tokens, max 21 tokens
    • sentence_1 (type: string): min 5 tokens, mean 13.0 tokens, max 22 tokens
    • label (type: float): min 0.2, mean 0.69, max 1.0
  • Samples:
    • sentence_0: buenos aires lomas de zamora temperley cangallo
      sentence_1: buenos aires lomas de zamora cangallo
      label: 1.0
    • sentence_0: buenos aires general pueyrredon mar del plata calle 33
      sentence_1: buenos aires maximo paz mar del plata calle 33
      label: 0.6
    • sentence_0: buenos aires general pueyrredon mar del plata cordoba
      sentence_1: buenos aires washington mar del plata cordoba
      label: 0.6
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
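
As a rough illustration of how the scale parameter and the pairwise cosine scores enter the loss, the sketch below follows the CoSENT formulation as I understand it from the cited post: for every two training pairs where the first has a higher label than the second, the loss penalizes the second pair scoring higher than the first. This is a simplified pure-Python sketch under that assumption, not the library's implementation; the function name cosent_loss and the toy scores are hypothetical.

```python
import math
from typing import Sequence


def cosent_loss(scores: Sequence[float], labels: Sequence[float], scale: float = 20.0) -> float:
    """log(1 + sum of exp(scale * (score_j - score_i)) over all i, j with label_i > label_j)."""
    # The leading 0.0 contributes exp(0) = 1, i.e. the "1 +" inside the log.
    exponents = [0.0]
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                exponents.append(scale * (scores[j] - scores[i]))
    # Numerically stable log-sum-exp.
    peak = max(exponents)
    return peak + math.log(sum(math.exp(e - peak) for e in exponents))


# Each entry is the cosine similarity predicted for one (sentence_0, sentence_1) pair.
labels = [1.0, 0.6]
well_ordered = cosent_loss([0.9, 0.5], labels)   # higher label has the higher score
badly_ordered = cosent_loss([0.5, 0.9], labels)  # ranking is inverted
print(well_ordered, badly_ordered)
```

Only the ordering of the scores relative to the labels matters here, which is why labels such as 0.6 and 1.0 act as ranks rather than regression targets.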
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • num_train_epochs: 10
  • multi_dataset_batch_sampler: round_robin
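
The batch size and epoch count above, together with the 75,253 training samples, determine how many optimizer steps the run takes. The short calculation below is a sketch that assumes a single training device, gradient_accumulation_steps of 1, and that the last partial batch is kept (dataloader_drop_last: False); under those assumptions it reproduces the fractional epoch reported at step 500 in the training logs.

```python
import math

train_samples = 75_253
batch_size = 32          # per_device_train_batch_size
num_epochs = 10          # num_train_epochs

# Assumes one device and no gradient accumulation; the last partial batch counts as a step.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * num_epochs
epoch_at_step_500 = round(500 / steps_per_epoch, 4)

print(steps_per_epoch, total_steps, epoch_at_step_500)
```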

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
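
With learning_rate 5e-05, lr_scheduler_type linear, and warmup_steps 0, the learning rate is expected to decay in a straight line from its initial value to zero over the run. The sketch below shows that shape in plain Python. The function name linear_lr is hypothetical, and the default of 23,520 total steps is an assumption derived from 75,253 samples, batch size 32, and 10 epochs on a single device; it is not a value stated in the list above.

```python
def linear_lr(step: int, base_lr: float = 5e-5, total_steps: int = 23_520, warmup_steps: int = 0) -> float:
    """Linear warmup (if any) followed by linear decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start, midpoint, and end of the assumed schedule.
for step in (0, 11_760, 23_520):
    print(step, linear_lr(step))
```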

Training Logs

Epoch Step Training Loss
0.2126 500 6.2141
0.4252 1000 5.3697
0.6378 1500 5.2046
0.8503 2000 5.1007
1.0629 2500 4.9564
1.2755 3000 4.8524
1.4881 3500 4.7941
1.7007 4000 4.7099
1.9133 4500 4.6723
2.1259 5000 4.5816
2.3384 5500 4.5275
2.5510 6000 4.527
2.7636 6500 4.4588
2.9762 7000 4.4253
3.1888 7500 4.3234
3.4014 8000 4.3147
3.6139 8500 4.2644
3.8265 9000 4.256
4.0391 9500 4.1724
4.2517 10000 4.1406
4.4643 10500 4.0917
4.6769 11000 4.1334
4.8895 11500 4.0791
5.1020 12000 4.0217
5.3146 12500 3.9745
5.5272 13000 3.9575
5.7398 13500 3.942
5.9524 14000 3.9029
6.1650 14500 3.8617
6.3776 15000 3.8648
6.5901 15500 3.7995
6.8027 16000 3.83
7.0153 16500 3.734
7.2279 17000 3.7528
7.4405 17500 3.634
7.6531 18000 3.7306
7.8656 18500 3.7076
8.0782 19000 3.6494
8.2908 19500 3.664
8.5034 20000 3.5254
8.7160 20500 3.5624
8.9286 21000 3.5812
9.1412 21500 3.566
9.3537 22000 3.3967
9.5663 22500 3.474
9.7789 23000 3.5136
9.9915 23500 3.4518

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.44.2
  • PyTorch: 2.2.2+cu121
  • Accelerate: 0.34.2
  • Datasets: 2.21.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}