BGE SITGES CAT

This is a sentence-transformers model finetuned from BAAI/bge-m3. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-m3
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Language: ca (Catalan)
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
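
The architecture above takes the first (CLS) token vector from the transformer output and then normalizes it to unit length. The following is a minimal pure-Python sketch of those two steps on toy data; the function name `cls_pool_and_normalize` and the 4-dimensional vectors are illustrative assumptions, not part of the library.

```python
import math


def cls_pool_and_normalize(token_embeddings):
    """Take the first (CLS) token vector and scale it to unit L2 norm."""
    cls_vector = token_embeddings[0]  # mirrors pooling_mode_cls_token=True
    norm = math.sqrt(sum(x * x for x in cls_vector))
    if norm == 0.0:
        return list(cls_vector)  # nothing to scale, avoid division by zero
    return [x / norm for x in cls_vector]  # mirrors the Normalize() module


# Toy input: 3 tokens with 4-dimensional embeddings (the real model uses 1024).
toy_tokens = [
    [3.0, 0.0, 4.0, 0.0],  # CLS token
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 2.0, 0.0, 1.0],
]
sentence_embedding = cls_pool_and_normalize(toy_tokens)
print(sentence_embedding)
```

Because the output has unit length, the dot product of two such embeddings equals their cosine similarity.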

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("adriansanz/SITGES-BAAI2")
# Run inference
sentences = [
    "Cal revisar la informació i els terminis de la convocatòria específica de cada procés que trobareu a la Seu electrònica de l'Ajuntament de Sitges.",
    "On es pot trobar la informació sobre els terminis de presentació d'al·legacions en un procés de selecció de personal de l'Ajuntament de Sitges?",
    "Quin és el document que es necessita per acreditar l'any de construcció i l'adequació a la legalitat urbanística d'un immoble?",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
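
For semantic search, the similarity scores are typically used to rank candidate passages for a query. The sketch below shows that ranking step on hypothetical unit-length toy vectors that stand in for `model.encode(...)` output; `rank_passages` is an illustrative helper, not a library function.

```python
def dot(u, v):
    """Dot product; equals cosine similarity when both vectors have unit length."""
    return sum(a * b for a, b in zip(u, v))


def rank_passages(query_embedding, passage_embeddings):
    """Return (passage_index, score) pairs sorted by descending similarity."""
    scores = [dot(query_embedding, p) for p in passage_embeddings]
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)


# Hypothetical 2-dimensional unit vectors (real embeddings have 1024 dimensions).
query = [1.0, 0.0]
passages = [[0.0, 1.0], [0.6, 0.8], [1.0, 0.0]]

ranking = rank_passages(query, passages)
for index, score in ranking:
    print(index, round(score, 4))
```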

Evaluation

Metrics

Information Retrieval (dataset: dim_1024)

Metric Value
cosine_accuracy@1 0.1388
cosine_accuracy@3 0.2225
cosine_accuracy@5 0.3086
cosine_accuracy@10 0.5
cosine_precision@1 0.1388
cosine_precision@3 0.0742
cosine_precision@5 0.0617
cosine_precision@10 0.05
cosine_recall@1 0.1388
cosine_recall@3 0.2225
cosine_recall@5 0.3086
cosine_recall@10 0.5
cosine_ndcg@10 0.2825
cosine_mrr@10 0.2178
cosine_map@100 0.243
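
In the table above, cosine_precision@k equals cosine_accuracy@k divided by k (for example 0.2225 / 3 ≈ 0.0742 and 0.5 / 10 = 0.05), and cosine_recall@k equals cosine_accuracy@k. That is the pattern one would expect when every query has exactly one relevant document. Under that assumption, which is inferred from the numbers and not stated in the card, the metrics can be sketched as follows. The function `ir_metrics` and the toy ranks are hypothetical.

```python
import math


def ir_metrics(ranks, k):
    """Compute retrieval metrics at cutoff k.

    ranks: 1-based rank of the single relevant document for each query,
           or None when it was not retrieved at all.
    """
    n = len(ranks)
    hits = [r is not None and r <= k for r in ranks]
    accuracy = sum(hits) / n
    return {
        "accuracy": accuracy,
        "precision": accuracy / k,  # at most one relevant doc among k results
        "recall": accuracy,         # the single relevant doc is found or not
        "mrr": sum(1.0 / r for r, h in zip(ranks, hits) if h) / n,
        # With one relevant doc the ideal DCG is 1, so NDCG = 1 / log2(rank + 1).
        "ndcg": sum(1.0 / math.log2(r + 1) for r, h in zip(ranks, hits) if h) / n,
    }


# Toy example: four queries, relevant document found at ranks 1, 3, never, 2.
metrics = ir_metrics([1, 3, None, 2], k=3)
for name, value in metrics.items():
    print(name, round(value, 4))
```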

Information Retrieval (dataset: dim_768)

Metric Value
cosine_accuracy@1 0.1316
cosine_accuracy@3 0.2225
cosine_accuracy@5 0.3158
cosine_accuracy@10 0.4904
cosine_precision@1 0.1316
cosine_precision@3 0.0742
cosine_precision@5 0.0632
cosine_precision@10 0.049
cosine_recall@1 0.1316
cosine_recall@3 0.2225
cosine_recall@5 0.3158
cosine_recall@10 0.4904
cosine_ndcg@10 0.2759
cosine_mrr@10 0.2117
cosine_map@100 0.2378

Information Retrieval (dataset: dim_512)

Metric Value
cosine_accuracy@1 0.1388
cosine_accuracy@3 0.2177
cosine_accuracy@5 0.3062
cosine_accuracy@10 0.4856
cosine_precision@1 0.1388
cosine_precision@3 0.0726
cosine_precision@5 0.0612
cosine_precision@10 0.0486
cosine_recall@1 0.1388
cosine_recall@3 0.2177
cosine_recall@5 0.3062
cosine_recall@10 0.4856
cosine_ndcg@10 0.2766
cosine_mrr@10 0.2143
cosine_map@100 0.2408

Information Retrieval (dataset: dim_256)

Metric Value
cosine_accuracy@1 0.1244
cosine_accuracy@3 0.2177
cosine_accuracy@5 0.3134
cosine_accuracy@10 0.4689
cosine_precision@1 0.1244
cosine_precision@3 0.0726
cosine_precision@5 0.0627
cosine_precision@10 0.0469
cosine_recall@1 0.1244
cosine_recall@3 0.2177
cosine_recall@5 0.3134
cosine_recall@10 0.4689
cosine_ndcg@10 0.2671
cosine_mrr@10 0.2064
cosine_map@100 0.2343

Information Retrieval (dataset: dim_128)

Metric Value
cosine_accuracy@1 0.122
cosine_accuracy@3 0.2129
cosine_accuracy@5 0.3014
cosine_accuracy@10 0.4928
cosine_precision@1 0.122
cosine_precision@3 0.071
cosine_precision@5 0.0603
cosine_precision@10 0.0493
cosine_recall@1 0.122
cosine_recall@3 0.2129
cosine_recall@5 0.3014
cosine_recall@10 0.4928
cosine_ndcg@10 0.2715
cosine_mrr@10 0.2055
cosine_map@100 0.2308

Information Retrieval (dataset: dim_64)

Metric Value
cosine_accuracy@1 0.1196
cosine_accuracy@3 0.1986
cosine_accuracy@5 0.2823
cosine_accuracy@10 0.4689
cosine_precision@1 0.1196
cosine_precision@3 0.0662
cosine_precision@5 0.0565
cosine_precision@10 0.0469
cosine_recall@1 0.1196
cosine_recall@3 0.1986
cosine_recall@5 0.2823
cosine_recall@10 0.4689
cosine_ndcg@10 0.2583
cosine_mrr@10 0.1957
cosine_map@100 0.2212

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 6
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
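
To make these settings concrete, the sketch below computes the effective batch size and a learning-rate curve with linear warmup followed by cosine decay. It assumes a single training device and 84 total optimizer steps (the last step shown in the Training Logs). The formula is the commonly used warmup-plus-half-cosine shape and may differ in detail from the scheduler actually used. `lr_at_step` is a hypothetical helper.

```python
import math

# per_device_train_batch_size * gradient_accumulation_steps (single device assumed)
effective_batch_size = 16 * 16


def lr_at_step(step, base_lr=2e-5, total_steps=84, warmup_ratio=0.1):
    """Learning rate after `step` optimizer steps: linear warmup, then cosine decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", effective_batch_size)
for step in (0, 5, 9, 42, 84):
    print(step, lr_at_step(step))
```

With these assumptions, warmup covers the first 9 steps, the rate peaks at 2e-05, and it decays to zero by step 84.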

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 6
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss dim_1024_cosine_map@100 dim_128_cosine_map@100 dim_256_cosine_map@100 dim_512_cosine_map@100 dim_64_cosine_map@100 dim_768_cosine_map@100
0.3404 5 3.3256 - - - - - - -
0.6809 10 2.2115 - - - - - - -
0.9532 14 - 1.2963 0.2260 0.2148 0.2144 0.2258 0.2069 0.2252
1.0213 15 1.7921 - - - - - - -
1.3617 20 1.2295 - - - - - - -
1.7021 25 0.9048 - - - - - - -
1.9745 29 - 0.8667 0.2311 0.2267 0.2292 0.2279 0.2121 0.2278
2.0426 30 0.7256 - - - - - - -
2.3830 35 0.5252 - - - - - - -
2.7234 40 0.4648 - - - - - - -
2.9957 44 - 0.6920 0.2311 0.2243 0.2332 0.2319 0.2211 0.2354
3.0638 45 0.3518 - - - - - - -
3.4043 50 0.321 - - - - - - -
3.7447 55 0.2923 - - - - - - -
3.9489 58 - 0.6514 0.2343 0.2210 0.2293 0.2338 0.2242 0.2331
4.0851 60 0.2522 - - - - - - -
4.4255 65 0.2445 - - - - - - -
4.7660 70 0.2358 - - - - - - -
4.9702 73 - 0.6481 0.2348 0.2239 0.2252 0.2332 0.2167 0.2298
5.1064 75 0.2301 - - - - - - -
5.4468 80 0.2262 - - - - - - -
5.7191 84 - 0.646 0.243 0.2308 0.2343 0.2408 0.2212 0.2378
  • The last row (epoch 5.7191, step 84) is the saved checkpoint; its cosine_map@100 values match the Evaluation section.
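
The dim_* columns report retrieval quality when only the first 64 to 1024 embedding components are used. A minimal sketch of that truncation, assuming the shortened vector is re-normalized before cosine comparison, is shown below; `truncate_embedding` is a hypothetical helper. Recent sentence-transformers releases also appear to accept a `truncate_dim` argument when loading a model, which is worth checking against your installed version.

```python
import math


def truncate_embedding(embedding, dim):
    """Keep the first `dim` components and rescale the result to unit length."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # cannot rescale an all-zero vector
    return [x / norm for x in head]


# Toy unit-length vector with 4 dimensions (the real model outputs 1024).
full = [0.5, 0.5, 0.5, 0.5]
short = truncate_embedding(full, 2)
print(short)
```

Smaller dimensions reduce storage and search cost; the logs above suggest a modest drop in cosine_map@100 (0.243 at 1024 dimensions versus 0.2212 at 64).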

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.42.3
  • PyTorch: 2.3.1+cu121
  • Accelerate: 0.32.1
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
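
The two loss citations above suggest a multiple-negatives ranking objective applied at several embedding sizes, although the card does not spell out the configuration. The following pure-Python sketch illustrates the idea under stated assumptions: a similarity scale of 20, equal weights per dimension, and toy 4-dimensional vectors. All function names are hypothetical and this is not the library implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector has zero length."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def mnr_loss(anchors, positives, scale=20.0):
    """Cross-entropy where positive i is the target for anchor i and the
    other positives in the batch serve as negatives."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        top = max(logits)  # subtract the max for numerical stability
        log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


def matryoshka_mnr_loss(anchors, positives, dims):
    """Sum the ranking loss over truncated views of the embeddings."""
    return sum(
        mnr_loss([a[:d] for a in anchors], [p[:d] for p in positives])
        for d in dims
    )


anchors = [[1.0, 0.2, 0.1, 0.0], [0.2, 1.0, 0.0, 0.1]]
positives = [[0.9, 0.1, 0.2, 0.0], [0.1, 0.9, 0.0, 0.2]]

aligned = matryoshka_mnr_loss(anchors, positives, dims=[4, 2])
swapped = matryoshka_mnr_loss(anchors, positives[::-1], dims=[4, 2])
print(aligned, swapped)
```

Correctly paired anchors and positives give a loss near zero, while swapping the pairs yields a much larger value.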
Model size: 568M params (Safetensors, tensor type F32)

Model tree for adriansanz/SITGES-bge-FT1

Base model

BAAI/bge-m3
Finetuned
(191)
this model
