---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:2382
- loss:MultipleNegativesRankingLoss
base_model: nomic-ai/nomic-embed-text-v1
widget:
- source_sentence: >-
    Collect the details that are associated with product '- Com espessura
    constante de' '- 0,04 m', with quantity 1900, unit M2
  sentences:
  - >-
    Item Description: UNKNOWN PRODUCT, priced at 949.00 EUR, Origin:
    National
  - 'Product: UNKNOWN PRODUCT, Estimated Value: 514.00 EUR'
  - |-
    Details for 'MacBook Pro 14" Processador M2/3 16GB/18GB RAM | SSD 512GB | Teclado Es-Es', with quantity 1, unit UN:
    - LOTE 31
    - Price: 656.00 EUR
- source_sentence: >-
    Collect the details that are associated with Lot 14 product '' 'Monitor de
    Sinais Vitais ', with quantity 2, unit Subcontracting Unit
  sentences:
  - |-
    Details for 'Monitor de Sinais Vitais ', with quantity 2, unit Subcontracting Unit:
    - LOTE 60
    - Price: 564.00 EUR
  - |-
    Details for UNKNOWN PRODUCT:
    - LOTE 90
    - Price: 658.00 EUR
  - 'Item Description: UNKNOWN PRODUCT, priced at 90.00 EUR, Origin: National'
- source_sentence: >-
    Collect the details that are associated with product '' '2202000270 - FIO
    SUT. AC. POLIGLIC. ABS. RÁPIDA 4/0 MULTIF AG. CILIND. 17 MM 1/2 C (UNID)',
    with quantity 288, unit UN
  sentences:
  - >-
    Item Description: '2202000270 - FIO SUT. AC. POLIGLIC. ABS. RÁPIDA 4/0
    MULTIF AG. CILIND. 17 MM 1/2 C (UNID)', with quantity 288, unit UN,
    priced at 66.00 EUR, Origin: National
  - >-
    Product: '2202000285 - FIO SUT. POLIPROPI. NÃO ABS. 4/0 MONOF. AG. LANC.
    16 MM 3/8 (UNID)', with quantity 468, unit UN, Estimated Value: 619.00
    EUR
  - >-
    Item Description: 'Carro transporte de roupa limpa/roupa suja', with
    quantity 1, unit Subcontracting Unit, priced at 574.00 EUR, Origin:
    National
- source_sentence: >-
    Collect the details that are associated with product '' '2202000006 - FIO
    SUT. SEDA NÃO ABS. 0 MULTIF. SEM AGULHA (CART.)', with quantity 72, unit
    UN
  sentences:
  - >-
    Item Description: '2202000309 - FIO SUT. ABS. MÉDIO PRAZO 2/0 MONOF.
    BARBADO, C/ AG. CILIND. 30MM 1/2C, 23CM (CART.)', with quantity 24, unit
    UN, priced at 206.00 EUR, Origin: National
  - |-
    Details for '2202000006 - FIO SUT. SEDA NÃO ABS. 0 MULTIF. SEM AGULHA (CART.)', with quantity 72, unit UN:
    - LOTE 82
    - Price: 854.00 EUR
  - |-
    LOTE 10
    Description: 'Mesas apoio (anestesia e circulante)', with quantity 4, unit Subcontracting Unit
    Price: 117.00 EUR
- source_sentence: >-
    Collect the details that are associated with product '' '2202000251 - FIO
    SUT. ABS. LONGA 1 MONOF. AG. CILIND. 48 MM 1/2C 90CM (CART.)', with
    quantity 144, unit UN
  sentences:
  - |-
    Details for UNKNOWN PRODUCT:
    - LOTE 34
    - Price: 477.00 EUR
  - |-
    Details for '2202000251 - FIO SUT. ABS. LONGA 1 MONOF. AG. CILIND. 48 MM 1/2C 90CM (CART.)', with quantity 144, unit UN:
    - LOTE 73
    - Price: 644.00 EUR
  - >-
    Item Description: 'Mesas de Mayo', with quantity 2, unit Subcontracting
    Unit, priced at 651.00 EUR, Origin: National
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- pearson_cosine
- spearman_cosine
model-index:
- name: SentenceTransformer based on nomic-ai/nomic-embed-text-v1
  results:
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: Unknown
      type: unknown
    metrics:
    - type: pearson_cosine
      value: .nan
      name: Pearson Cosine
    - type: spearman_cosine
      value: .nan
      name: Spearman Cosine
---
# SentenceTransformer based on nomic-ai/nomic-embed-text-v1

This is a sentence-transformers model finetuned from nomic-ai/nomic-embed-text-v1. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details

### Model Description
- Model Type: Sentence Transformer
- Base model: nomic-ai/nomic-embed-text-v1
- Maximum Sequence Length: 8192 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
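Because the model ends with a `Normalize()` module, its embeddings have unit L2 norm, so the cosine similarity listed above reduces to a plain dot product. A minimal sketch of the formula, using toy low-dimensional vectors in place of the model's 768-dimensional embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: dot(a, b) / (||a|| * ||b||)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-d stand-ins for 768-d embeddings; u is already unit-norm,
# so cos(u, v) is just the dot product.
u = [0.6, 0.8, 0.0]
v = [0.0, 1.0, 0.0]
print(cosine_similarity(u, v))  # ≈ 0.8
```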
### Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: NomicBertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
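The `Pooling` module above is configured for mean pooling (`pooling_mode_mean_tokens: True`): token vectors are averaged, counting only non-padding positions, to produce one 768-d sentence vector. A minimal sketch of masked mean pooling with toy dimensions (not the library implementation):

```python
def mean_pooling(token_embeddings, attention_mask):
    """Average token vectors over positions where attention_mask == 1."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, mask in zip(token_embeddings, attention_mask):
        if mask:  # skip padding positions
            count += 1
            for j in range(dim):
                summed[j] += vec[j]
    return [s / count for s in summed]

tokens = [[1.0, 3.0], [3.0, 5.0], [0.0, 0.0]]  # last position is padding
mask = [1, 1, 0]
print(mean_pooling(tokens, mask))  # [2.0, 4.0]
```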
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
# (trust_remote_code=True is needed because NomicBertModel is custom code)
model = SentenceTransformer("ptpedroVortal/nomic_vortal_v3.4", trust_remote_code=True)
# Run inference
sentences = [
    "Collect the details that are associated with product '' '2202000251 - FIO SUT. ABS. LONGA 1 MONOF. AG. CILIND. 48 MM 1/2C 90CM (CART.)', with quantity 144, unit UN",
    "Details for '2202000251 - FIO SUT. ABS. LONGA 1 MONOF. AG. CILIND. 48 MM 1/2C 90CM (CART.)', with quantity 144, unit UN:\n - LOTE 73\n - Price: 644.00 EUR",
    "Item Description: 'Mesas de Mayo', with quantity 2, unit Subcontracting Unit, priced at 651.00 EUR, Origin: National",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
## Evaluation

### Metrics

#### Semantic Similarity

- Evaluated with `main.CustomEvaluator`

| Metric | Value |
|:--|:--|
| pearson_cosine | nan |
| spearman_cosine | nan |
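The `nan` values are likely what Pearson and Spearman return when one of the two inputs is constant: the dataset statistics show every sample carries score `1` (100.00%), so the label vector has zero variance and the correlation denominator vanishes. A minimal sketch of this failure mode in plain Python (hypothetical similarity values):

```python
import math

def pearson(x, y):
    """Pearson correlation; returns nan when either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined: zero-variance input
    return cov / (sx * sy)

sims = [0.9, 0.7, 0.8, 0.6]  # hypothetical cosine similarities
labels = [1, 1, 1, 1]        # every sample scored 1, as in this dataset
print(pearson(sims, labels))  # nan
```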
## Training Details

### Training Dataset

#### Unnamed Dataset

- Size: 2,382 training samples
- Columns: `query`, `correct_node`, and `score`
- Approximate statistics based on the first 1000 samples:

| | query | correct_node | score |
|:--|:--|:--|:--|
| type | string | string | int |
| details | min: 15 tokens, mean: 56.3 tokens, max: 154 tokens | min: 15 tokens, mean: 49.65 tokens, max: 1729 tokens | 1: 100.00% |
- Samples:

| query | correct_node | score |
|:--|:--|:--|
| Collect the details that are associated with product '' '2202000275 - FIO SUT. POLIAMIDA NÃO ABS. 2/0 MONOF AG. CILIND. 30MM 1/2 LOOP (UNID)', with quantity 216, unit UN | LOTE 98<br>Description: '2202000275 - FIO SUT. POLIAMIDA NÃO ABS. 2/0 MONOF AG. CILIND. 30MM 1/2 LOOP (UNID)', with quantity 216, unit UN<br>Price: 940.00 EUR | 1 |
| Collect the details that are associated with product '' '2202000294 - FIO SUT. AC. POLIGLIC. ABS. 2/0 MULTIF SEM AGULHA PRÉ CORTADO (UNID)', with quantity 324, unit UN | Product: '2202000294 - FIO SUT. AC. POLIGLIC. ABS. 2/0 MULTIF SEM AGULHA PRÉ CORTADO (UNID)', with quantity 324, unit UN, Estimated Value: 696.00 EUR | 1 |
| Collect the details that are associated with Lot 4 product '' 'Mesas de Mayo', with quantity 2, unit Subcontracting Unit | LOTE 44<br>Description: 'Mesas de Mayo', with quantity 2, unit Subcontracting Unit<br>Price: 542.00 EUR | 1 |

- Loss: `MultipleNegativesRankingLoss` with these parameters:

  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```
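MultipleNegativesRankingLoss treats, for each query in a batch, its paired `correct_node` as the positive and every other in-batch positive as a negative: cosine similarities are multiplied by the scale (20.0 here) and fed through a softmax cross-entropy over the batch. A minimal sketch of that objective in plain Python with toy 2-d embeddings (not the library implementation):

```python
import math

def mnr_loss(query_embs, positive_embs, scale=20.0):
    """In-batch softmax cross-entropy over scaled cosine similarities.
    For row i, column i is the positive; other columns act as negatives."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    total = 0.0
    for i, q in enumerate(query_embs):
        logits = [scale * cos(q, p) for p in positive_embs]
        log_softmax = logits[i] - math.log(sum(math.exp(z) for z in logits))
        total += -log_softmax
    return total / len(query_embs)

queries = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
# Loss is near zero when each query is closest to its own positive
print(mnr_loss(queries, positives))
```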
### Evaluation Dataset

#### Unnamed Dataset

- Size: 297 evaluation samples
- Columns: `query`, `correct_node`, and `score`
- Approximate statistics based on the first 297 samples:

| | query | correct_node | score |
|:--|:--|:--|:--|
| type | string | string | int |
| details | min: 15 tokens, mean: 55.37 tokens, max: 154 tokens | min: 15 tokens, mean: 46.58 tokens, max: 435 tokens | 1: 100.00% |

- Samples:

| query | correct_node | score |
|:--|:--|:--|
| Collect the details that are associated with Lot 7 product '' 'Carro transporte de roupa suja', with quantity 1, unit Subcontracting Unit | Item Description: 'Carro transporte de roupa suja', with quantity 1, unit Subcontracting Unit, priced at 628.00 EUR, Origin: National | 1 |
| Collect the details that are associated with Lot 10 product '' 'Mesas para cirurgia', with quantity 2, unit Subcontracting Unit | Details for 'Mesas para cirurgia', with quantity 2, unit Subcontracting Unit:<br>- LOTE 83<br>- Price: 940.00 EUR | 1 |
| Collect the details that are associated with Lot 1 product '' 'PAINEL MULTIPLO ALERGENOS RESPIRATORIOS ', with quantity 1152, unit UND | Product: 'PAINEL MULTIPLO ALERGENOS RESPIRATORIOS ', with quantity 1152, unit UND, Estimated Value: 714.00 EUR | 1 |

- Loss: `MultipleNegativesRankingLoss` with these parameters:

  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```
### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `num_train_epochs`: 10
- `warmup_ratio`: 0.1
- `bf16`: True
- `load_best_model_at_end`: True
- `batch_sampler`: no_duplicates
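With `warmup_ratio: 0.1` and the linear scheduler, the learning rate ramps from 0 to the peak (5e-05) over the first 10% of training steps, then decays linearly back to 0. A minimal sketch of that schedule in plain Python (the total step count below is illustrative; the actual count depends on dataset size, batch size, and epochs):

```python
def linear_warmup_linear_decay(step, total_steps, peak_lr=5e-5, warmup_ratio=0.1):
    """Linear warmup for the first warmup_ratio of steps, then linear decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    # after warmup: scale by the fraction of decay steps remaining
    return peak_lr * (total_steps - step) / max(1, total_steps - warmup_steps)

total = 1000  # illustrative
print(linear_warmup_linear_decay(50, total))   # mid-warmup: 2.5e-05
print(linear_warmup_linear_decay(100, total))  # peak: 5e-05
print(linear_warmup_linear_decay(550, total))  # halfway through decay: 2.5e-05
```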
#### All Hyperparameters

<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 10
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>
### Training Logs

| Epoch | Step | Training Loss | Validation Loss | spearman_cosine |
|:--|:--|:--|:--|:--|
| 0.6711 | 100 | 0.6485 | 0.4410 | nan |
| 1.3356 | 200 | 0.5026 | 0.4399 | nan |
| **2.0067** | **300** | **0.491** | **0.4175** | **nan** |
| 2.6711 | 400 | 0.442 | 0.4409 | nan |
| 3.3356 | 500 | 0.3999 | 0.4421 | nan |
| 4.0067 | 600 | 0.367 | 0.6182 | nan |
| 4.6711 | 700 | 0.3743 | 0.6104 | nan |
| 5.3356 | 800 | 0.1972 | 0.6115 | nan |

- The bold row denotes the saved checkpoint (step 300 has the lowest validation loss, consistent with `load_best_model_at_end: True`).
### Framework Versions
- Python: 3.10.14
- Sentence Transformers: 3.3.1
- Transformers: 4.47.0.dev0
- PyTorch: 2.5.1+cu121
- Accelerate: 1.1.1
- Datasets: 3.1.0
- Tokenizers: 0.20.4
## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
#### MultipleNegativesRankingLoss

```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```