MPNet base trained on AllNLI triplets

This is a sentence-transformers model fine-tuned from microsoft/mpnet-base on the sxc_med_llm_chemical_gen dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: microsoft/mpnet-base
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: sxc_med_llm_chemical_gen
  • Language: en
  • License: apache-2.0

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
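The Pooling module above enables only `pooling_mode_mean_tokens`, so each sentence embedding is the average of the transformer's token embeddings, with padding positions excluded via the attention mask. The following is a minimal NumPy sketch of that computation, assuming token embeddings of shape (batch, seq_len, dim) and a 0/1 attention mask; `mean_pool` is a hypothetical helper name, not part of the library.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over non-padding positions.

    token_embeddings: (batch, seq_len, dim) array from the transformer.
    attention_mask:   (batch, seq_len) array, 1 for real tokens, 0 for padding.
    Returns a (batch, dim) array of sentence embeddings.
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                     # sum over real tokens only
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                     # guard against empty rows
    return summed / counts

# Toy input: 2 sentences, 3 token slots, 4-dim embeddings (the real model uses 768).
tokens = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
mask = np.array([[1, 1, 0],   # first sentence: last slot is padding
                 [1, 1, 1]])  # second sentence: no padding
pooled = mean_pool(tokens, mask)
print(pooled.shape)
```

Masking before averaging matters: without it, padding vectors would pull shorter sentences toward whatever the model emits for pad tokens.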

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Saideepthi55/sentencetransformer-ft")
# Run inference
sentences = [
    'With a molecule represented by the SMILES string CNNNCC(=O)N[C@H](C)C[C@@H](C)NCc1ccc2c(c1)CCC2, propose adjustments that can increase its logP value while keeping the output molecule structurally related to the input molecule.',
    'Given a molecule expressed in SMILES string, help me optimize it according to my requirements.',
    'In line with your criteria, I\'ve optimized the molecule and present it as "C[C@H](C[C@@H](C)NC(=O)COC(C)(C)C)NCc1ccc2c(c1)CCC2".',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
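Because the similarity function is cosine similarity, `model.similarity(embeddings, embeddings)` returns the pairwise cosine similarities between the embeddings. The sketch below reproduces that matrix with NumPy on random stand-in vectors (not real model outputs), which can help when checking results or scoring without the library; `cosine_similarity_matrix` is a hypothetical helper name.

```python
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Return the (len(a), len(b)) matrix of cosine similarities."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)  # unit-length rows
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(3, 768)).astype(np.float32)  # stand-in for model.encode(sentences)
similarities = cosine_similarity_matrix(embeddings, embeddings)
print(similarities.shape)  # (3, 3)

# Simple semantic-search step: most similar other sentence to sentence 0.
best_match = int(np.argmax(similarities[0, 1:])) + 1
```

The diagonal is 1 (each vector compared with itself), so the search step skips column 0.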

Evaluation

Metrics

Triplet

Metric Value
cosine_accuracy 0.6562
dot_accuracy 0.5342
manhattan_accuracy 0.7076
euclidean_accuracy 0.6584
max_accuracy 0.7076

Triplet

Metric Value
cosine_accuracy 0.9804
dot_accuracy 0.0189
manhattan_accuracy 0.9811
euclidean_accuracy 0.9802
max_accuracy 0.9811
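Each `*_accuracy` metric is the fraction of (anchor, positive, negative) triplets in which the anchor is closer to the positive than to the negative under the named measure; `max_accuracy` is the best of the four (in both tables it equals `manhattan_accuracy`). Below is a minimal NumPy sketch of that definition, assuming the embeddings are already computed. It illustrates the metric and is not the library's evaluator code; `triplet_accuracies` is a hypothetical name.

```python
import numpy as np

def triplet_accuracies(anchors, positives, negatives):
    """Fraction of triplets where the anchor is closer to its positive than to its negative."""
    def cos(x, y):
        return (x * y).sum(axis=1) / (np.linalg.norm(x, axis=1) * np.linalg.norm(y, axis=1))

    scores = {
        # Similarities: higher means closer.
        "cosine_accuracy": np.mean(cos(anchors, positives) > cos(anchors, negatives)),
        "dot_accuracy": np.mean((anchors * positives).sum(axis=1) > (anchors * negatives).sum(axis=1)),
        # Distances: lower means closer.
        "manhattan_accuracy": np.mean(
            np.abs(anchors - positives).sum(axis=1) < np.abs(anchors - negatives).sum(axis=1)),
        "euclidean_accuracy": np.mean(
            np.linalg.norm(anchors - positives, axis=1) < np.linalg.norm(anchors - negatives, axis=1)),
    }
    scores = {name: float(value) for name, value in scores.items()}
    scores["max_accuracy"] = max(scores.values())
    return scores

anchors = np.array([[1.0, 0.0], [0.0, 1.0]])
positives = np.array([[0.9, 0.1], [0.1, 0.9]])
negatives = np.array([[0.0, 1.0], [0.05, 1.0]])  # second negative is closer than its positive
acc = triplet_accuracies(anchors, positives, negatives)
print(acc)
```

Dot-product accuracy ignores vector length normalization, which is one reason it can diverge sharply from the other three measures.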

Training Details

Training Dataset

sxc_med_llm_chemical_gen

  • Dataset: sxc_med_llm_chemical_gen at ae28874
  • Size: 117,502 training samples
  • Columns: input, instruction, and output
  • Approximate statistics based on the first 1000 samples:
             input                instruction          output
    type     string               string               string
    details  min: 2 tokens        min: 6 tokens        min: 3 tokens
             mean: 50.03 tokens   mean: 16.19 tokens   mean: 67.94 tokens
             max: 156 tokens      max: 69 tokens       max: 321 tokens
  • Samples:
    Sample 1
      input: (empty)
      instruction: List the countries of the Middle East
      output: Egypt, Israel, Iran, Iraq, Jordan, Kuwait, Lebanon, Qatar, Saudi Arabia, Syria, Turkey, The United Arab Emirates, Oman, and Yemen.
    Sample 2
      input: Support me in refining the molecule CCC(C)=C(CC)c1ccc(Nc2nc3nonc3nc2N2CCC(Oc3ccc(Cl)cc3)CC2)cc1 to have a more drug-like nature, ensuring it remains close to the input molecule. Please inform me of the QED value of the optimized molecule.
      instruction: Help me adapt a molecular structure denoted in SMILES string based on my preferences.
      output: I've optimized the molecule according to your requirements, yielding the structure "COC(=O)c1ccc(Nc2nc3nonc3nc2N2CCC(Oc3ccc(Cl)cc3)CC2)cc1" and an estimated QED value of 0.4.
    Sample 3
      input: (empty)
      instruction: Create a social media strategy to attract followers on Twitter.
      output: The social media strategy for Twitter should involve Tweeting on a regular basis, with content that is engaging and relevant to the target audience. This can include retweetable quotes, links to videos and articles, sharing photos/images, polls, and so on. It is also beneficial for the information to include hashtags relevant to the content and any relevant topics of the industry.

      It is also important to leverage Twitter influencers and accounts, as well as engaging with potential followers regularly and DM them when necessary. Additionally, running campaigns, contests and giveaways to gain more followers should also be part of the strategy. Finally, hashtag research should be conducted to increase the reach of the tweets.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
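MultipleNegativesRankingLoss treats the other candidates in a batch as negatives: each anchor is scored against every candidate with cosine similarity multiplied by `scale` (20.0 here), and cross-entropy is applied with the anchor's own positive as the target. With three columns, sentence-transformers takes the first as the anchor, the second as the positive, and the remaining one as a hard negative, which for this dataset would be `input`, `instruction`, and `output` respectively. The NumPy sketch below is written for illustration and is not the library's implementation; `mnrl_loss` is a hypothetical name.

```python
import numpy as np

def mnrl_loss(anchors, positives, negatives=None, scale=20.0):
    """In-batch negatives ranking loss with cosine similarity (illustrative sketch)."""
    # Candidate pool: all positives, followed by any hard negatives.
    candidates = positives if negatives is None else np.concatenate([positives, negatives], axis=0)
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    scores = scale * (a @ c.T)  # (batch, num_candidates) scaled cosine similarities
    # Numerically stable row-wise log-softmax.
    scores = scores - scores.max(axis=1, keepdims=True)
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    # Anchor i's correct candidate is positive i, i.e. column i.
    idx = np.arange(len(anchors))
    return float(-log_probs[idx, idx].mean())

emb = np.eye(4)
swapped_pairs = emb[[1, 0, 3, 2]]
matched = mnrl_loss(emb, emb)                    # each anchor matches its positive: close to 0
swapped = mnrl_loss(emb, swapped_pairs)          # each anchor matches a wrong positive: close to 20
with_negatives = mnrl_loss(emb, emb, negatives=swapped_pairs)  # hard negative ties the positive: close to log(2)
print(matched, swapped, with_negatives)
```

The `scale` factor sharpens the softmax: with unit-norm vectors, raw cosine scores lie in [-1, 1], which would otherwise give a nearly uniform distribution over candidates.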
    

Evaluation Dataset

sxc_med_llm_chemical_gen

  • Dataset: sxc_med_llm_chemical_gen at ae28874
  • Size: 4,500 evaluation samples
  • Columns: input, instruction, and output
  • Approximate statistics based on the first 1000 samples:
             input                instruction          output
    type     string               string               string
    details  min: 2 tokens        min: 6 tokens        min: 3 tokens
             mean: 68.49 tokens   mean: 16.89 tokens   mean: 69.16 tokens
             max: 165 tokens      max: 36 tokens       max: 205 tokens
  • Samples:
    Sample 1
      input: With a molecule represented by the SMILES string COc1ccc(CN(C(=O)Nc2ccc(C@Hc3ccccc3)c(Cl)c2)C2CC2)cc1, propose adjustments that can decrease its blood-brain barrier permeability (BBBP) value by at least 0.1 compared to the pre-optimized value while keeping the output molecule structurally related to the input molecule. Please inform me of the BBBP value of the optimized molecule.
      instruction: Help me adapt a molecular structure denoted in SMILES string based on my preferences.
      output: I've optimized the molecule according to your requirements, yielding the structure "CC(C)c1cccc(Oc2ccc(CN(C(=O)Nc3ccc(C@Hc4ccccc4)c(Cl)c3)C3CC3)cc2)c1" and an estimated BBBP value of 0.34.
    Sample 2
      input: Which modifications can be applied to the molecule Cc1nn(-c2cc3cc(F)ccc3s2)c2ncc(C(=O)N3CCCC3c3nnc4n3CCCCC4)cc12 to increase its water solubility value while ensuring it remains similar to the original molecule?
      instruction: Aid me in refining a molecular structure written in SMILES notation based on my criteria.
      output: I've optimized the molecule according to your requirements, resulting in the following structure: "Cc1nn(C)c2ncc(C(=O)N3CCCC3c3nnc4n3CCCCC4)cc12".
    Sample 3
      input: (empty)
      instruction: Describe the benefits of having a mentor.
      output: Having a mentor can provide tremendous benefits, including guidance, advice, support and connection to a network of contacts. Mentors can offer valuable feedback, help to create goals and strategies, and provide the accountability necessary to carry out the desired goals. They can also provide a fresh perspective which can help to create new ideas and solutions.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
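The per-column min/mean/max figures in both dataset sections are token counts over the first 1000 samples. The sketch below shows one way to compute such figures, assuming a caller-supplied `count_tokens` function. The card does not say which tokenizer produced its numbers; it was likely the model's own, which would explain why an empty `input` still counts as 2 tokens (start and end special tokens). The whitespace counter and the helper names here are hypothetical stand-ins.

```python
from statistics import mean
from typing import Callable, Dict, Iterable, List

def column_token_stats(rows: Iterable[Dict[str, str]], columns: List[str],
                       count_tokens: Callable[[str], int],
                       limit: int = 1000) -> Dict[str, Dict[str, float]]:
    """Min/mean/max token counts per column over the first `limit` rows."""
    counts: Dict[str, List[int]] = {col: [] for col in columns}
    for i, row in enumerate(rows):
        if i >= limit:
            break
        for col in columns:
            counts[col].append(count_tokens(row[col]))
    return {col: {"min": min(v), "mean": round(mean(v), 2), "max": max(v)}
            for col, v in counts.items()}

def whitespace_count(text: str) -> int:
    # Assumption: words split on whitespace, plus 2 for start/end special tokens.
    return len(text.split()) + 2

rows = [
    {"input": "", "instruction": "Describe the benefits of having a mentor.", "output": "Guidance and support."},
    {"input": "CCO", "instruction": "Optimize the molecule.", "output": "CCN"},
]
stats = column_token_stats(rows, ["input", "instruction", "output"], whitespace_count)
print(stats)
```

Swapping in a real subword tokenizer changes the counts considerably for SMILES strings, which split into many short pieces.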
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
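With `lr_scheduler_type: linear` and `warmup_ratio: 0.1`, the learning rate ramps from 0 to 2e-05 and then decays linearly to 0. The sketch below derives the step counts from the values above, assuming training ran on a single device (the card does not state the device count). It mirrors the warmup-then-linear-decay formula used by Hugging Face Transformers' linear scheduler; `linear_lr` is a hypothetical name.

```python
import math

# Values taken from the hyperparameter list; a single device is assumed.
num_samples = 117_502   # training samples
batch_size = 8          # per_device_train_batch_size (gradient_accumulation_steps: 1)
num_epochs = 1          # num_train_epochs
peak_lr = 2e-05         # learning_rate
warmup_ratio = 0.1      # warmup_ratio

steps_per_epoch = math.ceil(num_samples / batch_size)  # dataloader_drop_last: False keeps the last partial batch
total_steps = steps_per_epoch * num_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

def linear_lr(step: int) -> float:
    """Linear warmup to peak_lr, then linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

print(total_steps, warmup_steps)       # 14688 1469
print(round(7000 / total_steps, 4))    # 0.4766
```

Under these assumptions, one epoch is 14,688 steps, which agrees with the Epoch column of the training log below (for example, step 7000 at epoch 0.4766).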

Training Logs

Epoch Step Training Loss Validation Loss all-nli-dev_max_accuracy
0 0 - - 0.7076
0.0174 64 - - 0.7156
0.0068 100 2.7336 2.6486 0.7524
0.0136 200 2.4965 1.9213 0.8162
0.0204 300 1.9042 1.7761 0.822
0.0272 400 1.6856 1.7172 0.8371
0.0340 500 1.6117 1.6916 0.8507
0.0408 600 1.5673 1.6809 0.8976
0.0477 700 1.5984 1.7052 0.9329
0.0545 800 1.5828 1.6841 0.9391
0.0613 900 1.5375 1.6534 0.9267
0.0681 1000 1.5561 1.6619 0.9509
0.0749 1100 1.4911 1.6538 0.9556
0.0817 1200 1.5075 1.6498 0.966
0.0885 1300 1.4722 1.6468 0.946
0.0953 1400 1.4806 1.6981 0.9631
0.1021 1500 1.4788 1.6335 0.9662
0.1089 1600 1.4668 1.6668 0.9731
0.1157 1700 1.4383 1.6473 0.9711
0.1225 1800 1.4549 1.6462 0.9713
0.1294 1900 1.4394 1.6184 0.9718
0.1362 2000 1.3861 1.6156 0.9676
0.1430 2100 1.4111 1.6045 0.9711
0.1498 2200 1.4286 1.6056 0.9782
0.1566 2300 1.4669 1.6174 0.9764
0.1634 2400 1.3761 1.6182 0.9776
0.1702 2500 1.4119 1.6150 0.9738
0.1770 2600 1.3625 1.5984 0.9776
0.1838 2700 1.3726 1.6092 0.9807
0.1906 2800 1.3265 1.6059 0.9789
0.1974 2900 1.3925 1.6004 0.978
0.2042 3000 1.3524 1.5964 0.9773
0.2111 3100 1.342 1.6213 0.9787
0.2179 3200 1.3478 1.6016 0.9822
0.2247 3300 1.3888 1.6038 0.9793
0.2315 3400 1.3328 1.5977 0.9813
0.2383 3500 1.372 1.6114 0.9824
0.2451 3600 1.3046 1.6082 0.9824
0.2519 3700 1.3857 1.5922 0.9824
0.2587 3800 1.3236 1.6127 0.9809
0.2655 3900 1.2929 1.5935 0.9824
0.2723 4000 1.3889 1.6047 0.9831
0.2791 4100 1.3509 1.6030 0.9844
0.2859 4200 1.3455 1.6099 0.9824
0.2928 4300 1.337 1.5939 0.984
0.2996 4400 1.3302 1.6057 0.9827
0.3064 4500 1.3377 1.6254 0.9833
0.3132 4600 1.3221 1.6020 0.9849
0.3200 4700 1.3209 1.6146 0.9824
0.3268 4800 1.354 1.6022 0.9824
0.3336 4900 1.3213 1.6136 0.9822
0.3404 5000 1.3484 1.5920 0.9807
0.3472 5100 1.3412 1.6106 0.978
0.3540 5200 1.3532 1.6001 0.9784
0.3608 5300 1.2984 1.6192 0.9762
0.3676 5400 1.3621 1.5850 0.98
0.3745 5500 1.2839 1.6158 0.9807
0.3813 5600 1.3664 1.6030 0.9831
0.3881 5700 1.327 1.6168 0.9822
0.3949 5800 1.3123 1.6040 0.982
0.4017 5900 1.3019 1.6092 0.9824
0.4085 6000 1.3908 1.5935 0.9829
0.4153 6100 1.3136 1.5916 0.9791
0.4221 6200 1.32 1.6091 0.9807
0.4289 6300 1.3018 1.6052 0.9827
0.4357 6400 1.3144 1.6083 0.9816
0.4425 6500 1.2865 1.6015 0.9829
0.4493 6600 1.2946 1.5882 0.9818
0.4562 6700 1.3245 1.5949 0.9824
0.4630 6800 1.3278 1.6081 0.9831
0.4698 6900 1.2842 1.6086 0.9836
0.4766 7000 1.3231 1.6170 0.9811

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.1.0
  • Transformers: 4.44.2
  • PyTorch: 2.4.0+cu121
  • Accelerate: 0.34.2
  • Datasets: 3.0.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
Model size: 109M params (Safetensors, tensor type F32)