---
language:
- en
library_name: sentence-transformers
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated
base_model: microsoft/mpnet-base
metrics:
- accuracy
widget:
- source_sentence: Many youth are lazy.
  sentences:
  - Lincoln took his hat off.
  - At the end of the fourth century was when baked goods flourished.
  - DOD's common practice for managing this environment has been to create aggressive risk reduction efforts in its programs.
- source_sentence: a guy on a bike
  sentences:
  - A man is on a bike.
  - two men sit in a train car
  - She is the boy's aunt.
- source_sentence: The dog is wet.
  sentences:
  - A child and small dog running.
  - The man is riding a sheep.
  - The man is doing a bike trick.
- source_sentence: yeah really no kidding
  sentences:
  - 'Really? No kidding! '
  - yeah i mean just when uh the they military paid for her education
  - Changes were made to the Grant Renewal Application to provide extra information to the LSC.
- source_sentence: 'Harlem did a great job '
  sentences:
  - 'Missouri was happy to continue it''s planning efforts. '
  - yeah i mean just when uh the they military paid for her education
  - I know exactly.
pipeline_tag: sentence-similarity
co2_eq_emissions:
  emissions: 18.165192544667764
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 0.141
  hardware_used: 1 x NVIDIA GeForce RTX 3090
---

# SentenceTransformer

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [microsoft/mpnet-base](https://huggingface.co/microsoft/mpnet-base) on the [multi_nli](https://huggingface.co/datasets/nyu-mll/multi_nli), [snli](https://huggingface.co/datasets/stanfordnlp/snli) and [stsb](https://huggingface.co/datasets/mteb/stsbenchmark-sts) datasets. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Base model:** [microsoft/mpnet-base](https://huggingface.co/microsoft/mpnet-base)
- **Maximum Sequence Length:** 384 tokens
- **Output Dimensionality:** 768 dimensions
- **Training Datasets:**
  - [multi_nli](https://huggingface.co/datasets/nyu-mll/multi_nli)
  - [snli](https://huggingface.co/datasets/stanfordnlp/snli)
  - [stsb](https://huggingface.co/datasets/mteb/stsbenchmark-sts)
- **Language:** en

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/st-v3-test-mpnet-base-allnli-stsb")
# Run inference
sentences = [
    "Harlem did a great job ",
    "Missouri was happy to continue it's planning efforts. ",
    "yeah i mean just when uh the they military paid for her education",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
```
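You can then compare the embeddings, for example to rank candidate sentences against a query by cosine similarity. Below is a minimal sketch using the `sentence_transformers.util.cos_sim` helper, with sentences taken from the widget examples above:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("tomaarsen/st-v3-test-mpnet-base-allnli-stsb")

query_embedding = model.encode("a guy on a bike")
candidate_embeddings = model.encode([
    "A man is on a bike.",
    "two men sit in a train car",
    "She is the boy's aunt.",
])

# cos_sim returns a 1 x 3 tensor of cosine similarities; higher values
# indicate more semantically similar sentences.
scores = cos_sim(query_embedding, candidate_embeddings)
print(scores)
```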
## Training Details

### Training Datasets

#### multi_nli

* Dataset: [multi_nli](https://huggingface.co/datasets/nyu-mll/multi_nli) at [da70db2](https://huggingface.co/datasets/nyu-mll/multi_nli/tree/da70db2af9d09693783c3320c4249840212ee221)
* Size: 10,000 training samples
* Columns: premise, hypothesis, and label
* Approximate statistics based on the first 1000 samples:
  |         | premise | hypothesis | label |
  |:--------|:--------|:-----------|:------|
  | type    | string  | string     | int   |
  | details |         |            |       |
* Samples:
  | premise | hypothesis | label |
  |:--------|:-----------|:------|
  | Conceptually cream skimming has two basic dimensions - product and geography. | Product and geography are what make cream skimming work. | 1 |
  | you know during the season and i guess at at your level uh you lose them to the next level if if they decide to recall the the parent team the Braves decide to call to recall a guy from triple A then a double A guy goes up to replace him and a single A guy goes up to replace him | You lose the things to the following level if the people recall. | 0 |
  | One of our number will carry out your instructions minutely. | A member of my team will execute your orders with immense precision. | 0 |
* Loss: [sentence_transformers.losses.SoftmaxLoss.SoftmaxLoss](https://sbert.net/docs/package_reference/losses.html#softmaxloss)

#### snli

* Dataset: [snli](https://huggingface.co/datasets/stanfordnlp/snli) at [cdb5c3d](https://huggingface.co/datasets/stanfordnlp/snli/tree/cdb5c3d5eed6ead6e5a341c8e56e669bb666725b)
* Size: 10,000 training samples
* Columns: snli_premise, hypothesis, and label
* Approximate statistics based on the first 1000 samples:
  |         | snli_premise | hypothesis | label |
  |:--------|:-------------|:-----------|:------|
  | type    | string       | string     | int   |
  | details |              |            |       |
* Samples:
  | snli_premise | hypothesis | label |
  |:-------------|:-----------|:------|
  | A person on a horse jumps over a broken down airplane. | A person is training his horse for a competition. | 1 |
  | A person on a horse jumps over a broken down airplane. | A person is at a diner, ordering an omelette. | 2 |
  | A person on a horse jumps over a broken down airplane. | A person is outdoors, on a horse. | 0 |
* Loss: [sentence_transformers.losses.SoftmaxLoss.SoftmaxLoss](https://sbert.net/docs/package_reference/losses.html#softmaxloss)

#### stsb

* Dataset: [stsb](https://huggingface.co/datasets/mteb/stsbenchmark-sts) at [8913289](https://huggingface.co/datasets/mteb/stsbenchmark-sts/tree/8913289635987208e6e7c72789e4be2fe94b6abd)
* Size: 5,749 training samples
* Columns: sentence1, sentence2, and label
* Approximate statistics based on the first 1000 samples:
  |         | sentence1 | sentence2 | label |
  |:--------|:----------|:----------|:------|
  | type    | string    | string    | float |
  | details |           |           |       |
* Samples:
  | sentence1 | sentence2 | label |
  |:----------|:----------|:------|
  | A plane is taking off. | An air plane is taking off. | 1.0 |
  | A man is playing a large flute. | A man is playing a flute. | 0.76 |
  | A man is spreading shreded cheese on a pizza. | A man is spreading shredded cheese on an uncooked pizza. | 0.76 |
* Loss: [sentence_transformers.losses.CosineSimilarityLoss.CosineSimilarityLoss](https://sbert.net/docs/package_reference/losses.html#cosinesimilarityloss) with these parameters:
  ```json
  {
      "loss_fct": "torch.nn.modules.loss.MSELoss"
  }
  ```
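The two NLI datasets are trained with a classification objective over their three labels, while stsb regresses the cosine similarity of the two embeddings toward the gold score. As a minimal sketch of how these losses are constructed, assuming the standard `sentence_transformers.losses` API (variable names are illustrative):

```python
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("microsoft/mpnet-base")

# multi_nli and snli: a softmax classifier over the 3 NLI labels, fed with
# the concatenation (u, v, |u - v|) of the two sentence embeddings.
nli_loss = losses.SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=3,
)

# stsb: the cosine similarity of the two embeddings is regressed toward the
# gold similarity score; the default loss_fct is torch.nn.MSELoss, matching
# the parameters listed above.
stsb_loss = losses.CosineSimilarityLoss(model=model)
```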
### Evaluation Datasets

#### multi_nli

* Dataset: [multi_nli](https://huggingface.co/datasets/nyu-mll/multi_nli) at [da70db2](https://huggingface.co/datasets/nyu-mll/multi_nli/tree/da70db2af9d09693783c3320c4249840212ee221)
* Size: 100 evaluation samples
* Columns: premise, hypothesis, and label
* Approximate statistics based on the first 1000 samples:
  |         | premise | hypothesis | label |
  |:--------|:--------|:-----------|:------|
  | type    | string  | string     | int   |
  | details |         |            |       |
* Samples:
  | premise | hypothesis | label |
  |:--------|:-----------|:------|
  | The new rights are nice enough | Everyone really likes the newest benefits | 1 |
  | This site includes a list of all award winners and a searchable database of Government Executive articles. | The Government Executive articles housed on the website are not able to be searched. | 2 |
  | uh i don't know i i have mixed emotions about him uh sometimes i like him but at the same times i love to see somebody beat him | I like him for the most part, but would still enjoy seeing someone beat him. | 0 |
* Loss: [sentence_transformers.losses.SoftmaxLoss.SoftmaxLoss](https://sbert.net/docs/package_reference/losses.html#softmaxloss)

#### snli

* Dataset: [snli](https://huggingface.co/datasets/stanfordnlp/snli) at [cdb5c3d](https://huggingface.co/datasets/stanfordnlp/snli/tree/cdb5c3d5eed6ead6e5a341c8e56e669bb666725b)
* Size: 9,842 evaluation samples
* Columns: snli_premise, hypothesis, and label
* Approximate statistics based on the first 1000 samples:
  |         | snli_premise | hypothesis | label |
  |:--------|:-------------|:-----------|:------|
  | type    | string       | string     | int   |
  | details |              |            |       |
* Samples:
  | snli_premise | hypothesis | label |
  |:-------------|:-----------|:------|
  | Two women are embracing while holding to go packages. | The sisters are hugging goodbye while holding to go packages after just eating lunch. | 1 |
  | Two women are embracing while holding to go packages. | Two woman are holding packages. | 0 |
  | Two women are embracing while holding to go packages. | The men are fighting outside a deli. | 2 |
* Loss: [sentence_transformers.losses.SoftmaxLoss.SoftmaxLoss](https://sbert.net/docs/package_reference/losses.html#softmaxloss)

#### stsb

* Dataset: [stsb](https://huggingface.co/datasets/mteb/stsbenchmark-sts) at [8913289](https://huggingface.co/datasets/mteb/stsbenchmark-sts/tree/8913289635987208e6e7c72789e4be2fe94b6abd)
* Size: 1,500 evaluation samples
* Columns: sentence1, sentence2, and label
* Approximate statistics based on the first 1000 samples:
  |         | sentence1 | sentence2 | label |
  |:--------|:----------|:----------|:------|
  | type    | string    | string    | float |
  | details |           |           |       |
* Samples:
  | sentence1 | sentence2 | label |
  |:----------|:----------|:------|
  | A man with a hard hat is dancing. | A man wearing a hard hat is dancing. | 1.0 |
  | A young child is riding a horse. | A child is riding a horse. | 0.95 |
  | A man is feeding a mouse to a snake. | The man is feeding a mouse to the snake. | 1.0 |
* Loss: [sentence_transformers.losses.CosineSimilarityLoss.CosineSimilarityLoss](https://sbert.net/docs/package_reference/losses.html#cosinesimilarityloss) with these parameters:
  ```json
  {
      "loss_fct": "torch.nn.modules.loss.MSELoss"
  }
  ```

### Training Hyperparameters

#### Non-Default Hyperparameters

- per_device_train_batch_size: 128
- per_device_eval_batch_size: 128
- learning_rate: 2e-05
- num_train_epochs: 1
- warmup_ratio: 0.1
- seed: 33
- bf16: True
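For reference, a minimal sketch of how these non-default values could be passed to the trainer, assuming the `SentenceTransformerTrainingArguments` API of the v3 trainer (the `output_dir` is hypothetical; model, dataset, and loss setup are omitted):

```python
from sentence_transformers import SentenceTransformerTrainingArguments

# Only the non-default hyperparameters above are set; everything else keeps
# the defaults listed under "All Hyperparameters" below.
args = SentenceTransformerTrainingArguments(
    output_dir="st-mpnet-base-allnli-stsb",  # hypothetical output path
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    learning_rate=2e-5,
    num_train_epochs=1,
    warmup_ratio=0.1,
    seed=33,
    bf16=True,
)
```

A `SentenceTransformerTrainer` would then combine the model, these arguments, the three training datasets, and the per-dataset losses sketched earlier.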
#### All Hyperparameters

<details><summary>Click to expand</summary>

- overwrite_output_dir: False
- do_predict: False
- prediction_loss_only: False
- per_device_train_batch_size: 128
- per_device_eval_batch_size: 128
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 33
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: None
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- round_robin_sampler: False

</details>
### Training Logs

The last three columns report the loss on the corresponding evaluation split.

| Epoch  | Step | Training Loss | multi_nli loss | snli loss | stsb loss |
|:------:|:----:|:-------------:|:--------------:|:---------:|:---------:|
| 0.0493 | 10   | 0.9204        | 1.0998         | 1.1022    | 0.2997    |
| 0.0985 | 20   | 1.0074        | 1.0983         | 1.0971    | 0.2499    |
| 0.1478 | 30   | 1.0037        | 1.0994         | 1.0939    | 0.1667    |
| 0.1970 | 40   | 0.7961        | 1.0945         | 1.0877    | 0.0814    |
| 0.2463 | 50   | 0.9882        | 1.0950         | 1.0806    | 0.0840    |
| 0.2956 | 60   | 0.7814        | 1.0873         | 1.0711    | 0.0681    |
| 0.3448 | 70   | 0.6678        | 1.0829         | 1.0673    | 0.0504    |
| 0.3941 | 80   | 0.7669        | 1.0771         | 1.0638    | 0.0501    |
| 0.4433 | 90   | 0.9718        | 1.0704         | 1.0517    | 0.0482    |
| 0.4926 | 100  | 0.8494        | 1.0609         | 1.0388    | 0.0526    |
| 0.5419 | 110  | 0.745         | 1.0631         | 1.0285    | 0.0527    |
| 0.5911 | 120  | 0.6416        | 1.0564         | 1.0148    | 0.0588    |
| 0.6404 | 130  | 1.0331        | 1.0504         | 1.0026    | 0.0627    |
| 0.6897 | 140  | 0.8305        | 1.0417         | 1.0023    | 0.0664    |
| 0.7389 | 150  | 0.7362        | 1.0282         | 0.9937    | 0.0672    |
| 0.7882 | 160  | 0.7164        | 1.0288         | 0.9930    | 0.0688    |
| 0.8374 | 170  | 0.8217        | 1.0264         | 0.9819    | 0.0677    |
| 0.8867 | 180  | 0.9046        | 1.0200         | 0.9734    | 0.0742    |
| 0.9360 | 190  | 0.5327        | 1.0221         | 0.9764    | 0.0698    |
| 0.9852 | 200  | 0.8974        | 1.0233         | 0.9776    | 0.0691    |

### Environmental Impact

Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).

- **Carbon Emitted**: 0.018 kg of CO2
- **Hours Used**: 0.141 hours

### Training Hardware

- **On Cloud**: No
- **GPU Model**: 1 x NVIDIA GeForce RTX 3090
- **CPU Model**: 13th Gen Intel(R) Core(TM) i7-13700K
- **RAM Size**: 31.78 GB

### Framework Versions

- Python: 3.11.6
- Sentence Transformers: 2.7.0.dev0
- Transformers: 4.39.3
- PyTorch: 2.1.0+cu121
- Accelerate: 0.26.1
- Datasets: 2.18.0
- Tokenizers: 0.15.2

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```