SentenceTransformer based on thenlper/gte-base

This is a sentence-transformers model finetuned from thenlper/gte-base. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: thenlper/gte-base
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
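
The Pooling and Normalize modules listed above can be illustrated with a minimal NumPy sketch. It is not the library's implementation: the function name `mean_pool_and_normalize` and the random toy inputs are assumptions made for illustration, standing in for the token vectors that the BertModel would produce.

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Average token vectors over non-padding positions, then L2-normalize."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                    # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # avoid divide-by-zero
    pooled = summed / counts                                          # mean over real tokens
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)                       # unit-length rows

# Toy input (hypothetical): 2 sequences, 4 token positions, 768-dim vectors.
# The second sequence has two padding positions that must be ignored.
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(2, 4, 768))
attention_mask = np.array([[1, 1, 1, 1], [1, 1, 0, 0]])

sentence_embeddings = mean_pool_and_normalize(token_embeddings, attention_mask)
print(sentence_embeddings.shape)
print(np.linalg.norm(sentence_embeddings, axis=1))
```

Because the final step yields unit-length vectors, the dot product of two embeddings equals their cosine similarity.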

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'And then comes the figure of the human in the age of the Anthropocene, the era when humans act as a geological force on the planet, changing its climate for millennia to come.',
    '‘Anthropocene’ means, after all, ‘new Man time.’ For, while the Anthropocene, as a name, claims a generalised human agency responsible for the myriad ecological crises gathered under its auspice, it is simply not the case that, as Ghosh argues, “every human being, past and present, has contributed to the present cycle of climate change” (2016, 115).',
    'Minneapolis: University of Minnesota Press, 2007.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
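
For semantic search, the same cosine scores can be used to rank a corpus against a query. The following is a rough sketch with small hand-written vectors in place of real embeddings; the helper `rank_by_cosine` is hypothetical and not part of Sentence Transformers. In practice the arrays would come from `model.encode(...)`.

```python
import numpy as np

def rank_by_cosine(query_embedding, corpus_embeddings):
    """Return corpus indices sorted from most to least similar, plus their scores."""
    q = query_embedding / np.linalg.norm(query_embedding)
    c = corpus_embeddings / np.linalg.norm(corpus_embeddings, axis=1, keepdims=True)
    scores = c @ q                 # cosine similarity of each corpus row with the query
    order = np.argsort(-scores)    # descending by score
    return order, scores[order]

# Toy 3-dimensional stand-ins for 768-dimensional embeddings.
corpus_embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.6, 0.8, 0.0],
    [0.0, 0.0, 1.0],
])
query_embedding = np.array([0.9, 0.1, 0.0])

order, scores = rank_by_cosine(query_embedding, corpus_embeddings)
print(order)   # corpus row 0 is closest to the query, row 2 is farthest
print(scores)
```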

Training Details

Training Dataset

Unnamed Dataset

  • Size: 134,934 training samples
  • Columns: inp1, inp2, and score
  • Approximate statistics based on the first 1000 samples:
    • inp1 (string): min 8 tokens, mean 38.09 tokens, max 512 tokens
    • inp2 (string): min 8 tokens, mean 32.43 tokens, max 512 tokens
    • score (float): min -1.0, mean -0.8, max 1.0
  • Samples:
    • Sample 1
      inp1: Following the lead of John Guillory in Cultural Capital: The Problem of Literary Canon Formation, I would argue that such theoretical arguments characteristically concern an “imaginary canon”—imaginary in that there is no specifically defined body of works or authors that make up such a canon.
      inp2: “Brooks’s theory,” guillory writes in Cultural Capital: The Problem of Liter- ary Canon Formation (Chicago: Univ.
      score: 1.0
    • Sample 2
      inp1: Cultural Capital: The Problem of Literary Canon Formation.
      inp2: “Brooks’s theory,” guillory writes in Cultural Capital: The Problem of Liter- ary Canon Formation (Chicago: Univ.
      score: 1.0
    • Sample 3
      inp1: A partic- ularly good example of the complex operations of critical attention and peda- gogical appropriation occurs with Zora Neale Hurston’s Their Eyes Were Watching God.
      inp2: Similarly, in her article comparing the image patterns in Zora Neale Hurston’s Their Eyes Were Watching God and Beloved, Glenda B. Weathers also observes the dichotomous function of the trees in Beloved and argues, “They posit knowledge of both good and evil” (2005, 201) for black Americans seek- ing freedom from slavery and oppression.
      score: 1.0
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
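
To make the loss parameters concrete, here is a small NumPy sketch of the CoSENT objective with `scale = 20.0` and a pairwise cosine similarity. It is an illustrative re-derivation under stated assumptions, not the library's code: the function name `cosent_loss` and the two-row toy batch are hypothetical. For every two rows where one has a lower gold score than the other, the loss penalizes the model if the lower-scored row receives the higher cosine similarity.

```python
import numpy as np

def cosent_loss(emb1, emb2, labels, scale=20.0):
    """CoSENT loss for a batch of (inp1, inp2, score) rows."""
    # Pairwise cosine similarity between the i-th rows of emb1 and emb2, scaled.
    emb1 = emb1 / np.linalg.norm(emb1, axis=1, keepdims=True)
    emb2 = emb2 / np.linalg.norm(emb2, axis=1, keepdims=True)
    sims = scale * np.sum(emb1 * emb2, axis=1)

    # diff[i, j] = sims[i] - sims[j]; keep only entries where row i has a lower
    # gold score than row j, i.e. where the model should rank j above i.
    diff = sims[:, None] - sims[None, :]
    keep = labels[:, None] < labels[None, :]
    terms = np.concatenate([[0.0], diff[keep]])

    # log(1 + sum(exp(diff))), computed stably by shifting by the maximum.
    m = terms.max()
    return float(m + np.log(np.exp(terms - m).sum()))

# Toy batch (hypothetical): row 0 is a true match (score 1.0), row 1 is not (-1.0).
labels = np.array([1.0, -1.0])
emb1 = np.array([[1.0, 0.0], [1.0, 0.0]])
well_ordered = np.array([[1.0, 0.0], [-1.0, 0.0]])   # match is more similar
mis_ordered = np.array([[-1.0, 0.0], [1.0, 0.0]])    # non-match is more similar

good = cosent_loss(emb1, well_ordered, labels)
bad = cosent_loss(emb1, mis_ordered, labels)
print(good, bad)
```

The well-ordered batch gives a loss near zero, while the mis-ordered batch gives a loss close to 40, the scaled gap between the two cosine similarities.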
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 2
  • warmup_ratio: 0.1
  • fp16: True
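
As a rough check of what these settings imply, the sketch below derives the step counts and a linear warmup/decay learning-rate curve. It assumes training on a single device with no gradient accumulation, and takes the dataset size (134,934) and the learning rate (5e-05) from elsewhere in this card. The helper `linear_lr` is a hypothetical re-implementation for illustration, not the Trainer's scheduler object.

```python
import math

num_samples = 134_934      # training set size reported in this card
batch_size = 16            # per_device_train_batch_size (one device assumed)
num_epochs = 2             # num_train_epochs
warmup_ratio = 0.1         # warmup_ratio
peak_lr = 5e-5             # learning_rate

steps_per_epoch = math.ceil(num_samples / batch_size)   # last partial batch is kept
total_steps = steps_per_epoch * num_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

def linear_lr(step):
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

print(steps_per_epoch, total_steps, warmup_steps)   # 8434 16868 1687
print(linear_lr(warmup_steps), linear_lr(total_steps))
```

Under these assumptions, step 8400 corresponds to epoch 8400 / 8434 ≈ 0.9960, which matches the epoch value logged for that step in the final training run.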

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
0.0119 100 2.2069
0.0237 200 2.3883
0.0119 100 1.8358
0.0237 200 1.974
0.0356 300 1.8488
0.0474 400 1.8799
0.0593 500 2.0132
0.0711 600 1.8831
0.0830 700 1.601
0.0948 800 2.0316
0.1067 900 1.9483
0.1185 1000 1.6585
0.1304 1100 1.7986
0.1422 1200 1.4978
0.1541 1300 1.6035
0.1660 1400 1.9908
0.1778 1500 1.2896
0.1897 1600 1.97
0.2015 1700 1.9622
0.2134 1800 1.4706
0.2252 1900 1.5162
0.2371 2000 1.6988
0.2489 2100 1.6552
0.2608 2200 1.7779
0.2726 2300 1.9001
0.2845 2400 1.7802
0.2963 2500 1.6582
0.3082 2600 1.377
0.3201 2700 1.473
0.3319 2800 1.441
0.3438 2900 1.8727
0.3556 3000 1.1545
0.3675 3100 1.7319
0.3793 3200 1.9862
0.3912 3300 1.467
0.4030 3400 2.125
0.4149 3500 2.0474
0.4267 3600 1.7078
0.4386 3700 1.7791
0.4505 3800 1.6368
0.4623 3900 1.4451
0.4742 4000 1.5612
0.4860 4100 1.3163
0.4979 4200 1.5675
0.5097 4300 1.2766
0.5216 4400 1.4506
0.5334 4500 0.9601
0.5453 4600 1.4118
0.5571 4700 1.3951
0.5690 4800 1.2048
0.5808 4900 1.1108
0.5927 5000 1.5696
0.6046 5100 1.4223
0.6164 5200 1.1789
0.6283 5300 1.1573
0.6401 5400 1.4457
0.6520 5500 1.6622
0.6638 5600 1.2699
0.6757 5700 1.0191
0.6875 5800 1.2764
0.6994 5900 0.8999
0.6046 5100 1.5085
0.6164 5200 1.3738
0.6283 5300 1.0537
0.6401 5400 1.3578
0.6520 5500 1.6301
0.6638 5600 1.091
0.6757 5700 0.9261
0.6875 5800 1.1276
0.6994 5900 0.7678
0.6047 5100 1.2021
0.6166 5200 0.8787
0.6284 5300 0.6169
0.6403 5400 0.9881
0.6521 5500 1.1844
0.6640 5600 1.032
0.6758 5700 0.8486
0.6877 5800 1.4845
0.6995 5900 1.4
0.7114 6000 0.9685
0.7233 6100 0.9288
0.7351 6200 1.4682
0.7470 6300 0.6551
0.7588 6400 0.5513
0.7707 6500 0.6092
0.7825 6600 1.3235
0.7944 6700 0.4917
0.8063 6800 0.8944
0.8181 6900 0.9298
0.8300 7000 1.1134
0.8418 7100 0.8254
0.8537 7200 1.3363
0.8655 7300 0.6571
0.8774 7400 0.8209
0.8893 7500 0.6508
0.9011 7600 1.1972
0.9130 7700 1.1095
0.9248 7800 0.8772
0.9367 7900 1.0623
0.9485 8000 0.6073
0.9604 8100 0.8292
0.9723 8200 0.6765
0.9841 8300 0.5103
0.9960 8400 1.0618
1.0078 8500 0.5134
1.0197 8600 0.5203
1.0315 8700 0.6634
1.0434 8800 0.6644
1.0553 8900 0.7459
1.0671 9000 0.5969
1.0790 9100 0.5473
1.0908 9200 0.5495
1.1027 9300 0.5093
1.1145 9400 0.7049
1.1264 9500 0.726
1.1382 9600 0.6512
1.1501 9700 0.5121
1.1620 9800 0.5977
1.1738 9900 0.4933
1.1857 10000 0.8585
1.1975 10100 0.2955
1.2094 10200 0.6972
1.2212 10300 0.454
1.2331 10400 1.1057
1.2450 10500 0.9724
1.2568 10600 0.3057
1.2687 10700 0.5967
1.2805 10800 0.7332
1.2924 10900 0.5382
1.3042 11000 0.625
1.3161 11100 0.5354
1.3280 11200 0.4289
1.3398 11300 0.4243
1.3517 11400 0.6902
1.3635 11500 0.4248
1.3754 11600 0.3743
1.3872 11700 0.5463
1.3991 11800 0.8413
1.4110 11900 0.4748
1.4228 12000 0.56
1.4347 12100 0.9269
1.4465 12200 0.4668
1.4584 12300 0.4842
1.4702 12400 0.5172
1.4821 12500 0.4498
1.4940 12600 0.4695
1.5058 12700 0.2144
1.5177 12800 0.8002
1.5295 12900 0.4022
1.5414 13000 0.4491
1.5532 13100 0.4798
1.5651 13200 0.7489
1.5770 13300 0.6108
1.5888 13400 0.3806
1.6007 13500 0.4164
1.6125 13600 0.6362
1.6244 13700 0.4773
1.6362 13800 0.4875
1.6481 13900 0.5577
1.6599 14000 0.3318
1.6718 14100 0.2959
1.6837 14200 0.3168
1.6955 14300 0.403
1.7074 14400 0.6553
1.7192 14500 0.5814
1.7311 14600 0.3407
1.7429 14700 0.3985
1.7548 14800 0.406
1.7667 14900 0.5986
1.7785 15000 0.7694
1.7904 15100 0.5025
1.8022 15200 0.7199
1.8141 15300 0.4215
1.8259 15400 0.5484
1.8378 15500 0.3551
1.8497 15600 0.3572
1.8615 15700 0.3536
1.8734 15800 0.5116
1.8852 15900 0.7094
1.8971 16000 0.4402
1.9089 16100 0.4095
1.9208 16200 0.2173
1.9327 16300 0.6058
1.9445 16400 0.7796
1.9564 16500 0.5642
1.9682 16600 0.3085
1.9801 16700 0.4308
1.9919 16800 0.3712

Framework Versions

  • Python: 3.11.5
  • Sentence Transformers: 3.0.1
  • Transformers: 4.40.0
  • PyTorch: 2.2.2
  • Accelerate: 0.31.0
  • Datasets: 2.19.2
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}

Model Size

  • Format: Safetensors
  • Model size: 109M params
  • Tensor type: F32

Model tree for lucianli123/GTE-literary-citations

  • Base model: thenlper/gte-base
  • Finetuned: this model