metadata
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - feature-extraction
  - sentence-similarity
  - transformers

{MODEL_NAME}

This is a sentence-transformers model. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for tasks such as clustering or semantic search.

Usage (Sentence-Transformers)

Using this model is easy once sentence-transformers is installed:

pip install -U sentence-transformers

Then you can use the model like this:

from sentence_transformers import SentenceTransformer
sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer('{MODEL_NAME}')
embeddings = model.encode(sentences)
print(embeddings)
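
For semantic search, embeddings like the ones printed above are typically compared by cosine similarity. The following is a minimal, dependency-free sketch of that ranking step. The helper names `cosine_similarity` and `rank_by_similarity` are hypothetical, and the toy 3-dimensional vectors stand in for the 384-dimensional rows that `model.encode` would return.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, corpus):
    """Return (index, score) pairs for the corpus vectors, best match first."""
    scores = [(i, cosine_similarity(query, vec)) for i, vec in enumerate(corpus)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy stand-ins for real embeddings (assumption: real vectors come from model.encode).
query = [1.0, 0.0, 0.0]
corpus = [[0.0, 1.0, 0.0], [2.0, 0.0, 0.0], [1.0, 1.0, 0.0]]

ranking = rank_by_similarity(query, corpus)
print(ranking)
```

Cosine similarity ignores vector length, so the second corpus vector ranks first even though its magnitude differs from the query's.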

Usage (HuggingFace Transformers)

Without sentence-transformers, you can use the model as follows: first pass your input through the transformer model, then apply the appropriate pooling operation on top of the contextualized word embeddings.

from transformers import AutoTokenizer, AutoModel
import torch


# Mean pooling: average the token embeddings, using the attention mask to ignore padding
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    # Sum the unmasked embeddings and divide by the unmasked token count (clamped to avoid division by zero)
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)


# Sentences we want sentence embeddings for
sentences = ['This is an example sentence', 'Each sentence is converted']

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
model = AutoModel.from_pretrained('{MODEL_NAME}')

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling. In this case, mean pooling.
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

print("Sentence embeddings:")
print(sentence_embeddings)
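
The `mean_pooling` function above computes a mask-weighted average of the token embeddings. As a sketch of the same computation in equation form, write $\mathbf{h}_i$ for the token embeddings in `model_output[0]`, $m_i \in \{0, 1\}$ for the attention mask, and $T$ for the padded sequence length. These symbols are introduced here for illustration only.

```latex
\mathbf{s} \;=\; \frac{\sum_{i=1}^{T} m_i \, \mathbf{h}_i}{\max\!\left(\sum_{i=1}^{T} m_i,\; 10^{-9}\right)}
```

Padding tokens have $m_i = 0$, so they contribute to neither the sum nor the count. The $10^{-9}$ floor corresponds to the `torch.clamp(..., min=1e-9)` call and only matters for an all-padding input.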

Evaluation Results

Model id_raw_acc vn_raw_acc br_raw_acc th_raw_acc my_raw_acc ph_raw_acc sg_raw_acc avg
thtang/SetFit_ALL_200M_itr5 74.24% 64.04% 58.98% 67.24% 70.77% 70.63% 70.58% 68.07%
('ViT-B-16-SigLIP-i18n-256', 'webli') 69.38% 57.92% 47.40% 56.40% 65.20% 65.72% 65.12% 61.02%
('xlm-roberta-base-ViT-B-32', 'laion5b_s13b_b90k') 66.23% 54.05% 49.26% 55.39% 65.61% 66.11% 66.72% 60.48%
('xlm-roberta-large-ViT-H-14', 'frozen_laion5b_s13b_b90k') 66.05% 52.77% 46.46% 53.44% 62.70% 64.40% 64.24% 58.58%
('ViT-L-14', 'commonpool_xl_s13b_b90k') 65.48% 53.80% 46.61% 51.00% 62.01% 64.37% 63.94% 58.17%
('ViT-L-14', 'commonpool_xl_clip_s13b_b90k') 66.73% 49.82% 45.25% 38.32% 63.64% 66.17% 65.29% 56.46%
('ViT-B-16', 'commonpool_l_s1b_b8k') 62.14% 49.25% 45.20% 39.47% 61.15% 63.03% 62.63% 54.69%
('ViT-bigG-14-CLIPA', 'datacomp1b') 69.21% 44.39% 48.25% 20.54% 62.83% 68.15% 66.48% 54.26%
('ViT-bigG-14-CLIPA-336', 'datacomp1b') 69.17% 44.22% 48.06% 20.48% 62.79% 67.74% 66.63% 54.15%
('ViT-H-14-CLIPA-336', 'datacomp1b') 68.03% 42.79% 47.52% 20.82% 62.38% 67.06% 66.92% 53.65%
('ViT-H-14-CLIPA', 'datacomp1b') 68.18% 42.82% 47.33% 20.68% 62.31% 67.26% 66.56% 53.59%
('ViT-B-16', 'commonpool_l_clip_s1b_b8k') 63.68% 42.24% 44.87% 28.59% 62.04% 65.18% 64.97% 53.08%
('ViT-B-32-256', 'datacomp_s34b_b86k') 65.44% 38.94% 43.57% 25.11% 62.39% 65.82% 64.94% 52.32%
('ViT-L-14-CLIPA-336', 'datacomp1b') 66.99% 38.69% 45.25% 20.36% 61.47% 66.78% 65.56% 52.16%
('ViT-L-14-CLIPA', 'datacomp1b') 66.86% 38.34% 45.21% 20.18% 61.51% 66.71% 65.41% 52.03%
('ViT-H-14-CLIPA-336', 'laion2b') 64.62% 35.52% 44.73% 21.27% 61.01% 67.12% 65.76% 51.43%
('ViT-B-32', 'datacomp_xl_s13b_b90k') 64.57% 37.26% 42.06% 22.61% 61.96% 65.59% 64.63% 51.24%
('ViT-L-14', 'datacomp_xl_s13b_b90k') 64.37% 37.78% 40.65% 22.89% 60.72% 65.26% 64.30% 50.85%
('EVA02-E-14-plus', 'laion2b_s9b_b144k') 63.51% 31.79% 42.52% 23.71% 60.74% 64.74% 63.97% 50.14%
('ViT-H-14-quickgelu', 'metaclip_fullcc') 59.75% 34.61% 43.12% 22.69% 60.61% 65.47% 64.58% 50.12%
('ViT-B-16', 'datacomp_xl_s13b_b90k') 63.15% 36.19% 39.81% 22.39% 60.66% 63.96% 63.31% 49.92%
('ViT-bigG-14', 'laion2b_s39b_b160k') 63.03% 31.52% 41.20% 23.65% 60.52% 65.11% 63.99% 49.86%
('ViT-B-16', 'commonpool_l_basic_s1b_b8k') 62.56% 36.99% 40.87% 22.16% 59.57% 63.56% 63.06% 49.82%
intfloat/multilingual-e5-large 52.99% 42.00% 33.92% 47.69% 55.82% 57.76% 58.16% 49.76%
intfloat/multilingual-e5-base 52.06% 43.21% 34.17% 47.41% 55.28% 57.38% 57.45% 49.57%
('ViT-B-16', 'commonpool_l_image_s1b_b8k') 61.48% 36.08% 40.87% 22.62% 59.17% 63.47% 62.80% 49.50%
('convnext_large_d', 'laion2b_s26b_b102k_augreg') 61.61% 29.78% 39.92% 23.49% 60.93% 65.69% 64.60% 49.43%
('EVA01-g-14-plus', 'merged2b_s11b_b114k') 62.34% 30.29% 39.02% 22.80% 60.83% 65.19% 63.49% 49.14%
('convnext_large_d_320', 'laion2b_s29b_b131k_ft') 61.18% 29.24% 39.09% 23.23% 60.65% 65.64% 64.12% 49.02%
('ViT-B-32', 'laion2b_s34b_b79k') 61.21% 29.82% 37.51% 24.49% 60.21% 65.28% 64.08% 48.94%
('convnext_large_d_320', 'laion2b_s29b_b131k_ft_soup') 60.91% 29.28% 38.97% 22.61% 60.78% 65.76% 63.84% 48.88%
('convnext_xxlarge', 'laion2b_s34b_b82k_augreg_soup') 61.55% 30.17% 38.85% 22.30% 60.28% 64.83% 63.22% 48.74%
('ViT-B-32', 'laion2b_e16') 61.44% 28.15% 38.05% 24.49% 59.93% 65.14% 63.87% 48.72%
('ViT-B-16', 'datacomp_l_s1b_b8k') 61.33% 29.35% 38.67% 23.31% 60.29% 64.42% 63.64% 48.72%
('ViT-H-14', 'laion2b_s32b_b79k') 61.45% 29.19% 38.91% 22.64% 60.56% 64.86% 63.30% 48.70%
('EVA02-E-14', 'laion2b_s4b_b115k') 61.63% 29.60% 38.57% 22.89% 60.22% 64.83% 63.18% 48.70%
('convnext_xxlarge', 'laion2b_s34b_b82k_augreg_rewind') 61.24% 30.22% 39.04% 22.40% 60.02% 64.75% 62.99% 48.67%
('ViT-B-32-quickgelu', 'metaclip_fullcc') 58.26% 29.70% 38.99% 23.24% 60.07% 65.67% 64.30% 48.60%
('convnext_xxlarge', 'laion2b_s34b_b82k_augreg') 60.94% 29.90% 39.49% 22.08% 60.10% 64.50% 63.15% 48.59%
('ViT-g-14', 'laion2b_s12b_b42k') 61.46% 27.70% 38.23% 22.46% 60.65% 65.68% 63.87% 48.58%
('ViT-g-14', 'laion2b_s34b_b88k') 60.83% 29.56% 39.37% 21.63% 59.87% 64.68% 63.30% 48.46%
('ViT-L-14-quickgelu', 'metaclip_fullcc') 56.99% 31.07% 40.45% 23.13% 59.21% 64.77% 63.50% 48.45%
intfloat/multilingual-e5-small 49.50% 42.68% 30.96% 47.42% 54.44% 56.44% 57.04% 48.35%
('ViT-B-16-quickgelu', 'metaclip_fullcc') 58.00% 28.59% 37.68% 23.22% 59.42% 65.03% 64.10% 48.01%
('ViT-L-14', 'laion2b_s32b_b82k') 60.18% 28.09% 36.28% 23.70% 59.89% 64.86% 63.01% 48.00%
('ViT-B-32-quickgelu', 'laion400m_e32') 59.74% 25.92% 36.98% 25.19% 59.67% 64.79% 63.68% 48.00%
('ViT-B-32-quickgelu', 'laion400m_e31') 59.86% 25.92% 36.84% 25.20% 59.56% 64.76% 63.79% 47.99%
('convnext_base_w', 'laion2b_s13b_b82k_augreg') 60.97% 27.03% 36.75% 22.90% 59.70% 64.78% 63.46% 47.94%
('ViT-L-14', 'laion400m_e32') 60.01% 24.45% 37.24% 23.95% 59.17% 65.02% 63.78% 47.66%
('EVA01-g-14', 'laion400m_s11b_b41k') 60.51% 25.96% 36.17% 23.69% 59.57% 64.40% 63.22% 47.64%
('ViT-B-16-plus-240', 'laion400m_e32') 59.84% 25.29% 36.80% 23.73% 59.31% 64.99% 63.43% 47.63%
('ViT-B-16-plus-240', 'laion400m_e31') 59.69% 25.22% 36.79% 23.69% 59.44% 64.92% 63.53% 47.61%
('ViT-B-16', 'laion2b_s34b_b88k') 59.82% 27.45% 35.12% 24.41% 59.39% 64.37% 62.66% 47.60%
('ViT-L-14', 'laion400m_e31') 59.91% 24.26% 37.53% 23.84% 59.08% 64.90% 63.64% 47.60%
('ViT-L-16-SigLIP-256', 'webli') 65.54% 20.39% 44.65% 15.18% 60.10% 64.64% 62.44% 47.56%
('roberta-ViT-B-32', 'laion2b_s12b_b32k') 59.70% 25.15% 39.81% 17.10% 59.95% 65.81% 65.00% 47.50%
('ViT-L-14', 'commonpool_xl_laion_s13b_b90k') 58.13% 26.95% 34.93% 23.34% 59.05% 64.51% 63.63% 47.22%
('ViT-B-16-SigLIP', 'webli') 64.31% 19.87% 44.78% 14.87% 58.38% 65.16% 62.44% 47.12%
('ViT-B-16-SigLIP-256', 'webli') 64.24% 20.94% 44.15% 15.35% 58.22% 64.41% 62.43% 47.10%
('ViT-B-16-SigLIP-384', 'webli') 64.36% 20.06% 44.41% 15.11% 58.03% 64.68% 62.10% 46.96%
('ViT-L-16-SigLIP-384', 'webli') 64.49% 20.17% 44.01% 14.80% 58.89% 64.92% 61.39% 46.95%
('ViT-B-32', 'laion400m_e31') 59.06% 26.66% 35.69% 23.68% 58.00% 62.82% 62.68% 46.94%
('ViT-B-16-SigLIP-512', 'webli') 64.28% 19.61% 44.17% 15.09% 57.71% 64.83% 62.44% 46.88%
('convnext_base_w_320', 'laion_aesthetic_s13b_b82k_augreg') 57.60% 26.52% 35.01% 24.43% 57.05% 64.54% 62.74% 46.84%
('ViT-B-16', 'commonpool_l_text_s1b_b8k') 59.57% 28.15% 37.37% 20.89% 57.54% 62.68% 61.63% 46.83%
('ViT-B-32', 'laion400m_e32') 59.05% 26.62% 35.44% 23.54% 58.00% 62.74% 62.27% 46.81%
('convnext_base_w', 'laion2b_s13b_b82k') 58.65% 26.97% 34.80% 23.26% 58.31% 63.39% 61.56% 46.71%
sentence-transformers/gtr-t5-xxl 59.93% 24.82% 40.79% 17.23% 58.41% 64.00% 61.57% 46.68%
('ViT-B-16', 'laion400m_e32') 59.01% 24.34% 35.07% 21.84% 59.04% 64.58% 62.73% 46.66%
('ViT-B-16', 'laion400m_e31') 58.94% 24.20% 34.92% 21.58% 59.11% 64.77% 63.09% 46.66%
('convnext_base', 'laion400m_s13b_b51k') 58.44% 24.99% 34.05% 23.99% 58.33% 63.79% 62.59% 46.60%
('EVA02-L-14-336', 'merged2b_s6b_b61k') 59.54% 23.19% 34.54% 22.36% 59.24% 63.90% 63.40% 46.60%
('coca_ViT-B-32', 'laion2b_s13b_b90k') 58.70% 27.10% 33.22% 24.13% 57.53% 63.56% 61.87% 46.59%
('EVA02-L-14', 'merged2b_s4b_b131k') 59.64% 23.18% 34.62% 22.55% 59.11% 63.86% 63.10% 46.58%
thenlper/gte-large 55.10% 28.16% 33.96% 18.73% 59.50% 65.19% 63.52% 46.31%
('ViT-L-14-quickgelu', 'metaclip_400m') 54.32% 25.87% 34.30% 23.41% 58.50% 64.48% 63.24% 46.30%
('coca_ViT-L-14', 'laion2b_s13b_b90k') 57.92% 25.78% 33.97% 24.17% 57.64% 63.08% 61.55% 46.30%
('coca_ViT-L-14', 'mscoco_finetuned_laion2b_s13b_b90k') 58.07% 25.32% 34.18% 24.60% 57.77% 62.80% 61.28% 46.29%
('ViT-B-32-quickgelu', 'metaclip_400m') 55.85% 27.37% 31.91% 21.76% 58.64% 64.69% 63.11% 46.19%
sentence-transformers/paraphrase-multilingual-mpnet-base-v2 49.03% 32.58% 32.82% 38.43% 55.30% 57.36% 57.34% 46.12%
('convnext_base_w', 'laion_aesthetic_s13b_b82k') 57.39% 25.68% 33.71% 23.82% 56.64% 63.22% 62.22% 46.10%
('ViT-B-32', 'commonpool_m_clip_s128m_b4k') 56.09% 26.70% 38.25% 22.79% 56.52% 61.26% 61.05% 46.09%
('convnext_base_w_320', 'laion_aesthetic_s13b_b82k') 56.96% 25.60% 33.77% 24.64% 56.32% 63.33% 61.87% 46.07%
('ViT-B-16', 'commonpool_l_laion_s1b_b8k') 56.37% 25.70% 31.07% 23.18% 58.65% 63.93% 63.49% 46.06%
('ViT-B-16-quickgelu', 'metaclip_400m') 55.90% 25.88% 32.67% 21.57% 58.65% 64.48% 63.04% 46.03%
intfloat/e5-large 55.45% 28.54% 36.69% 18.15% 57.78% 62.92% 61.83% 45.91%
('EVA02-B-16', 'merged2b_s8b_b131k') 58.08% 24.45% 31.80% 22.36% 58.45% 63.25% 62.44% 45.83%
sentence-transformers/LaBSE 50.30% 32.82% 33.15% 39.79% 54.95% 53.71% 55.06% 45.68%
thenlper/gte-base 55.46% 27.88% 32.77% 17.20% 58.09% 63.68% 62.03% 45.30%
intfloat/e5-large-v2 55.10% 28.06% 35.95% 17.16% 57.16% 61.21% 60.84% 45.07%
('ViT-SO400M-14-SigLIP', 'webli') 60.18% 29.39% 38.90% 13.73% 52.79% 59.15% 56.81% 44.42%
('ViT-B-32', 'commonpool_m_s128m_b4k') 50.30% 32.12% 37.08% 23.02% 53.63% 57.64% 56.91% 44.39%
sentence-transformers/sentence-t5-xxl 50.98% 18.38% 36.37% 16.91% 59.25% 64.82% 63.75% 44.35%
infgrad/stella-base-en-v2 52.42% 26.24% 30.61% 18.81% 56.84% 63.03% 61.67% 44.23%
('RN50x4', 'openai') 56.39% 25.77% 29.99% 21.48% 55.31% 61.02% 59.42% 44.20%
('RN50x16', 'openai') 56.58% 25.09% 29.77% 21.03% 54.81% 61.28% 58.47% 43.86%
('RN101-quickgelu', 'openai') 56.57% 25.83% 29.66% 21.09% 54.50% 60.18% 58.74% 43.80%
('RN101', 'openai') 56.57% 25.83% 29.66% 21.09% 54.50% 60.18% 58.74% 43.80%
llmrails/ember-v1 50.85% 24.76% 31.02% 17.20% 57.62% 63.06% 62.04% 43.79%
sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 44.88% 28.32% 29.45% 36.40% 53.97% 56.87% 56.14% 43.72%
BAAI/bge-large-en-v1.5 49.81% 25.55% 30.68% 17.41% 56.89% 62.87% 61.72% 43.56%
('RN50x64', 'openai') 55.34% 22.19% 30.63% 20.79% 55.18% 60.93% 59.45% 43.50%
('nllb-clip-large', 'v1') 48.84% 23.45% 33.92% 32.38% 53.67% 55.36% 56.76% 43.48%
BAAI/bge-base-en-v1.5 51.73% 24.30% 31.51% 17.53% 56.21% 62.37% 60.25% 43.42%
intfloat/e5-small 51.31% 27.36% 32.05% 16.66% 55.15% 60.39% 59.06% 43.14%
BAAI/bge-small-en-v1.5 51.37% 25.16% 29.99% 16.13% 56.17% 61.69% 61.01% 43.07%
('ViT-L-14', 'openai') 54.57% 21.44% 30.13% 19.50% 54.99% 60.94% 59.59% 43.02%
('ViT-L-14-336', 'openai') 54.12% 21.52% 30.63% 19.47% 55.41% 60.77% 58.87% 42.97%
intfloat/e5-small-v2 51.41% 26.82% 33.04% 16.30% 54.97% 58.66% 58.68% 42.84%
('ViT-SO400M-14-SigLIP-384', 'webli') 62.68% 15.00% 32.38% 7.32% 56.65% 64.12% 61.49% 42.81%
('RN50-quickgelu', 'openai') 53.15% 24.79% 29.57% 20.84% 53.15% 59.19% 57.59% 42.61%
('RN50', 'openai') 53.15% 24.79% 29.57% 20.84% 53.15% 59.19% 57.59% 42.61%
('ViT-B-16', 'openai') 53.31% 22.22% 27.96% 21.22% 53.68% 59.47% 58.45% 42.33%
('ViT-B-32', 'openai') 52.93% 23.44% 28.70% 20.78% 52.96% 59.38% 57.93% 42.30%
('ViT-B-32-quickgelu', 'openai') 52.93% 23.44% 28.70% 20.78% 52.96% 59.38% 57.93% 42.30%
sentence-transformers/all-MiniLM-L6-v2 50.80% 25.76% 27.04% 15.81% 54.63% 60.07% 59.68% 41.97%
('ViT-B-32', 'commonpool_m_basic_s128m_b4k') 52.54% 22.67% 30.25% 16.17% 53.22% 59.40% 58.31% 41.80%
sentence-transformers/all-MiniLM-L12-v2 48.98% 24.05% 25.74% 16.41% 54.51% 60.38% 58.90% 41.28%
('ViT-B-32', 'commonpool_m_image_s128m_b4k') 51.93% 20.40% 29.44% 16.53% 53.16% 58.71% 58.17% 41.19%
sentence-transformers/clip-ViT-B-32-multilingual-v1 44.45% 27.34% 28.00% 28.25% 50.30% 54.05% 53.39% 40.82%
sentence-transformers/distiluse-base-multilingual-cased-v2 43.51% 23.86% 28.41% 26.90% 53.14% 53.54% 54.38% 40.53%
('ViT-B-32', 'datacomp_m_s128m_b4k') 51.60% 19.45% 26.58% 16.46% 52.54% 59.03% 58.03% 40.53%
('ViT-B-32', 'commonpool_m_text_s128m_b4k') 50.38% 20.31% 27.01% 16.00% 52.61% 58.82% 58.10% 40.46%
sentence-transformers/all-mpnet-base-v2 46.97% 23.15% 24.75% 16.31% 52.66% 59.07% 57.75% 40.09%
('nllb-clip-base', 'v1') 42.72% 23.90% 29.29% 33.96% 48.33% 49.09% 51.21% 39.79%
sentence-transformers/paraphrase-mpnet-base-v2 46.00% 20.45% 26.92% 14.75% 52.89% 58.71% 58.20% 39.70%
sentence-transformers/all-distilroberta-v1 46.74% 22.34% 24.06% 17.59% 51.49% 57.54% 56.45% 39.46%
sentence-transformers/paraphrase-MiniLM-L6-v2 44.92% 23.59% 26.12% 14.23% 51.84% 57.14% 56.03% 39.12%
('ViT-B-32', 'commonpool_m_laion_s128m_b4k') 42.94% 19.21% 19.70% 17.26% 50.84% 57.59% 56.06% 37.66%
('RN50-quickgelu', 'cc12m') 40.71% 18.10% 16.78% 16.23% 45.55% 52.89% 50.77% 34.43%
('RN50', 'cc12m') 39.76% 17.32% 16.15% 15.76% 44.25% 52.46% 49.18% 33.55%
('RN101', 'yfcc15m') 33.79% 18.04% 16.05% 11.10% 37.62% 43.50% 42.45% 28.94%
('RN101-quickgelu', 'yfcc15m') 32.79% 16.89% 14.45% 11.56% 37.77% 42.86% 41.93% 28.32%
('ViT-B-32', 'commonpool_s_clip_s13m_b4k') 33.80% 13.26% 18.82% 12.42% 37.36% 42.09% 40.39% 28.31%
('RN50', 'yfcc15m') 31.81% 15.87% 14.88% 8.99% 37.42% 42.06% 41.19% 27.46%
('RN50-quickgelu', 'yfcc15m') 31.57% 15.90% 14.44% 8.99% 36.81% 41.81% 41.20% 27.24%
('ViT-B-32', 'commonpool_s_s13m_b4k') 29.42% 12.57% 16.82% 11.00% 32.42% 36.77% 35.48% 24.93%
('ViT-B-32', 'commonpool_s_text_s13m_b4k') 28.02% 10.61% 12.49% 9.85% 31.18% 37.10% 34.85% 23.44%
('ViT-B-32', 'commonpool_s_basic_s13m_b4k') 27.87% 10.72% 12.67% 8.16% 30.11% 36.13% 32.68% 22.62%
('coca_ViT-B-32', 'mscoco_finetuned_laion2b_s13b_b90k') 12.60% 7.91% 5.11% 9.96% 17.15% 20.67% 20.32% 13.39%
('ViT-B-32', 'commonpool_s_image_s13m_b4k') 15.20% 5.59% 5.91% 4.63% 16.80% 20.74% 18.78% 12.52%
('ViT-B-32', 'datacomp_s_s13m_b4k') 15.20% 5.59% 5.91% 4.63% 16.80% 20.74% 18.78% 12.52%
('ViT-B-32', 'commonpool_s_laion_s13m_b4k') 11.72% 5.12% 4.05% 4.23% 14.33% 18.82% 16.44% 10.67%

Training

The model was trained with the parameters:

DataLoader:

torch.utils.data.dataloader.DataLoader of length 1468721 with parameters:

{'batch_size': 160, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}

Loss:

sentence_transformers.losses.CosineSimilarityLoss.CosineSimilarityLoss
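
As a rough guide to what this loss optimizes: in the sentence-transformers library, `CosineSimilarityLoss` compares the cosine similarity of the two sentence embeddings in each training pair with a target similarity label, using mean squared error by default. The notation below ($\mathbf{u}_i$, $\mathbf{v}_i$ for the pair embeddings, $y_i$ for the label, $N$ for the batch size) is introduced here for illustration, and this card does not state whether the defaults were changed.

```latex
\mathcal{L} \;=\; \frac{1}{N} \sum_{i=1}^{N} \bigl( \cos(\mathbf{u}_i, \mathbf{v}_i) - y_i \bigr)^2,
\qquad
\cos(\mathbf{u}, \mathbf{v}) \;=\; \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert}
```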

Parameters of the fit()-Method:

{
    "epochs": 1,
    "evaluation_steps": 0,
    "evaluator": "NoneType",
    "max_grad_norm": 1,
    "optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
    "optimizer_params": {
        "lr": 2e-05
    },
    "scheduler": "WarmupLinear",
    "steps_per_epoch": null,
    "warmup_steps": 100,
    "weight_decay": 0.01
}
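
To make the `WarmupLinear` setting concrete, the sketch below reproduces the usual shape of a linear warmup followed by linear decay, using the values listed above (`lr` 2e-05, `warmup_steps` 100). It assumes the total number of optimizer steps equals the DataLoader length times the number of epochs (1468721 × 1), which is how the library is understood to behave when `steps_per_epoch` is null. The function name `warmup_linear_lr` is hypothetical and this is not the library's own code.

```python
def warmup_linear_lr(step, base_lr=2e-5, warmup_steps=100, total_steps=1468721):
    """Learning rate at a given optimizer step under linear warmup and linear decay."""
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Print the rate at a few points: start, mid-warmup, peak, roughly halfway, and the end.
for step in (0, 50, 100, 734410, 1468721):
    print(step, warmup_linear_lr(step))
```

With only 100 warmup steps out of roughly 1.47 million, the schedule is in the decay phase for almost the entire run.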

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
)

Citing & Authors