metadata
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
{MODEL_NAME}
This is a sentence-transformers model: it maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for tasks such as clustering or semantic search.
Usage (Sentence-Transformers)
Using this model is easy once sentence-transformers is installed:
pip install -U sentence-transformers
Then you can use the model like this:
from sentence_transformers import SentenceTransformer
sentences = ["This is an example sentence", "Each sentence is converted"]
model = SentenceTransformer('{MODEL_NAME}')
embeddings = model.encode(sentences)
print(embeddings)
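Once `model.encode` has returned the embeddings, semantic search comes down to ranking vectors by cosine similarity. The sketch below illustrates the idea with small hand-written 3-dimensional vectors standing in for the real 384-dimensional output. The `cosine_similarity` and `rank_by_similarity` helpers are hypothetical names used for illustration, not part of sentence-transformers.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, corpus):
    """Return (index, score) pairs for the corpus vectors, best match first."""
    scores = [(i, cosine_similarity(query, vec)) for i, vec in enumerate(corpus)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy stand-ins for embeddings returned by model.encode(...) (assumed values)
corpus_embeddings = [
    [1.0, 0.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
]
query_embedding = [0.9, 0.1, 0.0]

ranking = rank_by_similarity(query_embedding, corpus_embeddings)
print(ranking)
```

In practice the library's own `sentence_transformers.util.cos_sim` can be applied directly to the arrays returned by `encode`; the pure-Python version above only makes the arithmetic explicit.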
Usage (HuggingFace Transformers)
Without sentence-transformers, you can use the model like this: first, pass your input through the transformer model, then apply the right pooling operation on top of the contextualized word embeddings.
from transformers import AutoTokenizer, AutoModel
import torch
# Mean pooling: take the attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
# Sentences we want sentence embeddings for
sentences = ['This is an example sentence', 'Each sentence is converted']
# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
model = AutoModel.from_pretrained('{MODEL_NAME}')
# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)
# Perform pooling. In this case, mean pooling.
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
print("Sentence embeddings:")
print(sentence_embeddings)
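To see why the attention mask matters in `mean_pooling`, the following dependency-free sketch repeats the same masked average on plain Python lists. The toy 2-dimensional token vectors and the `masked_mean` helper are made up for illustration; the real model produces 384-dimensional token embeddings.

```python
def masked_mean(token_embeddings, attention_mask):
    """Average token vectors, ignoring positions where the mask is 0."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    for vector, keep in zip(token_embeddings, attention_mask):
        for d in range(dim):
            summed[d] += vector[d] * keep
    # Same guard as torch.clamp(..., min=1e-9): avoid division by zero
    count = max(sum(attention_mask), 1e-9)
    return [value / count for value in summed]

# One sentence with two real tokens and one padding token (assumed values)
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = masked_mean(tokens, mask)
naive = [sum(column) / len(tokens) for column in zip(*tokens)]
print(pooled)  # [2.0, 3.0]: the padding vector is ignored
print(naive)   # the padding vector distorts the plain average
```

Because batches are padded to a common length (`padding=True`), a plain average over all positions would mix padding vectors into every shorter sentence; the mask keeps each sentence embedding dependent only on its own tokens.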
Evaluation Results
Model | id_raw_acc | vn_raw_acc | br_raw_acc | th_raw_acc | my_raw_acc | ph_raw_acc | sg_raw_acc | avg |
---|---|---|---|---|---|---|---|---|
thtang/SetFit_ALL_200M_itr5 | 74.24% | 64.04% | 58.98% | 67.24% | 70.77% | 70.63% | 70.58% | 68.07% |
('ViT-B-16-SigLIP-i18n-256', 'webli') | 69.38% | 57.92% | 47.40% | 56.40% | 65.20% | 65.72% | 65.12% | 61.02% |
('xlm-roberta-base-ViT-B-32', 'laion5b_s13b_b90k') | 66.23% | 54.05% | 49.26% | 55.39% | 65.61% | 66.11% | 66.72% | 60.48% |
('xlm-roberta-large-ViT-H-14', 'frozen_laion5b_s13b_b90k') | 66.05% | 52.77% | 46.46% | 53.44% | 62.70% | 64.40% | 64.24% | 58.58% |
('ViT-L-14', 'commonpool_xl_s13b_b90k') | 65.48% | 53.80% | 46.61% | 51.00% | 62.01% | 64.37% | 63.94% | 58.17% |
('ViT-L-14', 'commonpool_xl_clip_s13b_b90k') | 66.73% | 49.82% | 45.25% | 38.32% | 63.64% | 66.17% | 65.29% | 56.46% |
('ViT-B-16', 'commonpool_l_s1b_b8k') | 62.14% | 49.25% | 45.20% | 39.47% | 61.15% | 63.03% | 62.63% | 54.69% |
('ViT-bigG-14-CLIPA', 'datacomp1b') | 69.21% | 44.39% | 48.25% | 20.54% | 62.83% | 68.15% | 66.48% | 54.26% |
('ViT-bigG-14-CLIPA-336', 'datacomp1b') | 69.17% | 44.22% | 48.06% | 20.48% | 62.79% | 67.74% | 66.63% | 54.15% |
('ViT-H-14-CLIPA-336', 'datacomp1b') | 68.03% | 42.79% | 47.52% | 20.82% | 62.38% | 67.06% | 66.92% | 53.65% |
('ViT-H-14-CLIPA', 'datacomp1b') | 68.18% | 42.82% | 47.33% | 20.68% | 62.31% | 67.26% | 66.56% | 53.59% |
('ViT-B-16', 'commonpool_l_clip_s1b_b8k') | 63.68% | 42.24% | 44.87% | 28.59% | 62.04% | 65.18% | 64.97% | 53.08% |
('ViT-B-32-256', 'datacomp_s34b_b86k') | 65.44% | 38.94% | 43.57% | 25.11% | 62.39% | 65.82% | 64.94% | 52.32% |
('ViT-L-14-CLIPA-336', 'datacomp1b') | 66.99% | 38.69% | 45.25% | 20.36% | 61.47% | 66.78% | 65.56% | 52.16% |
('ViT-L-14-CLIPA', 'datacomp1b') | 66.86% | 38.34% | 45.21% | 20.18% | 61.51% | 66.71% | 65.41% | 52.03% |
('ViT-H-14-CLIPA-336', 'laion2b') | 64.62% | 35.52% | 44.73% | 21.27% | 61.01% | 67.12% | 65.76% | 51.43% |
('ViT-B-32', 'datacomp_xl_s13b_b90k') | 64.57% | 37.26% | 42.06% | 22.61% | 61.96% | 65.59% | 64.63% | 51.24% |
('ViT-L-14', 'datacomp_xl_s13b_b90k') | 64.37% | 37.78% | 40.65% | 22.89% | 60.72% | 65.26% | 64.30% | 50.85% |
('EVA02-E-14-plus', 'laion2b_s9b_b144k') | 63.51% | 31.79% | 42.52% | 23.71% | 60.74% | 64.74% | 63.97% | 50.14% |
('ViT-H-14-quickgelu', 'metaclip_fullcc') | 59.75% | 34.61% | 43.12% | 22.69% | 60.61% | 65.47% | 64.58% | 50.12% |
('ViT-B-16', 'datacomp_xl_s13b_b90k') | 63.15% | 36.19% | 39.81% | 22.39% | 60.66% | 63.96% | 63.31% | 49.92% |
('ViT-bigG-14', 'laion2b_s39b_b160k') | 63.03% | 31.52% | 41.20% | 23.65% | 60.52% | 65.11% | 63.99% | 49.86% |
('ViT-B-16', 'commonpool_l_basic_s1b_b8k') | 62.56% | 36.99% | 40.87% | 22.16% | 59.57% | 63.56% | 63.06% | 49.82% |
intfloat/multilingual-e5-large | 52.99% | 42.00% | 33.92% | 47.69% | 55.82% | 57.76% | 58.16% | 49.76% |
intfloat/multilingual-e5-base | 52.06% | 43.21% | 34.17% | 47.41% | 55.28% | 57.38% | 57.45% | 49.57% |
('ViT-B-16', 'commonpool_l_image_s1b_b8k') | 61.48% | 36.08% | 40.87% | 22.62% | 59.17% | 63.47% | 62.80% | 49.50% |
('convnext_large_d', 'laion2b_s26b_b102k_augreg') | 61.61% | 29.78% | 39.92% | 23.49% | 60.93% | 65.69% | 64.60% | 49.43% |
('EVA01-g-14-plus', 'merged2b_s11b_b114k') | 62.34% | 30.29% | 39.02% | 22.80% | 60.83% | 65.19% | 63.49% | 49.14% |
('convnext_large_d_320', 'laion2b_s29b_b131k_ft') | 61.18% | 29.24% | 39.09% | 23.23% | 60.65% | 65.64% | 64.12% | 49.02% |
('ViT-B-32', 'laion2b_s34b_b79k') | 61.21% | 29.82% | 37.51% | 24.49% | 60.21% | 65.28% | 64.08% | 48.94% |
('convnext_large_d_320', 'laion2b_s29b_b131k_ft_soup') | 60.91% | 29.28% | 38.97% | 22.61% | 60.78% | 65.76% | 63.84% | 48.88% |
('convnext_xxlarge', 'laion2b_s34b_b82k_augreg_soup') | 61.55% | 30.17% | 38.85% | 22.30% | 60.28% | 64.83% | 63.22% | 48.74% |
('ViT-B-32', 'laion2b_e16') | 61.44% | 28.15% | 38.05% | 24.49% | 59.93% | 65.14% | 63.87% | 48.72% |
('ViT-B-16', 'datacomp_l_s1b_b8k') | 61.33% | 29.35% | 38.67% | 23.31% | 60.29% | 64.42% | 63.64% | 48.72% |
('ViT-H-14', 'laion2b_s32b_b79k') | 61.45% | 29.19% | 38.91% | 22.64% | 60.56% | 64.86% | 63.30% | 48.70% |
('EVA02-E-14', 'laion2b_s4b_b115k') | 61.63% | 29.60% | 38.57% | 22.89% | 60.22% | 64.83% | 63.18% | 48.70% |
('convnext_xxlarge', 'laion2b_s34b_b82k_augreg_rewind') | 61.24% | 30.22% | 39.04% | 22.40% | 60.02% | 64.75% | 62.99% | 48.67% |
('ViT-B-32-quickgelu', 'metaclip_fullcc') | 58.26% | 29.70% | 38.99% | 23.24% | 60.07% | 65.67% | 64.30% | 48.60% |
('convnext_xxlarge', 'laion2b_s34b_b82k_augreg') | 60.94% | 29.90% | 39.49% | 22.08% | 60.10% | 64.50% | 63.15% | 48.59% |
('ViT-g-14', 'laion2b_s12b_b42k') | 61.46% | 27.70% | 38.23% | 22.46% | 60.65% | 65.68% | 63.87% | 48.58% |
('ViT-g-14', 'laion2b_s34b_b88k') | 60.83% | 29.56% | 39.37% | 21.63% | 59.87% | 64.68% | 63.30% | 48.46% |
('ViT-L-14-quickgelu', 'metaclip_fullcc') | 56.99% | 31.07% | 40.45% | 23.13% | 59.21% | 64.77% | 63.50% | 48.45% |
intfloat/multilingual-e5-small | 49.50% | 42.68% | 30.96% | 47.42% | 54.44% | 56.44% | 57.04% | 48.35% |
('ViT-B-16-quickgelu', 'metaclip_fullcc') | 58.00% | 28.59% | 37.68% | 23.22% | 59.42% | 65.03% | 64.10% | 48.01% |
('ViT-L-14', 'laion2b_s32b_b82k') | 60.18% | 28.09% | 36.28% | 23.70% | 59.89% | 64.86% | 63.01% | 48.00% |
('ViT-B-32-quickgelu', 'laion400m_e32') | 59.74% | 25.92% | 36.98% | 25.19% | 59.67% | 64.79% | 63.68% | 48.00% |
('ViT-B-32-quickgelu', 'laion400m_e31') | 59.86% | 25.92% | 36.84% | 25.20% | 59.56% | 64.76% | 63.79% | 47.99% |
('convnext_base_w', 'laion2b_s13b_b82k_augreg') | 60.97% | 27.03% | 36.75% | 22.90% | 59.70% | 64.78% | 63.46% | 47.94% |
('ViT-L-14', 'laion400m_e32') | 60.01% | 24.45% | 37.24% | 23.95% | 59.17% | 65.02% | 63.78% | 47.66% |
('EVA01-g-14', 'laion400m_s11b_b41k') | 60.51% | 25.96% | 36.17% | 23.69% | 59.57% | 64.40% | 63.22% | 47.64% |
('ViT-B-16-plus-240', 'laion400m_e32') | 59.84% | 25.29% | 36.80% | 23.73% | 59.31% | 64.99% | 63.43% | 47.63% |
('ViT-B-16-plus-240', 'laion400m_e31') | 59.69% | 25.22% | 36.79% | 23.69% | 59.44% | 64.92% | 63.53% | 47.61% |
('ViT-B-16', 'laion2b_s34b_b88k') | 59.82% | 27.45% | 35.12% | 24.41% | 59.39% | 64.37% | 62.66% | 47.60% |
('ViT-L-14', 'laion400m_e31') | 59.91% | 24.26% | 37.53% | 23.84% | 59.08% | 64.90% | 63.64% | 47.60% |
('ViT-L-16-SigLIP-256', 'webli') | 65.54% | 20.39% | 44.65% | 15.18% | 60.10% | 64.64% | 62.44% | 47.56% |
('roberta-ViT-B-32', 'laion2b_s12b_b32k') | 59.70% | 25.15% | 39.81% | 17.10% | 59.95% | 65.81% | 65.00% | 47.50% |
('ViT-L-14', 'commonpool_xl_laion_s13b_b90k') | 58.13% | 26.95% | 34.93% | 23.34% | 59.05% | 64.51% | 63.63% | 47.22% |
('ViT-B-16-SigLIP', 'webli') | 64.31% | 19.87% | 44.78% | 14.87% | 58.38% | 65.16% | 62.44% | 47.12% |
('ViT-B-16-SigLIP-256', 'webli') | 64.24% | 20.94% | 44.15% | 15.35% | 58.22% | 64.41% | 62.43% | 47.10% |
('ViT-B-16-SigLIP-384', 'webli') | 64.36% | 20.06% | 44.41% | 15.11% | 58.03% | 64.68% | 62.10% | 46.96% |
('ViT-L-16-SigLIP-384', 'webli') | 64.49% | 20.17% | 44.01% | 14.80% | 58.89% | 64.92% | 61.39% | 46.95% |
('ViT-B-32', 'laion400m_e31') | 59.06% | 26.66% | 35.69% | 23.68% | 58.00% | 62.82% | 62.68% | 46.94% |
('ViT-B-16-SigLIP-512', 'webli') | 64.28% | 19.61% | 44.17% | 15.09% | 57.71% | 64.83% | 62.44% | 46.88% |
('convnext_base_w_320', 'laion_aesthetic_s13b_b82k_augreg') | 57.60% | 26.52% | 35.01% | 24.43% | 57.05% | 64.54% | 62.74% | 46.84% |
('ViT-B-16', 'commonpool_l_text_s1b_b8k') | 59.57% | 28.15% | 37.37% | 20.89% | 57.54% | 62.68% | 61.63% | 46.83% |
('ViT-B-32', 'laion400m_e32') | 59.05% | 26.62% | 35.44% | 23.54% | 58.00% | 62.74% | 62.27% | 46.81% |
('convnext_base_w', 'laion2b_s13b_b82k') | 58.65% | 26.97% | 34.80% | 23.26% | 58.31% | 63.39% | 61.56% | 46.71% |
sentence-transformers/gtr-t5-xxl | 59.93% | 24.82% | 40.79% | 17.23% | 58.41% | 64.00% | 61.57% | 46.68% |
('ViT-B-16', 'laion400m_e32') | 59.01% | 24.34% | 35.07% | 21.84% | 59.04% | 64.58% | 62.73% | 46.66% |
('ViT-B-16', 'laion400m_e31') | 58.94% | 24.20% | 34.92% | 21.58% | 59.11% | 64.77% | 63.09% | 46.66% |
('convnext_base', 'laion400m_s13b_b51k') | 58.44% | 24.99% | 34.05% | 23.99% | 58.33% | 63.79% | 62.59% | 46.60% |
('EVA02-L-14-336', 'merged2b_s6b_b61k') | 59.54% | 23.19% | 34.54% | 22.36% | 59.24% | 63.90% | 63.40% | 46.60% |
('coca_ViT-B-32', 'laion2b_s13b_b90k') | 58.70% | 27.10% | 33.22% | 24.13% | 57.53% | 63.56% | 61.87% | 46.59% |
('EVA02-L-14', 'merged2b_s4b_b131k') | 59.64% | 23.18% | 34.62% | 22.55% | 59.11% | 63.86% | 63.10% | 46.58% |
thenlper/gte-large | 55.10% | 28.16% | 33.96% | 18.73% | 59.50% | 65.19% | 63.52% | 46.31% |
('ViT-L-14-quickgelu', 'metaclip_400m') | 54.32% | 25.87% | 34.30% | 23.41% | 58.50% | 64.48% | 63.24% | 46.30% |
('coca_ViT-L-14', 'laion2b_s13b_b90k') | 57.92% | 25.78% | 33.97% | 24.17% | 57.64% | 63.08% | 61.55% | 46.30% |
('coca_ViT-L-14', 'mscoco_finetuned_laion2b_s13b_b90k') | 58.07% | 25.32% | 34.18% | 24.60% | 57.77% | 62.80% | 61.28% | 46.29% |
('ViT-B-32-quickgelu', 'metaclip_400m') | 55.85% | 27.37% | 31.91% | 21.76% | 58.64% | 64.69% | 63.11% | 46.19% |
sentence-transformers/paraphrase-multilingual-mpnet-base-v2 | 49.03% | 32.58% | 32.82% | 38.43% | 55.30% | 57.36% | 57.34% | 46.12% |
('convnext_base_w', 'laion_aesthetic_s13b_b82k') | 57.39% | 25.68% | 33.71% | 23.82% | 56.64% | 63.22% | 62.22% | 46.10% |
('ViT-B-32', 'commonpool_m_clip_s128m_b4k') | 56.09% | 26.70% | 38.25% | 22.79% | 56.52% | 61.26% | 61.05% | 46.09% |
('convnext_base_w_320', 'laion_aesthetic_s13b_b82k') | 56.96% | 25.60% | 33.77% | 24.64% | 56.32% | 63.33% | 61.87% | 46.07% |
('ViT-B-16', 'commonpool_l_laion_s1b_b8k') | 56.37% | 25.70% | 31.07% | 23.18% | 58.65% | 63.93% | 63.49% | 46.06% |
('ViT-B-16-quickgelu', 'metaclip_400m') | 55.90% | 25.88% | 32.67% | 21.57% | 58.65% | 64.48% | 63.04% | 46.03% |
intfloat/e5-large | 55.45% | 28.54% | 36.69% | 18.15% | 57.78% | 62.92% | 61.83% | 45.91% |
('EVA02-B-16', 'merged2b_s8b_b131k') | 58.08% | 24.45% | 31.80% | 22.36% | 58.45% | 63.25% | 62.44% | 45.83% |
sentence-transformers/LaBSE | 50.30% | 32.82% | 33.15% | 39.79% | 54.95% | 53.71% | 55.06% | 45.68% |
thenlper/gte-base | 55.46% | 27.88% | 32.77% | 17.20% | 58.09% | 63.68% | 62.03% | 45.30% |
intfloat/e5-large-v2 | 55.10% | 28.06% | 35.95% | 17.16% | 57.16% | 61.21% | 60.84% | 45.07% |
('ViT-SO400M-14-SigLIP', 'webli') | 60.18% | 29.39% | 38.90% | 13.73% | 52.79% | 59.15% | 56.81% | 44.42% |
('ViT-B-32', 'commonpool_m_s128m_b4k') | 50.30% | 32.12% | 37.08% | 23.02% | 53.63% | 57.64% | 56.91% | 44.39% |
sentence-transformers/sentence-t5-xxl | 50.98% | 18.38% | 36.37% | 16.91% | 59.25% | 64.82% | 63.75% | 44.35% |
infgrad/stella-base-en-v2 | 52.42% | 26.24% | 30.61% | 18.81% | 56.84% | 63.03% | 61.67% | 44.23% |
('RN50x4', 'openai') | 56.39% | 25.77% | 29.99% | 21.48% | 55.31% | 61.02% | 59.42% | 44.20% |
('RN50x16', 'openai') | 56.58% | 25.09% | 29.77% | 21.03% | 54.81% | 61.28% | 58.47% | 43.86% |
('RN101-quickgelu', 'openai') | 56.57% | 25.83% | 29.66% | 21.09% | 54.50% | 60.18% | 58.74% | 43.80% |
('RN101', 'openai') | 56.57% | 25.83% | 29.66% | 21.09% | 54.50% | 60.18% | 58.74% | 43.80% |
llmrails/ember-v1 | 50.85% | 24.76% | 31.02% | 17.20% | 57.62% | 63.06% | 62.04% | 43.79% |
sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | 44.88% | 28.32% | 29.45% | 36.40% | 53.97% | 56.87% | 56.14% | 43.72% |
BAAI/bge-large-en-v1.5 | 49.81% | 25.55% | 30.68% | 17.41% | 56.89% | 62.87% | 61.72% | 43.56% |
('RN50x64', 'openai') | 55.34% | 22.19% | 30.63% | 20.79% | 55.18% | 60.93% | 59.45% | 43.50% |
('nllb-clip-large', 'v1') | 48.84% | 23.45% | 33.92% | 32.38% | 53.67% | 55.36% | 56.76% | 43.48% |
BAAI/bge-base-en-v1.5 | 51.73% | 24.30% | 31.51% | 17.53% | 56.21% | 62.37% | 60.25% | 43.42% |
intfloat/e5-small | 51.31% | 27.36% | 32.05% | 16.66% | 55.15% | 60.39% | 59.06% | 43.14% |
BAAI/bge-small-en-v1.5 | 51.37% | 25.16% | 29.99% | 16.13% | 56.17% | 61.69% | 61.01% | 43.07% |
('ViT-L-14', 'openai') | 54.57% | 21.44% | 30.13% | 19.50% | 54.99% | 60.94% | 59.59% | 43.02% |
('ViT-L-14-336', 'openai') | 54.12% | 21.52% | 30.63% | 19.47% | 55.41% | 60.77% | 58.87% | 42.97% |
intfloat/e5-small-v2 | 51.41% | 26.82% | 33.04% | 16.30% | 54.97% | 58.66% | 58.68% | 42.84% |
('ViT-SO400M-14-SigLIP-384', 'webli') | 62.68% | 15.00% | 32.38% | 7.32% | 56.65% | 64.12% | 61.49% | 42.81% |
('RN50-quickgelu', 'openai') | 53.15% | 24.79% | 29.57% | 20.84% | 53.15% | 59.19% | 57.59% | 42.61% |
('RN50', 'openai') | 53.15% | 24.79% | 29.57% | 20.84% | 53.15% | 59.19% | 57.59% | 42.61% |
('ViT-B-16', 'openai') | 53.31% | 22.22% | 27.96% | 21.22% | 53.68% | 59.47% | 58.45% | 42.33% |
('ViT-B-32', 'openai') | 52.93% | 23.44% | 28.70% | 20.78% | 52.96% | 59.38% | 57.93% | 42.30% |
('ViT-B-32-quickgelu', 'openai') | 52.93% | 23.44% | 28.70% | 20.78% | 52.96% | 59.38% | 57.93% | 42.30% |
sentence-transformers/all-MiniLM-L6-v2 | 50.80% | 25.76% | 27.04% | 15.81% | 54.63% | 60.07% | 59.68% | 41.97% |
('ViT-B-32', 'commonpool_m_basic_s128m_b4k') | 52.54% | 22.67% | 30.25% | 16.17% | 53.22% | 59.40% | 58.31% | 41.80% |
sentence-transformers/all-MiniLM-L12-v2 | 48.98% | 24.05% | 25.74% | 16.41% | 54.51% | 60.38% | 58.90% | 41.28% |
('ViT-B-32', 'commonpool_m_image_s128m_b4k') | 51.93% | 20.40% | 29.44% | 16.53% | 53.16% | 58.71% | 58.17% | 41.19% |
sentence-transformers/clip-ViT-B-32-multilingual-v1 | 44.45% | 27.34% | 28.00% | 28.25% | 50.30% | 54.05% | 53.39% | 40.82% |
sentence-transformers/distiluse-base-multilingual-cased-v2 | 43.51% | 23.86% | 28.41% | 26.90% | 53.14% | 53.54% | 54.38% | 40.53% |
('ViT-B-32', 'datacomp_m_s128m_b4k') | 51.60% | 19.45% | 26.58% | 16.46% | 52.54% | 59.03% | 58.03% | 40.53% |
('ViT-B-32', 'commonpool_m_text_s128m_b4k') | 50.38% | 20.31% | 27.01% | 16.00% | 52.61% | 58.82% | 58.10% | 40.46% |
sentence-transformers/all-mpnet-base-v2 | 46.97% | 23.15% | 24.75% | 16.31% | 52.66% | 59.07% | 57.75% | 40.09% |
('nllb-clip-base', 'v1') | 42.72% | 23.90% | 29.29% | 33.96% | 48.33% | 49.09% | 51.21% | 39.79% |
sentence-transformers/paraphrase-mpnet-base-v2 | 46.00% | 20.45% | 26.92% | 14.75% | 52.89% | 58.71% | 58.20% | 39.70% |
sentence-transformers/all-distilroberta-v1 | 46.74% | 22.34% | 24.06% | 17.59% | 51.49% | 57.54% | 56.45% | 39.46% |
sentence-transformers/paraphrase-MiniLM-L6-v2 | 44.92% | 23.59% | 26.12% | 14.23% | 51.84% | 57.14% | 56.03% | 39.12% |
('ViT-B-32', 'commonpool_m_laion_s128m_b4k') | 42.94% | 19.21% | 19.70% | 17.26% | 50.84% | 57.59% | 56.06% | 37.66% |
('RN50-quickgelu', 'cc12m') | 40.71% | 18.10% | 16.78% | 16.23% | 45.55% | 52.89% | 50.77% | 34.43% |
('RN50', 'cc12m') | 39.76% | 17.32% | 16.15% | 15.76% | 44.25% | 52.46% | 49.18% | 33.55% |
('RN101', 'yfcc15m') | 33.79% | 18.04% | 16.05% | 11.10% | 37.62% | 43.50% | 42.45% | 28.94% |
('RN101-quickgelu', 'yfcc15m') | 32.79% | 16.89% | 14.45% | 11.56% | 37.77% | 42.86% | 41.93% | 28.32% |
('ViT-B-32', 'commonpool_s_clip_s13m_b4k') | 33.80% | 13.26% | 18.82% | 12.42% | 37.36% | 42.09% | 40.39% | 28.31% |
('RN50', 'yfcc15m') | 31.81% | 15.87% | 14.88% | 8.99% | 37.42% | 42.06% | 41.19% | 27.46% |
('RN50-quickgelu', 'yfcc15m') | 31.57% | 15.90% | 14.44% | 8.99% | 36.81% | 41.81% | 41.20% | 27.24% |
('ViT-B-32', 'commonpool_s_s13m_b4k') | 29.42% | 12.57% | 16.82% | 11.00% | 32.42% | 36.77% | 35.48% | 24.93% |
('ViT-B-32', 'commonpool_s_text_s13m_b4k') | 28.02% | 10.61% | 12.49% | 9.85% | 31.18% | 37.10% | 34.85% | 23.44% |
('ViT-B-32', 'commonpool_s_basic_s13m_b4k') | 27.87% | 10.72% | 12.67% | 8.16% | 30.11% | 36.13% | 32.68% | 22.62% |
('coca_ViT-B-32', 'mscoco_finetuned_laion2b_s13b_b90k') | 12.60% | 7.91% | 5.11% | 9.96% | 17.15% | 20.67% | 20.32% | 13.39% |
('ViT-B-32', 'commonpool_s_image_s13m_b4k') | 15.20% | 5.59% | 5.91% | 4.63% | 16.80% | 20.74% | 18.78% | 12.52% |
('ViT-B-32', 'datacomp_s_s13m_b4k') | 15.20% | 5.59% | 5.91% | 4.63% | 16.80% | 20.74% | 18.78% | 12.52% |
('ViT-B-32', 'commonpool_s_laion_s13m_b4k') | 11.72% | 5.12% | 4.05% | 4.23% | 14.33% | 18.82% | 16.44% | 10.67% |
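The `avg` column appears to be the unweighted mean of the seven per-column accuracies. A quick check on the first two rows, with the numbers copied from the table (the `row_average` helper is a hypothetical name):

```python
def row_average(percentages):
    """Unweighted mean of percentage strings such as '74.24%', rounded to 2 places."""
    values = [float(p.rstrip('%')) for p in percentages]
    return round(sum(values) / len(values), 2)

# Per-column accuracies copied from the first two table rows
rows = {
    "thtang/SetFit_ALL_200M_itr5":
        ["74.24%", "64.04%", "58.98%", "67.24%", "70.77%", "70.63%", "70.58%"],
    "('ViT-B-16-SigLIP-i18n-256', 'webli')":
        ["69.38%", "57.92%", "47.40%", "56.40%", "65.20%", "65.72%", "65.12%"],
}

for name, cells in rows.items():
    print(name, row_average(cells))
```

Both results match the reported `avg` values of 68.07% and 61.02%.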
Training
The model was trained with the parameters:
DataLoader:
torch.utils.data.dataloader.DataLoader
of length 1468721 with parameters:
{'batch_size': 160, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}
Loss:
sentence_transformers.losses.CosineSimilarityLoss.CosineSimilarityLoss
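`CosineSimilarityLoss` compares the cosine similarity of the two sentence embeddings in a training pair with that pair's gold similarity label; by default the library uses mean squared error for the comparison. A dependency-free sketch of the per-batch computation follows, with made-up vectors and labels and hypothetical helper names:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def cosine_similarity_loss(pairs, labels):
    """Mean squared error between predicted cosine similarity and gold label."""
    errors = [(cosine_similarity(u, v) - y) ** 2 for (u, v), y in zip(pairs, labels)]
    return sum(errors) / len(errors)

# Toy batch of two embedding pairs (assumed values)
pairs = [
    ([1.0, 0.0], [1.0, 0.0]),  # identical direction, cosine = 1
    ([1.0, 0.0], [0.0, 1.0]),  # orthogonal, cosine = 0
]
labels = [1.0, 0.5]

loss = cosine_similarity_loss(pairs, labels)
print(loss)  # 0.125
```

The first pair is predicted perfectly and contributes nothing; the second misses its label by 0.5 and contributes 0.25, giving a batch mean of 0.125.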
Parameters of the fit()-Method:
{
    "epochs": 1,
    "evaluation_steps": 0,
    "evaluator": "NoneType",
    "max_grad_norm": 1,
    "optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
    "optimizer_params": {
        "lr": 2e-05
    },
    "scheduler": "WarmupLinear",
    "steps_per_epoch": null,
    "warmup_steps": 100,
    "weight_decay": 0.01
}
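With the `WarmupLinear` scheduler, the learning rate is expected to climb linearly from 0 to `lr` over `warmup_steps` and then decay linearly to 0 by the last training step. The sketch below reproduces that shape from the values listed above, assuming the total step count equals the DataLoader length times `epochs` (since `steps_per_epoch` is null). `warmup_linear_lr` is a hypothetical helper, not a library function.

```python
BASE_LR = 2e-05
WARMUP_STEPS = 100
TOTAL_STEPS = 1468721 * 1  # DataLoader length x epochs (assumed)

def warmup_linear_lr(step):
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    if step < WARMUP_STEPS:
        factor = step / max(1, WARMUP_STEPS)
    else:
        factor = max(0.0, (TOTAL_STEPS - step) / max(1, TOTAL_STEPS - WARMUP_STEPS))
    return BASE_LR * factor

for step in (0, 50, 100, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, warmup_linear_lr(step))
```

At a batch size of 160, one epoch over 1468721 batches corresponds to at most 1468721 × 160 = 234,995,360 training examples, so the 100 warmup steps cover only a tiny fraction of the run.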
Full Model Architecture
SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
)