|
--- |
|
pipeline_tag: sentence-similarity |
|
tags: |
|
- sentence-transformers |
|
- feature-extraction |
|
- sentence-similarity |
|
- transformers |
|
- mteb |
|
model-index: |
|
- name: mmlw-e5-small |
|
results: |
|
- task: |
|
type: Clustering |
|
dataset: |
|
type: PL-MTEB/8tags-clustering |
|
name: MTEB 8TagsClustering |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: v_measure |
|
value: 31.772224277808153 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: PL-MTEB/allegro-reviews |
|
name: MTEB AllegroReviews |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: accuracy |
|
value: 33.03180914512922 |
|
- type: f1 |
|
value: 29.800304217426167 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: arguana-pl |
|
name: MTEB ArguAna-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 28.804999999999996 |
|
- type: map_at_10 |
|
value: 45.327 |
|
- type: map_at_100 |
|
value: 46.17 |
|
- type: map_at_1000 |
|
value: 46.177 |
|
- type: map_at_3 |
|
value: 40.528999999999996 |
|
- type: map_at_5 |
|
value: 43.335 |
|
- type: mrr_at_1 |
|
value: 30.299 |
|
- type: mrr_at_10 |
|
value: 45.763 |
|
- type: mrr_at_100 |
|
value: 46.641 |
|
- type: mrr_at_1000 |
|
value: 46.648 |
|
- type: mrr_at_3 |
|
value: 41.074 |
|
- type: mrr_at_5 |
|
value: 43.836999999999996 |
|
- type: ndcg_at_1 |
|
value: 28.804999999999996 |
|
- type: ndcg_at_10 |
|
value: 54.308 |
|
- type: ndcg_at_100 |
|
value: 57.879000000000005 |
|
- type: ndcg_at_1000 |
|
value: 58.048 |
|
- type: ndcg_at_3 |
|
value: 44.502 |
|
- type: ndcg_at_5 |
|
value: 49.519000000000005 |
|
- type: precision_at_1 |
|
value: 28.804999999999996 |
|
- type: precision_at_10 |
|
value: 8.286 |
|
- type: precision_at_100 |
|
value: 0.984 |
|
- type: precision_at_1000 |
|
value: 0.1 |
|
- type: precision_at_3 |
|
value: 18.682000000000002 |
|
- type: precision_at_5 |
|
value: 13.627 |
|
- type: recall_at_1 |
|
value: 28.804999999999996 |
|
- type: recall_at_10 |
|
value: 82.85900000000001 |
|
- type: recall_at_100 |
|
value: 98.36399999999999 |
|
- type: recall_at_1000 |
|
value: 99.644 |
|
- type: recall_at_3 |
|
value: 56.04599999999999 |
|
- type: recall_at_5 |
|
value: 68.137 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: PL-MTEB/cbd |
|
name: MTEB CBD |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: accuracy |
|
value: 64.24 |
|
- type: ap |
|
value: 17.967103105024705 |
|
- type: f1 |
|
value: 52.97375416129459 |
|
- task: |
|
type: PairClassification |
|
dataset: |
|
type: PL-MTEB/cdsce-pairclassification |
|
name: MTEB CDSC-E |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_accuracy |
|
value: 88.8 |
|
- type: cos_sim_ap |
|
value: 76.68028778789487 |
|
- type: cos_sim_f1 |
|
value: 66.82352941176471 |
|
- type: cos_sim_precision |
|
value: 60.42553191489362 |
|
- type: cos_sim_recall |
|
value: 74.73684210526315 |
|
- type: dot_accuracy |
|
value: 88.1 |
|
- type: dot_ap |
|
value: 72.04910086070551 |
|
- type: dot_f1 |
|
value: 66.66666666666667 |
|
- type: dot_precision |
|
value: 69.31818181818183 |
|
- type: dot_recall |
|
value: 64.21052631578948 |
|
- type: euclidean_accuracy |
|
value: 88.8 |
|
- type: euclidean_ap |
|
value: 76.63591858340688 |
|
- type: euclidean_f1 |
|
value: 67.13286713286713 |
|
- type: euclidean_precision |
|
value: 60.25104602510461 |
|
- type: euclidean_recall |
|
value: 75.78947368421053 |
|
- type: manhattan_accuracy |
|
value: 88.9 |
|
- type: manhattan_ap |
|
value: 76.54552849815124 |
|
- type: manhattan_f1 |
|
value: 66.66666666666667 |
|
- type: manhattan_precision |
|
value: 60.51502145922747 |
|
- type: manhattan_recall |
|
value: 74.21052631578947 |
|
- type: max_accuracy |
|
value: 88.9 |
|
- type: max_ap |
|
value: 76.68028778789487 |
|
- type: max_f1 |
|
value: 67.13286713286713 |
|
- task: |
|
type: STS |
|
dataset: |
|
type: PL-MTEB/cdscr-sts |
|
name: MTEB CDSC-R |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_pearson |
|
value: 91.64169404461497 |
|
- type: cos_sim_spearman |
|
value: 91.9755161377078 |
|
- type: euclidean_pearson |
|
value: 90.87481478491249 |
|
- type: euclidean_spearman |
|
value: 91.92362666383987 |
|
- type: manhattan_pearson |
|
value: 90.8415510499638 |
|
- type: manhattan_spearman |
|
value: 91.85927127194698 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: dbpedia-pl |
|
name: MTEB DBPedia-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 6.148 |
|
- type: map_at_10 |
|
value: 12.870999999999999 |
|
- type: map_at_100 |
|
value: 18.04 |
|
- type: map_at_1000 |
|
value: 19.286 |
|
- type: map_at_3 |
|
value: 9.156 |
|
- type: map_at_5 |
|
value: 10.857999999999999 |
|
- type: mrr_at_1 |
|
value: 53.25 |
|
- type: mrr_at_10 |
|
value: 61.016999999999996 |
|
- type: mrr_at_100 |
|
value: 61.48400000000001 |
|
- type: mrr_at_1000 |
|
value: 61.507999999999996 |
|
- type: mrr_at_3 |
|
value: 58.75 |
|
- type: mrr_at_5 |
|
value: 60.375 |
|
- type: ndcg_at_1 |
|
value: 41.0 |
|
- type: ndcg_at_10 |
|
value: 30.281000000000002 |
|
- type: ndcg_at_100 |
|
value: 33.955999999999996 |
|
- type: ndcg_at_1000 |
|
value: 40.77 |
|
- type: ndcg_at_3 |
|
value: 34.127 |
|
- type: ndcg_at_5 |
|
value: 32.274 |
|
- type: precision_at_1 |
|
value: 52.5 |
|
- type: precision_at_10 |
|
value: 24.525 |
|
- type: precision_at_100 |
|
value: 8.125 |
|
- type: precision_at_1000 |
|
value: 1.728 |
|
- type: precision_at_3 |
|
value: 37.083 |
|
- type: precision_at_5 |
|
value: 32.15 |
|
- type: recall_at_1 |
|
value: 6.148 |
|
- type: recall_at_10 |
|
value: 17.866 |
|
- type: recall_at_100 |
|
value: 39.213 |
|
- type: recall_at_1000 |
|
value: 61.604000000000006 |
|
- type: recall_at_3 |
|
value: 10.084 |
|
- type: recall_at_5 |
|
value: 13.333999999999998 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: fiqa-pl |
|
name: MTEB FiQA-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 14.643 |
|
- type: map_at_10 |
|
value: 23.166 |
|
- type: map_at_100 |
|
value: 24.725 |
|
- type: map_at_1000 |
|
value: 24.92 |
|
- type: map_at_3 |
|
value: 20.166 |
|
- type: map_at_5 |
|
value: 22.003 |
|
- type: mrr_at_1 |
|
value: 29.630000000000003 |
|
- type: mrr_at_10 |
|
value: 37.632 |
|
- type: mrr_at_100 |
|
value: 38.512 |
|
- type: mrr_at_1000 |
|
value: 38.578 |
|
- type: mrr_at_3 |
|
value: 35.391 |
|
- type: mrr_at_5 |
|
value: 36.857 |
|
- type: ndcg_at_1 |
|
value: 29.166999999999998 |
|
- type: ndcg_at_10 |
|
value: 29.749 |
|
- type: ndcg_at_100 |
|
value: 35.983 |
|
- type: ndcg_at_1000 |
|
value: 39.817 |
|
- type: ndcg_at_3 |
|
value: 26.739 |
|
- type: ndcg_at_5 |
|
value: 27.993000000000002 |
|
- type: precision_at_1 |
|
value: 29.166999999999998 |
|
- type: precision_at_10 |
|
value: 8.333 |
|
- type: precision_at_100 |
|
value: 1.448 |
|
- type: precision_at_1000 |
|
value: 0.213 |
|
- type: precision_at_3 |
|
value: 17.747 |
|
- type: precision_at_5 |
|
value: 13.58 |
|
- type: recall_at_1 |
|
value: 14.643 |
|
- type: recall_at_10 |
|
value: 35.247 |
|
- type: recall_at_100 |
|
value: 59.150999999999996 |
|
- type: recall_at_1000 |
|
value: 82.565 |
|
- type: recall_at_3 |
|
value: 24.006 |
|
- type: recall_at_5 |
|
value: 29.383 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: hotpotqa-pl |
|
name: MTEB HotpotQA-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 32.627 |
|
- type: map_at_10 |
|
value: 48.041 |
|
- type: map_at_100 |
|
value: 49.008 |
|
- type: map_at_1000 |
|
value: 49.092999999999996 |
|
- type: map_at_3 |
|
value: 44.774 |
|
- type: map_at_5 |
|
value: 46.791 |
|
- type: mrr_at_1 |
|
value: 65.28 |
|
- type: mrr_at_10 |
|
value: 72.53500000000001 |
|
- type: mrr_at_100 |
|
value: 72.892 |
|
- type: mrr_at_1000 |
|
value: 72.909 |
|
- type: mrr_at_3 |
|
value: 71.083 |
|
- type: mrr_at_5 |
|
value: 71.985 |
|
- type: ndcg_at_1 |
|
value: 65.253 |
|
- type: ndcg_at_10 |
|
value: 57.13700000000001 |
|
- type: ndcg_at_100 |
|
value: 60.783 |
|
- type: ndcg_at_1000 |
|
value: 62.507000000000005 |
|
- type: ndcg_at_3 |
|
value: 52.17 |
|
- type: ndcg_at_5 |
|
value: 54.896 |
|
- type: precision_at_1 |
|
value: 65.253 |
|
- type: precision_at_10 |
|
value: 12.088000000000001 |
|
- type: precision_at_100 |
|
value: 1.496 |
|
- type: precision_at_1000 |
|
value: 0.172 |
|
- type: precision_at_3 |
|
value: 32.96 |
|
- type: precision_at_5 |
|
value: 21.931 |
|
- type: recall_at_1 |
|
value: 32.627 |
|
- type: recall_at_10 |
|
value: 60.439 |
|
- type: recall_at_100 |
|
value: 74.80799999999999 |
|
- type: recall_at_1000 |
|
value: 86.219 |
|
- type: recall_at_3 |
|
value: 49.44 |
|
- type: recall_at_5 |
|
value: 54.827999999999996 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: msmarco-pl |
|
name: MTEB MSMARCO-PL |
|
config: default |
|
split: validation |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 13.150999999999998 |
|
- type: map_at_10 |
|
value: 21.179000000000002 |
|
- type: map_at_100 |
|
value: 22.227 |
|
- type: map_at_1000 |
|
value: 22.308 |
|
- type: map_at_3 |
|
value: 18.473 |
|
- type: map_at_5 |
|
value: 19.942999999999998 |
|
- type: mrr_at_1 |
|
value: 13.467 |
|
- type: mrr_at_10 |
|
value: 21.471 |
|
- type: mrr_at_100 |
|
value: 22.509 |
|
- type: mrr_at_1000 |
|
value: 22.585 |
|
- type: mrr_at_3 |
|
value: 18.789 |
|
- type: mrr_at_5 |
|
value: 20.262 |
|
- type: ndcg_at_1 |
|
value: 13.539000000000001 |
|
- type: ndcg_at_10 |
|
value: 25.942999999999998 |
|
- type: ndcg_at_100 |
|
value: 31.386999999999997 |
|
- type: ndcg_at_1000 |
|
value: 33.641 |
|
- type: ndcg_at_3 |
|
value: 20.368 |
|
- type: ndcg_at_5 |
|
value: 23.003999999999998 |
|
- type: precision_at_1 |
|
value: 13.539000000000001 |
|
- type: precision_at_10 |
|
value: 4.249 |
|
- type: precision_at_100 |
|
value: 0.7040000000000001 |
|
- type: precision_at_1000 |
|
value: 0.09 |
|
- type: precision_at_3 |
|
value: 8.782 |
|
- type: precision_at_5 |
|
value: 6.6049999999999995 |
|
- type: recall_at_1 |
|
value: 13.150999999999998 |
|
- type: recall_at_10 |
|
value: 40.698 |
|
- type: recall_at_100 |
|
value: 66.71000000000001 |
|
- type: recall_at_1000 |
|
value: 84.491 |
|
- type: recall_at_3 |
|
value: 25.452 |
|
- type: recall_at_5 |
|
value: 31.791000000000004 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: mteb/amazon_massive_intent |
|
name: MTEB MassiveIntentClassification (pl) |
|
config: pl |
|
split: test |
|
revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 |
|
metrics: |
|
- type: accuracy |
|
value: 67.3537323470074 |
|
- type: f1 |
|
value: 64.67852047603644 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: mteb/amazon_massive_scenario |
|
name: MTEB MassiveScenarioClassification (pl) |
|
config: pl |
|
split: test |
|
revision: 7d571f92784cd94a019292a1f45445077d0ef634 |
|
metrics: |
|
- type: accuracy |
|
value: 72.12508406186953 |
|
- type: f1 |
|
value: 71.55887309568853 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: nfcorpus-pl |
|
name: MTEB NFCorpus-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 4.18 |
|
- type: map_at_10 |
|
value: 9.524000000000001 |
|
- type: map_at_100 |
|
value: 12.272 |
|
- type: map_at_1000 |
|
value: 13.616 |
|
- type: map_at_3 |
|
value: 6.717 |
|
- type: map_at_5 |
|
value: 8.172 |
|
- type: mrr_at_1 |
|
value: 37.152 |
|
- type: mrr_at_10 |
|
value: 45.068000000000005 |
|
- type: mrr_at_100 |
|
value: 46.026 |
|
- type: mrr_at_1000 |
|
value: 46.085 |
|
- type: mrr_at_3 |
|
value: 43.344 |
|
- type: mrr_at_5 |
|
value: 44.412 |
|
- type: ndcg_at_1 |
|
value: 34.52 |
|
- type: ndcg_at_10 |
|
value: 27.604 |
|
- type: ndcg_at_100 |
|
value: 26.012999999999998 |
|
- type: ndcg_at_1000 |
|
value: 35.272 |
|
- type: ndcg_at_3 |
|
value: 31.538 |
|
- type: ndcg_at_5 |
|
value: 30.165999999999997 |
|
- type: precision_at_1 |
|
value: 36.223 |
|
- type: precision_at_10 |
|
value: 21.053 |
|
- type: precision_at_100 |
|
value: 7.08 |
|
- type: precision_at_1000 |
|
value: 1.9929999999999999 |
|
- type: precision_at_3 |
|
value: 30.031000000000002 |
|
- type: precision_at_5 |
|
value: 26.997 |
|
- type: recall_at_1 |
|
value: 4.18 |
|
- type: recall_at_10 |
|
value: 12.901000000000002 |
|
- type: recall_at_100 |
|
value: 27.438000000000002 |
|
- type: recall_at_1000 |
|
value: 60.768 |
|
- type: recall_at_3 |
|
value: 7.492 |
|
- type: recall_at_5 |
|
value: 10.05 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: nq-pl |
|
name: MTEB NQ-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 17.965 |
|
- type: map_at_10 |
|
value: 28.04 |
|
- type: map_at_100 |
|
value: 29.217 |
|
- type: map_at_1000 |
|
value: 29.285 |
|
- type: map_at_3 |
|
value: 24.818 |
|
- type: map_at_5 |
|
value: 26.617 |
|
- type: mrr_at_1 |
|
value: 20.22 |
|
- type: mrr_at_10 |
|
value: 30.148000000000003 |
|
- type: mrr_at_100 |
|
value: 31.137999999999998 |
|
- type: mrr_at_1000 |
|
value: 31.19 |
|
- type: mrr_at_3 |
|
value: 27.201999999999998 |
|
- type: mrr_at_5 |
|
value: 28.884999999999998 |
|
- type: ndcg_at_1 |
|
value: 20.365 |
|
- type: ndcg_at_10 |
|
value: 33.832 |
|
- type: ndcg_at_100 |
|
value: 39.33 |
|
- type: ndcg_at_1000 |
|
value: 41.099999999999994 |
|
- type: ndcg_at_3 |
|
value: 27.46 |
|
- type: ndcg_at_5 |
|
value: 30.584 |
|
- type: precision_at_1 |
|
value: 20.365 |
|
- type: precision_at_10 |
|
value: 5.849 |
|
- type: precision_at_100 |
|
value: 0.8959999999999999 |
|
- type: precision_at_1000 |
|
value: 0.107 |
|
- type: precision_at_3 |
|
value: 12.64 |
|
- type: precision_at_5 |
|
value: 9.334000000000001 |
|
- type: recall_at_1 |
|
value: 17.965 |
|
- type: recall_at_10 |
|
value: 49.503 |
|
- type: recall_at_100 |
|
value: 74.351 |
|
- type: recall_at_1000 |
|
value: 87.766 |
|
- type: recall_at_3 |
|
value: 32.665 |
|
- type: recall_at_5 |
|
value: 39.974 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: laugustyniak/abusive-clauses-pl |
|
name: MTEB PAC |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: accuracy |
|
value: 63.11323486823051 |
|
- type: ap |
|
value: 74.53486257377787 |
|
- type: f1 |
|
value: 60.631005373417736 |
|
- task: |
|
type: PairClassification |
|
dataset: |
|
type: PL-MTEB/ppc-pairclassification |
|
name: MTEB PPC |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_accuracy |
|
value: 80.10000000000001 |
|
- type: cos_sim_ap |
|
value: 89.69526236458292 |
|
- type: cos_sim_f1 |
|
value: 83.37468982630274 |
|
- type: cos_sim_precision |
|
value: 83.30578512396694 |
|
- type: cos_sim_recall |
|
value: 83.44370860927152 |
|
- type: dot_accuracy |
|
value: 77.8 |
|
- type: dot_ap |
|
value: 87.72366051496104 |
|
- type: dot_f1 |
|
value: 82.83752860411899 |
|
- type: dot_precision |
|
value: 76.80339462517681 |
|
- type: dot_recall |
|
value: 89.90066225165563 |
|
- type: euclidean_accuracy |
|
value: 80.10000000000001 |
|
- type: euclidean_ap |
|
value: 89.61317191870039 |
|
- type: euclidean_f1 |
|
value: 83.40214698596202 |
|
- type: euclidean_precision |
|
value: 83.19604612850083 |
|
- type: euclidean_recall |
|
value: 83.6092715231788 |
|
- type: manhattan_accuracy |
|
value: 79.60000000000001 |
|
- type: manhattan_ap |
|
value: 89.48363786968471 |
|
- type: manhattan_f1 |
|
value: 82.96296296296296 |
|
- type: manhattan_precision |
|
value: 82.48772504091653 |
|
- type: manhattan_recall |
|
value: 83.44370860927152 |
|
- type: max_accuracy |
|
value: 80.10000000000001 |
|
- type: max_ap |
|
value: 89.69526236458292 |
|
- type: max_f1 |
|
value: 83.40214698596202 |
|
- task: |
|
type: PairClassification |
|
dataset: |
|
type: PL-MTEB/psc-pairclassification |
|
name: MTEB PSC |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_accuracy |
|
value: 96.93877551020408 |
|
- type: cos_sim_ap |
|
value: 98.86489482248999 |
|
- type: cos_sim_f1 |
|
value: 95.11111111111113 |
|
- type: cos_sim_precision |
|
value: 92.507204610951 |
|
- type: cos_sim_recall |
|
value: 97.86585365853658 |
|
- type: dot_accuracy |
|
value: 95.73283858998145 |
|
- type: dot_ap |
|
value: 97.8261652492545 |
|
- type: dot_f1 |
|
value: 93.21533923303835 |
|
- type: dot_precision |
|
value: 90.28571428571428 |
|
- type: dot_recall |
|
value: 96.34146341463415 |
|
- type: euclidean_accuracy |
|
value: 96.93877551020408 |
|
- type: euclidean_ap |
|
value: 98.84837797066623 |
|
- type: euclidean_f1 |
|
value: 95.11111111111113 |
|
- type: euclidean_precision |
|
value: 92.507204610951 |
|
- type: euclidean_recall |
|
value: 97.86585365853658 |
|
- type: manhattan_accuracy |
|
value: 96.84601113172542 |
|
- type: manhattan_ap |
|
value: 98.78659090944161 |
|
- type: manhattan_f1 |
|
value: 94.9404761904762 |
|
- type: manhattan_precision |
|
value: 92.73255813953489 |
|
- type: manhattan_recall |
|
value: 97.2560975609756 |
|
- type: max_accuracy |
|
value: 96.93877551020408 |
|
- type: max_ap |
|
value: 98.86489482248999 |
|
- type: max_f1 |
|
value: 95.11111111111113 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: PL-MTEB/polemo2_in |
|
name: MTEB PolEmo2.0-IN |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: accuracy |
|
value: 63.961218836565095 |
|
- type: f1 |
|
value: 64.3979989243291 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: PL-MTEB/polemo2_out |
|
name: MTEB PolEmo2.0-OUT |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: accuracy |
|
value: 40.32388663967612 |
|
- type: f1 |
|
value: 32.339117999015755 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: quora-pl |
|
name: MTEB Quora-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 62.757 |
|
- type: map_at_10 |
|
value: 76.55999999999999 |
|
- type: map_at_100 |
|
value: 77.328 |
|
- type: map_at_1000 |
|
value: 77.35499999999999 |
|
- type: map_at_3 |
|
value: 73.288 |
|
- type: map_at_5 |
|
value: 75.25500000000001 |
|
- type: mrr_at_1 |
|
value: 72.28 |
|
- type: mrr_at_10 |
|
value: 79.879 |
|
- type: mrr_at_100 |
|
value: 80.121 |
|
- type: mrr_at_1000 |
|
value: 80.12700000000001 |
|
- type: mrr_at_3 |
|
value: 78.40700000000001 |
|
- type: mrr_at_5 |
|
value: 79.357 |
|
- type: ndcg_at_1 |
|
value: 72.33000000000001 |
|
- type: ndcg_at_10 |
|
value: 81.151 |
|
- type: ndcg_at_100 |
|
value: 83.107 |
|
- type: ndcg_at_1000 |
|
value: 83.397 |
|
- type: ndcg_at_3 |
|
value: 77.3 |
|
- type: ndcg_at_5 |
|
value: 79.307 |
|
- type: precision_at_1 |
|
value: 72.33000000000001 |
|
- type: precision_at_10 |
|
value: 12.587000000000002 |
|
- type: precision_at_100 |
|
value: 1.488 |
|
- type: precision_at_1000 |
|
value: 0.155 |
|
- type: precision_at_3 |
|
value: 33.943 |
|
- type: precision_at_5 |
|
value: 22.61 |
|
- type: recall_at_1 |
|
value: 62.757 |
|
- type: recall_at_10 |
|
value: 90.616 |
|
- type: recall_at_100 |
|
value: 97.905 |
|
- type: recall_at_1000 |
|
value: 99.618 |
|
- type: recall_at_3 |
|
value: 79.928 |
|
- type: recall_at_5 |
|
value: 85.30499999999999 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: scidocs-pl |
|
name: MTEB SCIDOCS-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 3.313 |
|
- type: map_at_10 |
|
value: 8.559999999999999 |
|
- type: map_at_100 |
|
value: 10.177999999999999 |
|
- type: map_at_1000 |
|
value: 10.459999999999999 |
|
- type: map_at_3 |
|
value: 6.094 |
|
- type: map_at_5 |
|
value: 7.323 |
|
- type: mrr_at_1 |
|
value: 16.3 |
|
- type: mrr_at_10 |
|
value: 25.579 |
|
- type: mrr_at_100 |
|
value: 26.717000000000002 |
|
- type: mrr_at_1000 |
|
value: 26.799 |
|
- type: mrr_at_3 |
|
value: 22.583000000000002 |
|
- type: mrr_at_5 |
|
value: 24.298000000000002 |
|
- type: ndcg_at_1 |
|
value: 16.3 |
|
- type: ndcg_at_10 |
|
value: 14.789 |
|
- type: ndcg_at_100 |
|
value: 21.731 |
|
- type: ndcg_at_1000 |
|
value: 27.261999999999997 |
|
- type: ndcg_at_3 |
|
value: 13.74 |
|
- type: ndcg_at_5 |
|
value: 12.199 |
|
- type: precision_at_1 |
|
value: 16.3 |
|
- type: precision_at_10 |
|
value: 7.779999999999999 |
|
- type: precision_at_100 |
|
value: 1.79 |
|
- type: precision_at_1000 |
|
value: 0.313 |
|
- type: precision_at_3 |
|
value: 12.933 |
|
- type: precision_at_5 |
|
value: 10.86 |
|
- type: recall_at_1 |
|
value: 3.313 |
|
- type: recall_at_10 |
|
value: 15.772 |
|
- type: recall_at_100 |
|
value: 36.392 |
|
- type: recall_at_1000 |
|
value: 63.525 |
|
- type: recall_at_3 |
|
value: 7.863 |
|
- type: recall_at_5 |
|
value: 11.003 |
|
- task: |
|
type: PairClassification |
|
dataset: |
|
type: PL-MTEB/sicke-pl-pairclassification |
|
name: MTEB SICK-E-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_accuracy |
|
value: 81.7977986139421 |
|
- type: cos_sim_ap |
|
value: 73.21294750778902 |
|
- type: cos_sim_f1 |
|
value: 66.57391304347826 |
|
- type: cos_sim_precision |
|
value: 65.05778382053025 |
|
- type: cos_sim_recall |
|
value: 68.16239316239316 |
|
- type: dot_accuracy |
|
value: 78.67916836526702 |
|
- type: dot_ap |
|
value: 63.61943815978181 |
|
- type: dot_f1 |
|
value: 62.45014245014245 |
|
- type: dot_precision |
|
value: 52.04178537511871 |
|
- type: dot_recall |
|
value: 78.06267806267806 |
|
- type: euclidean_accuracy |
|
value: 81.7774154097024 |
|
- type: euclidean_ap |
|
value: 73.25053778387148 |
|
- type: euclidean_f1 |
|
value: 66.55064392620953 |
|
- type: euclidean_precision |
|
value: 65.0782845473111 |
|
- type: euclidean_recall |
|
value: 68.09116809116809 |
|
- type: manhattan_accuracy |
|
value: 81.63473298002447 |
|
- type: manhattan_ap |
|
value: 72.99781945530033 |
|
- type: manhattan_f1 |
|
value: 66.3623595505618 |
|
- type: manhattan_precision |
|
value: 65.4432132963989 |
|
- type: manhattan_recall |
|
value: 67.3076923076923 |
|
- type: max_accuracy |
|
value: 81.7977986139421 |
|
- type: max_ap |
|
value: 73.25053778387148 |
|
- type: max_f1 |
|
value: 66.57391304347826 |
|
- task: |
|
type: STS |
|
dataset: |
|
type: PL-MTEB/sickr-pl-sts |
|
name: MTEB SICK-R-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_pearson |
|
value: 79.62332929388755 |
|
- type: cos_sim_spearman |
|
value: 73.70598290849304 |
|
- type: euclidean_pearson |
|
value: 77.3603286710006 |
|
- type: euclidean_spearman |
|
value: 73.74420279933932 |
|
- type: manhattan_pearson |
|
value: 77.12735032552482 |
|
- type: manhattan_spearman |
|
value: 73.53014836690127 |
|
- task: |
|
type: STS |
|
dataset: |
|
type: mteb/sts22-crosslingual-sts |
|
name: MTEB STS22 (pl) |
|
config: pl |
|
split: test |
|
revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 |
|
metrics: |
|
- type: cos_sim_pearson |
|
value: 37.696942928686724 |
|
- type: cos_sim_spearman |
|
value: 40.6271445245692 |
|
- type: euclidean_pearson |
|
value: 30.212734461370832 |
|
- type: euclidean_spearman |
|
value: 40.66643376699638 |
|
- type: manhattan_pearson |
|
value: 29.90223716230108 |
|
- type: manhattan_spearman |
|
value: 40.35576319091178 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: scifact-pl |
|
name: MTEB SciFact-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 43.528 |
|
- type: map_at_10 |
|
value: 53.290000000000006 |
|
- type: map_at_100 |
|
value: 54.342 |
|
- type: map_at_1000 |
|
value: 54.376999999999995 |
|
- type: map_at_3 |
|
value: 50.651999999999994 |
|
- type: map_at_5 |
|
value: 52.248000000000005 |
|
- type: mrr_at_1 |
|
value: 46.666999999999994 |
|
- type: mrr_at_10 |
|
value: 55.286 |
|
- type: mrr_at_100 |
|
value: 56.094 |
|
- type: mrr_at_1000 |
|
value: 56.125 |
|
- type: mrr_at_3 |
|
value: 53.222 |
|
- type: mrr_at_5 |
|
value: 54.339000000000006 |
|
- type: ndcg_at_1 |
|
value: 46.0 |
|
- type: ndcg_at_10 |
|
value: 58.142 |
|
- type: ndcg_at_100 |
|
value: 62.426 |
|
- type: ndcg_at_1000 |
|
value: 63.395999999999994 |
|
- type: ndcg_at_3 |
|
value: 53.53 |
|
- type: ndcg_at_5 |
|
value: 55.842000000000006 |
|
- type: precision_at_1 |
|
value: 46.0 |
|
- type: precision_at_10 |
|
value: 7.9670000000000005 |
|
- type: precision_at_100 |
|
value: 1.023 |
|
- type: precision_at_1000 |
|
value: 0.11100000000000002 |
|
- type: precision_at_3 |
|
value: 21.444 |
|
- type: precision_at_5 |
|
value: 14.333000000000002 |
|
- type: recall_at_1 |
|
value: 43.528 |
|
- type: recall_at_10 |
|
value: 71.511 |
|
- type: recall_at_100 |
|
value: 89.93299999999999 |
|
- type: recall_at_1000 |
|
value: 97.667 |
|
- type: recall_at_3 |
|
value: 59.067 |
|
- type: recall_at_5 |
|
value: 64.789 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: trec-covid-pl |
|
name: MTEB TRECCOVID-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 0.22699999999999998 |
|
- type: map_at_10 |
|
value: 1.3379999999999999 |
|
- type: map_at_100 |
|
value: 6.965000000000001 |
|
- type: map_at_1000 |
|
value: 17.135 |
|
- type: map_at_3 |
|
value: 0.53 |
|
- type: map_at_5 |
|
value: 0.799 |
|
- type: mrr_at_1 |
|
value: 84.0 |
|
- type: mrr_at_10 |
|
value: 88.083 |
|
- type: mrr_at_100 |
|
value: 88.432 |
|
- type: mrr_at_1000 |
|
value: 88.432 |
|
- type: mrr_at_3 |
|
value: 87.333 |
|
- type: mrr_at_5 |
|
value: 87.833 |
|
- type: ndcg_at_1 |
|
value: 76.0 |
|
- type: ndcg_at_10 |
|
value: 58.199 |
|
- type: ndcg_at_100 |
|
value: 43.230000000000004 |
|
- type: ndcg_at_1000 |
|
value: 39.751 |
|
- type: ndcg_at_3 |
|
value: 63.743 |
|
- type: ndcg_at_5 |
|
value: 60.42999999999999 |
|
- type: precision_at_1 |
|
value: 84.0 |
|
- type: precision_at_10 |
|
value: 62.0 |
|
- type: precision_at_100 |
|
value: 44.519999999999996 |
|
- type: precision_at_1000 |
|
value: 17.746000000000002 |
|
- type: precision_at_3 |
|
value: 67.333 |
|
- type: precision_at_5 |
|
value: 63.2 |
|
- type: recall_at_1 |
|
value: 0.22699999999999998 |
|
- type: recall_at_10 |
|
value: 1.627 |
|
- type: recall_at_100 |
|
value: 10.600999999999999 |
|
- type: recall_at_1000 |
|
value: 37.532 |
|
- type: recall_at_3 |
|
value: 0.547 |
|
- type: recall_at_5 |
|
value: 0.864 |
|
language: pl |
|
license: apache-2.0 |
|
widget: |
|
- source_sentence: "query: Jak dożyć 100 lat?" |
|
sentences: |
|
- "passage: Trzeba zdrowo się odżywiać i uprawiać sport." |
|
- "passage: Trzeba pić alkohol, imprezować i jeździć szybkimi autami." |
|
- "passage: Gdy trwała kampania politycy zapewniali, że rozprawią się z zakazem niedzielnego handlu." |
|
|
|
--- |

<h1 align="center">MMLW-e5-small</h1>

MMLW (muszę mieć lepszą wiadomość, Polish for "I must have a better message") is a family of neural text encoders for Polish.
This is a distilled model that generates embeddings for tasks such as semantic similarity, clustering, and information retrieval. It can also serve as a base for further fine-tuning.
It transforms texts into 384-dimensional vectors.
The model was initialized with a multilingual E5 checkpoint and then trained with the [multilingual knowledge distillation method](https://aclanthology.org/2020.emnlp-main.365/) on a diverse corpus of 60 million Polish-English text pairs. We used [English FlagEmbeddings (BGE)](https://huggingface.co/BAAI/bge-base-en) as teacher models for distillation.
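
To make the distillation step more concrete, here is a minimal sketch of the objective as it is usually formulated in the linked paper: the student is trained so that its embeddings of both the English sentence and its Polish translation move toward the teacher's embedding of the English sentence. The function names, the 4-dimensional vectors, and the use of plain Python lists are illustrative assumptions; the actual training code and hyperparameters for this model are not described here.

```python
# Toy sketch of a multilingual knowledge distillation loss.
# All vectors are made-up placeholders, not real model outputs.

def mse(u, v):
    """Mean squared error between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def distillation_loss(teacher_en, student_en, student_pl):
    """Pull both student embeddings toward the teacher's English embedding."""
    return mse(teacher_en, student_en) + mse(teacher_en, student_pl)


# Hypothetical embeddings for one English-Polish sentence pair.
teacher_en = [0.5, 0.0, -0.5, 1.0]  # teacher(English sentence)
student_en = [0.5, 0.0, -0.5, 1.0]  # student(English sentence), already aligned
student_pl = [0.0, 0.0, -0.5, 1.0]  # student(Polish translation), slightly off

loss = distillation_loss(teacher_en, student_en, student_pl)
print(loss)
```

In this toy case only the Polish embedding contributes to the loss, which is the signal that would push the student to place the translation next to its English source.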

## Usage (Sentence-Transformers)

⚠️ Our embedding models require specific prefixes and suffixes when encoding texts. For this model, prefix queries with **"query: "** and passages with **"passage: "**. ⚠️

You can use the model with [sentence-transformers](https://www.SBERT.net) as follows:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# The model expects these prefixes on queries and passages.
query_prefix = "query: "
answer_prefix = "passage: "
queries = [query_prefix + "Jak dożyć 100 lat?"]
answers = [
    answer_prefix + "Trzeba zdrowo się odżywiać i uprawiać sport.",
    answer_prefix + "Trzeba pić alkohol, imprezować i jeździć szybkimi autami.",
    answer_prefix + "Gdy trwała kampania politycy zapewniali, że rozprawią się z zakazem niedzielnego handlu."
]
model = SentenceTransformer("sdadas/mmlw-e5-small")
queries_emb = model.encode(queries, convert_to_tensor=True, show_progress_bar=False)
answers_emb = model.encode(answers, convert_to_tensor=True, show_progress_bar=False)

# Pick the answer with the highest cosine similarity to the query.
best_answer = cos_sim(queries_emb, answers_emb).argmax().item()
print(answers[best_answer])
# passage: Trzeba zdrowo się odżywiać i uprawiać sport.
```
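
The ranking step in the example above can also be illustrated without downloading the model. The sketch below is only an illustration under stated assumptions: `add_prefix` and `cosine` are hypothetical helpers, the passage texts are placeholders, and the 3-dimensional vectors stand in for the 384-dimensional embeddings the model would produce.

```python
import math


def add_prefix(texts, prefix):
    """Prepend the prefix unless a text already starts with it."""
    return [t if t.startswith(prefix) else prefix + t for t in texts]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


queries = add_prefix(["Jak dożyć 100 lat?"], "query: ")
# The first passage is already prefixed and is left unchanged.
passages = add_prefix(["passage: first text", "second text", "third text"], "passage: ")

# Placeholder vectors standing in for real embeddings.
query_vec = [1.0, 0.0, 0.0]
passage_vecs = [
    [0.9, 0.1, 0.0],   # points almost the same way as the query
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [-1.0, 0.0, 0.0],  # opposite to the query
]

scores = [cosine(query_vec, vec) for vec in passage_vecs]
best = max(range(len(scores)), key=scores.__getitem__)
print(passages[best], scores)
```

Applying the prefix through a helper keeps queries and passages from being encoded without it, or with it twice, which would otherwise be easy to miss.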

## Evaluation Results

- The model achieves an **Average Score** of **55.84** on the Polish Massive Text Embedding Benchmark (MTEB). See [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for detailed results.
- The model achieves **NDCG@10** of **47.64** on the Polish Information Retrieval Benchmark. See [PIRB Leaderboard](https://huggingface.co/spaces/sdadas/pirb) for detailed results.
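
For readers unfamiliar with the retrieval metric, the following sketch shows how NDCG@10 can be computed for a single query. It assumes the linear-gain form of DCG and that the relevance list covers every relevant document for the query; the relevance labels are hypothetical, and benchmark implementations may differ in such details. Figures such as 47.64 appear to be this value averaged over queries and expressed on a 0-100 scale.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k results (linear gain)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (sorted) ranking."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical relevance labels in ranked order: hits at positions 2 and 4.
ranked_relevances = [0, 1, 0, 1]
score = ndcg_at_k(ranked_relevances, 10)
print(round(score, 4))
```

A ranking that places both relevant documents first would score 1.0, so the metric rewards putting relevant passages near the top rather than merely retrieving them.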

## Acknowledgements
This model was trained with support from the A100 GPU cluster provided by the Gdansk University of Technology within the TASK center initiative.

## Citation

```bibtex
@article{dadas2024pirb,
  title={{PIRB}: A Comprehensive Benchmark of Polish Dense and Hybrid Text Retrieval Methods},
  author={Sławomir Dadas and Michał Perełkiewicz and Rafał Poświata},
  year={2024},
  eprint={2402.13350},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```