	---
	tags:
	- sentence-transformers
	- sentence-similarity
	- feature-extraction
	- generated_from_trainer
	- loss:MatryoshkaLoss
	- loss:MultipleNegativesRankingLoss
	- mteb
	base_model: aubmindlab/bert-base-arabertv02
	widget:
	- source_sentence: ستة شبان بالشرفة يرتدون الجينز الأزرق و قميصهم مرفوع
	  sentences:
	  - الأولاد على الرمال.
	  - نصف دزينة من الشباب يجلسون على شرفة.
	  - مجموعة من الأولاد يرتدون معاطف الفراء.
	- source_sentence: السعرات الحرارية في لحم البقر المنغولي
	  sentences:
	  - 'هناك 320 سعرة حرارية في حصة واحدة من لحم البقر المنغولي باي وي. توزيع السعرات
	    الحرارية: 47٪ دهون ، 28٪ كربوهيدرات ، 25٪ بروتين.'
	  - أي هاتف ذكي هو الأفضل في عام 2016؟
	  - وجبات فورية. يمكن أن تحتوي نودلز الرامن الفورية ، مثل تلك التي تُباع في أكواب
	    الستايروفوم ، من 190 إلى 300 سعرة حرارية لكل وجبة ، اعتمادًا على النكهة. تميل
	    الوجبات ذات النكهة الكريمية إلى أن تكون أعلى في السعرات الحرارية مقارنة بتلك التي
	    تحتوي على مرق الدجاج ولحم البقر ولحم الخنزير ومرق الخضار.
	- source_sentence: ماذا نستخدم لغسل أيدينا
	  sentences:
	  - اتبع الخطوات الخمس أدناه لغسل يديك بالطريقة الصحيحة في كل مرة. بلل يديك بمياه
	    جارية نظيفة (دافئة أو باردة) ، ثم أغلق الصنبور ، ثم ضع الصابون. افركي يديك عن
	    طريق فركهما بالصابون. افرك يديك لمدة 20 ثانية على الأقل. اشطف يديك جيدًا تحت الماء
	    النظيف الجاري.
	  - 'كيف تغسل يديك. من الأفضل عمومًا غسل يديك بالماء والصابون. اتبع هذه الخطوات البسيطة:
	    1 بلل يديك بالماء الجاري - إما دافئ أو بارد. 2 ضع صابون سائل أو صابون أو مسحوق.
	    3 رغوة الصابون جيدا. 4 افرك يديك بقوة لمدة 20 ثانية على الأقل. 5 يشطف جيدا. 6
	    جفف يديك بمنشفة نظيفة أو يمكن التخلص منها أو مجفف الهواء. 7 إذا أمكن ، استخدم
	    منشفة أو مرفقك لإغلاق الصنبور.'
	  - 1 متوسط عمر المنازل في ستيتسفيل ، نورث كارولاينا هو 45 عامًا. 2 بالنسبة للمنازل
	    ذات الرهون العقارية ، يبلغ متوسط تكلفة المالك 1132 دولارًا شهريًا. 3 المنزل
	    النموذجي له 5 غرف. 4 46.0٪ من المنازل يشغلها مالكوها و 38.8٪ مؤجرة. اعتبارًا من
	    التعداد الأخير ، كان معدل البطالة في ستيتسفيل ، نورث كارولاينا البالغ 10.8 ٪ أسوأ
	    من المتوسط الوطني البالغ 7.9 ٪. 2 معدل الفقر في ستيتسفيل ، نورث كارولاينا هو
	    18.3٪.
	- source_sentence: شخصان يركضان على الشاطئ
	  sentences:
	  - شخصان يركزان بالخارج
	  - كيف يمكنني تغيير شخصيتي من الانطوائي إلى الانطوائي؟
	  - امرأة تجري بمفردها عبر البلدة
	- source_sentence: طفل صغير يرتدي قميص أبيض ينظر إلى دراجة لعبة
	  sentences:
	  - رجل نائم في الحافلة
	  - طفل يركب في سيارة
	  - طفل ينظر إلى الأشياء.
	pipeline_tag: sentence-similarity
	library_name: sentence-transformers
	metrics:
	- pearson_cosine
	- spearman_cosine
	- pearson_manhattan
	- spearman_manhattan
	- pearson_euclidean
	- spearman_euclidean
	- pearson_dot
	- spearman_dot
	- pearson_max
	- spearman_max
	model-index:
	- name: omarelshehy/matryoska-sts-0.86
	  results:
	  - dataset:
	      config: ar-ar
	      name: MTEB STS17 (ar-ar)
	      revision: faeb762787bd10488a50c8b5be4a3b82e411949c
	      split: test
	      type: mteb/sts17-crosslingual-sts
	    metrics:
	    - type: pearson
	      value: 85.1977
	    - type: spearman
	      value: 86.0559
	    - type: cosine_pearson
	      value: 85.1977
	    - type: cosine_spearman
	      value: 86.0559
	    - type: manhattan_pearson
	      value: 83.01950000000001
	    - type: manhattan_spearman
	      value: 85.28620000000001
	    - type: euclidean_pearson
	      value: 83.1524
	    - type: euclidean_spearman
	      value: 85.3787
	    - type: main_score
	      value: 86.0559
	    task:
	      type: STS
	  - dataset:
	      config: en-ar
	      name: MTEB STS17 (en-ar)
	      revision: faeb762787bd10488a50c8b5be4a3b82e411949c
	      split: test
	      type: mteb/sts17-crosslingual-sts
	    metrics:
	    - type: pearson
	      value: 16.234
	    - type: spearman
	      value: 13.337499999999999
	    - type: cosine_pearson
	      value: 16.234
	    - type: cosine_spearman
	      value: 13.337499999999999
	    - type: manhattan_pearson
	      value: 11.103200000000001
	    - type: manhattan_spearman
	      value: 8.8513
	    - type: euclidean_pearson
	      value: 10.7335
	    - type: euclidean_spearman
	      value: 7.857
	    - type: main_score
	      value: 13.337499999999999
	    task:
	      type: STS
	  - dataset:
	      config: ar
	      name: MTEB STS22 (ar)
	      revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3
	      split: test
	      type: mteb/sts22-crosslingual-sts
	    metrics:
	    - type: pearson
	      value: 49.8116
	    - type: spearman
	      value: 58.7217
	    - type: cosine_pearson
	      value: 49.8116
	    - type: cosine_spearman
	      value: 58.7217
	    - type: manhattan_pearson
	      value: 55.281499999999994
	    - type: manhattan_spearman
	      value: 58.658
	    - type: euclidean_pearson
	      value: 54.600300000000004
	    - type: euclidean_spearman
	      value: 58.59029999999999
	    - type: main_score
	      value: 58.7217
	    task:
	      type: STS
	---

	# SentenceTransformer based on aubmindlab/bert-base-arabertv02

	This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [aubmindlab/bert-base-arabertv02](https://huggingface.co/aubmindlab/bert-base-arabertv02). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

	## Model Details

	### Model Description
	- Model Type: Sentence Transformer
	- Base model: [aubmindlab/bert-base-arabertv02](https://huggingface.co/aubmindlab/bert-base-arabertv02) <!-- at revision 016fb9d6768f522a59c6e0d2d5d5d43a4e1bff60 -->
	- Maximum Sequence Length: 512 tokens
	- Output Dimensionality: 768 dimensions
	- Similarity Function: Cosine Similarity
	<!-- - Training Dataset: Unknown -->
	<!-- - Language: Unknown -->
	<!-- - License: Unknown -->
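
The cosine similarity named in the list above can be sketched in plain Python. This is an illustrative, dependency-free version: the helper names `cosine_similarity` and `similarity_matrix` are hypothetical and are not part of the Sentence Transformers API, and returning 0.0 for a zero vector is a convention chosen for this sketch only.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # sketch-only convention: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise cosine similarities, shaped len(vectors) x len(vectors)."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]
```

Scores range from -1 to 1, and every vector has similarity 1 with itself, so the diagonal of the matrix is all ones for non-zero inputs.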

	### Model Sources

	- Documentation: [Sentence Transformers Documentation](https://sbert.net)
	- Repository: [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
	- Hugging Face: [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

	### Full Model Architecture

	```
	SentenceTransformer(
	  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
	  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
	)
	```
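
The pooling configuration above enables only `pooling_mode_mean_tokens`, so each sentence embedding is the average of its token embeddings, with padding positions excluded. The following is a minimal sketch of that idea on toy data. The function name `mean_pool` and the two-dimensional vectors are assumptions made for illustration; the library itself works on batched tensors.

```python
from typing import List, Sequence


def mean_pool(
    token_embeddings: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Average the token vectors whose mask entry is 1 (padding is ignored)."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("one mask entry is required per token")
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        raise ValueError("at least one unmasked token is required")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy input: three real tokens and one padding token, 2 dimensions each.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]
sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)
# [3.0, 4.0]
```

The padding vector `[9.0, 9.0]` does not affect the result because its mask entry is 0.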

	## Usage

	### Direct Usage (Sentence Transformers)

	First install the Sentence Transformers library:

	```bash
	pip install -U sentence-transformers
	```

	Then you can load this model and run inference.
	```python
	from sentence_transformers import SentenceTransformer

	# Download from the 🤗 Hub
	model = SentenceTransformer("omarelshehy/Arabic-STS-Matryoshka-V2")
	# Run inference
	sentences = [
	    'طفل صغير يرتدي قميص أبيض ينظر إلى دراجة لعبة',
	    'طفل ينظر إلى الأشياء.',
	    'طفل يركب في سيارة',
	]
	embeddings = model.encode(sentences)
	print(embeddings.shape)
	# (3, 768)

	# Get the similarity scores for the embeddings
	similarities = model.similarity(embeddings, embeddings)
	print(similarities.shape)
	# torch.Size([3, 3])
	```
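
The tags of this card list `MatryoshkaLoss`, which suggests that the leading components of each embedding remain useful on their own. Here is a hedged sketch of the usual truncate-then-renormalize step. The card does not state which dimensions were used during training, so the sizes here are placeholders, and `truncate_embedding` is a hypothetical helper rather than a library function.

```python
import math
from typing import List, Sequence


def truncate_embedding(embedding: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components and rescale them to unit length."""
    if not 0 < dim <= len(embedding):
        raise ValueError("dim must be between 1 and len(embedding)")
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to rescale
    return [x / norm for x in head]


# Stand-in for a 768-dimensional embedding; real vectors come from model.encode().
full = [3.0, 4.0, 12.0, 0.5]
small = truncate_embedding(full, 2)  # 2 is an arbitrary illustrative size
print(small)
```

Re-normalizing after truncation keeps cosine and dot-product scores comparable. Recent versions of Sentence Transformers also accept a `truncate_dim` argument when constructing `SentenceTransformer`; check the documentation for your installed version before relying on it.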

	## Citation

	### BibTeX

	#### Sentence Transformers
	```bibtex
	@inproceedings{reimers-2019-sentence-bert,
	    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
	    author = "Reimers, Nils and Gurevych, Iryna",
	    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
	    month = "11",
	    year = "2019",
	    publisher = "Association for Computational Linguistics",
	    url = "https://arxiv.org/abs/1908.10084",
	}
	```

	#### MatryoshkaLoss
	```bibtex
	@misc{kusupati2024matryoshka,
	    title={Matryoshka Representation Learning},
	    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
	    year={2024},
	    eprint={2205.13147},
	    archivePrefix={arXiv},
	    primaryClass={cs.LG}
	}
	```

	#### MultipleNegativesRankingLoss
	```bibtex
	@misc{henderson2017efficient,
	    title={Efficient Natural Language Response Suggestion for Smart Reply},
	    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
	    year={2017},
	    eprint={1705.00652},
	    archivePrefix={arXiv},
	    primaryClass={cs.CL}
	}
	```
