	---
	datasets:
	- akhtet/myXNLI
	metrics:
	- accuracy
	pipeline_tag: zero-shot-classification
	widget:
	- text: >-
	    မြန်မာ့စီးပွားရေးမှာ ရွှေ နဲ့ ဒေါ်လာက အရေးပါသလို ဒေါ်လာစျေးပေါ်မူတည်ပြီး
	    အခြားစားသောက်ကုန်ပစ္စည်းတွေကလည်း လိုက်ပါပြောင်းလဲလေ့ ရှိပါတယ်။
	  candidate_labels: commerce, fashion, music, politics, sports
	  multi_class: false
	  example_title: Myanmar Economy
	- text: >-
	    ၂၀၁၇ ခုနှစ် ဇွန်လ ၃၀ ရက်နေ့တွင် ရန်ကုန်မြို့၌ ကျင်းပသော ကိုယ်ခံပညာပေါင်းစုံ
	    ပြိုင်ပွဲ မစ်ဒယ်ဝိတ်တန်း ကမ္ဘာ့ချန်ပီယံလုပွဲတွင် အောင်လအန်ဆန်က လက်ရှိ
	    ချန်ပီယံ ဘစ်ဒက်ရှ်ကို လက်ရည်အသာဖြင့် ထိုးသတ်ပြီး အမှတ်ဖြင့် အနိုင်ရကာ
	    ကမ္ဘာ့ချန်ပီယံဆု ဆွတ်ခူးနိုင်သော ပထမဆုံး မြန်မာတစ်ဦး ဖြစ်လာသည်။
	  candidate_labels: boxing, football, MMA, racing, swimming
	  multi_class: false
	  example_title: MMA Championship
	- text: >-
	    မြန်မာ့ရိုးရာ အစားအစာတစ်မျိုးဖြစ်သော မုန့်ဟင်းငါးသည် အချဉ်ဖောက် ပြုလုပ်သည့်
	    ဆန်ခေါက်ဆွဲဖတ် (မုန့်ဖတ်) လေးများနှင့် ငါးဖြင့် အဓိက ချက်လုပ်သော
	    မုန့်ဟင်းငါးဟင်းရည် ခေါ် ဟင်းငါးရည် တို့ကို အခြား အစာပလာများနှင့် အတူတွဲဖက်
	    စားသုံးရသည့် သွားရည်စာ တစ်မျိုးဖြစ်သည်။
	  candidate_labels: chicken, fish, food, rice, soup
	  multi_class: true
	  example_title: Local Food
	license: mit
	language:
	- my
	- en
	---
	# Model Card for mDeBERTa-v3-base-myXNLI

	mDeBERTa-v3-base-myXNLI is a transformer model for text classification in English and Myanmar (Burmese).

	It is based on the multilingual DeBERTa v3 model and was fine-tuned on the myXNLI dataset for the Natural Language Inference (NLI) task in English and Myanmar.

	It is therefore useful for NLI and for related tasks such as zero-shot text classification on both English and Myanmar data.

	## Model Details

	- Model type: Transformer Encoder
	- Language(s) (NLP): Fine-tuned for Myanmar (Burmese) and English
	- License: MIT
	- Finetuned from model: mDeBERTa v3 base https://huggingface.co/microsoft/mdeberta-v3-base
	- Paper: Myanmar XNLI https://www.researchsquare.com/article/rs-4329843
	- Demo: A demo of Zero-shot Text Classification in Myanmar can be found on this page.

	## Bias, Risks, and Limitations

	Please refer to the papers for the original foundation model: DeBERTa https://arxiv.org/abs/2006.03654 and DeBERTaV3 https://arxiv.org/abs/2111.09543.
	<!-- Any limitations with myXNLI ? -->

	## How to Get Started with the Model

	Use the code below to get started with the model for the zero-shot classification task.

	```
	from transformers import pipeline

	classifier = pipeline(task="zero-shot-classification", model="akhtet/mDeBERTa-v3-base-myXNLI", framework="pt")

	# Classify a Myanmar sentence against a set of candidate topic labels.
	output = classifier(
	    "မြန်မာ့စီးပွားရေးမှာ ရွှေ နဲ့ ဒေါ်လာက အရေးပါသလို ဒေါ်လာစျေးပေါ်မူတည်ပြီး အခြားစားသောက်ကုန်ပစ္စည်းတွေကလည်း လိုက်ပါပြောင်းလဲလေ့ ရှိပါတယ်။",
	    candidate_labels=["commerce", "fashion", "music", "politics", "sports"],
	)

	print(output)
	# Example output (labels are sorted by descending score):
	# {'sequence': 'မြန်မာ့စီးပွားရေးမှာ ရွှေ နဲ့ ဒေါ်လာက အရေးပါသလို ဒေါ်လာစျေးပေါ်မူတည်ပြီး အခြားစားသောက်ကုန်ပစ္စည်းတွေကလည်း လိုက်ပါပြောင်းလဲလေ့ ရှိပါတယ်။',
	# 'labels': ['commerce', 'politics', 'fashion', 'music', 'sports'],
	# 'scores': [0.8995707631111145, 0.048580411821603775, 0.035297513008117676, 0.009092549793422222, 0.007458842825144529]}
	```
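
The scores above are derived from NLI predictions: each candidate label is turned into a hypothesis and scored for entailment against the input text. The sketch below illustrates how such entailment scores can be turned into label scores, following the scheme described in the `transformers` zero-shot pipeline documentation. The logits are made-up numbers for illustration, not outputs of this model, and the function names are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_scores(entailment_logits, contradiction_logits, multi_label=False):
    """Turn per-label NLI logits into zero-shot label scores.

    multi_label=False: softmax over the entailment logits of all labels,
    so the scores compete with each other and sum to 1.
    multi_label=True: each label is scored independently as
    P(entailment) versus P(contradiction), so scores need not sum to 1.
    """
    if multi_label:
        return [
            softmax([contra, entail])[1]
            for entail, contra in zip(entailment_logits, contradiction_logits)
        ]
    return softmax(entailment_logits)


# Made-up logits for three candidate labels (illustration only).
labels = ["commerce", "fashion", "music"]
entail = [3.0, 0.5, -1.0]
contra = [-2.0, 1.0, 2.5]

single = zero_shot_scores(entail, contra)
multi = zero_shot_scores(entail, contra, multi_label=True)

for name, s, m in zip(labels, single, multi):
    print(f"{name}: single-label={s:.3f}, multi-label={m:.3f}")
```

In single-label mode the scores compete and sum to 1, which suits mutually exclusive topics. In multi-label mode each label is judged on its own, which suits overlapping labels such as "fish", "food" and "soup".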

	For more details on zero-shot classification, please refer to the Hugging Face documentation: https://huggingface.co/tasks/zero-shot-classification
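
The model can also be called directly on a premise and hypothesis pair for plain NLI. The following is a minimal sketch, assuming `torch` and `transformers` are installed and the checkpoint name from the example above; `nli_predict` and `top_label` are hypothetical helper names, and the label names are read from the checkpoint's own `config.id2label` rather than assumed.

```python
def top_label(probs, id2label):
    """Return (label name, probability) for the highest-probability class."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return id2label[best], probs[best]


def nli_predict(premise, hypothesis, model_name="akhtet/mDeBERTa-v3-base-myXNLI"):
    """Score one premise/hypothesis pair with the fine-tuned NLI model."""
    # Imported here so the helper above stays usable without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    # The tokenizer encodes the two sentences as a single paired input.
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0]

    probs = torch.softmax(logits, dim=-1).tolist()
    # The class order comes from the checkpoint configuration.
    return top_label(probs, model.config.id2label)


# Example call (downloads the model on first use):
# print(nli_predict("A man is eating lunch.", "Someone is having a meal."))
```

Premise and hypothesis may be in different languages, since the model was trained on cross-lingual pairs.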

	## Training Details

	The model was fine-tuned on the myXNLI dataset https://huggingface.co/datasets/akhtet/myXNLI. The English portion of myXNLI comes from the XNLI dataset.

	Four copies of the myXNLI training data were concatenated, each pairing premise and hypothesis in a different language combination: en-en, en-my, my-en and my-my.

	Training on this cross-matched language data improved NLI accuracy compared with training on each language separately.
	The approach was inspired by another model: https://huggingface.co/joeddav/xlm-roberta-large-xnli

	The model was fine-tuned using this combined dataset for a single epoch.
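
The four-way concatenation described above can be sketched as follows. This is an illustration only, not the actual training script: the record layout (a dict with per-language `premise` and `hypothesis` fields) and the function name `cross_lingual_pairs` are assumptions, and the sample texts are placeholders.

```python
from itertools import product


def cross_lingual_pairs(examples, languages=("en", "my")):
    """Build one copy of the data per (premise language, hypothesis language).

    With two languages this yields four copies: en-en, en-my, my-en, my-my.
    """
    combined = []
    for prem_lang, hyp_lang in product(languages, repeat=2):
        for ex in examples:
            combined.append(
                {
                    "premise": ex["premise"][prem_lang],
                    "hypothesis": ex["hypothesis"][hyp_lang],
                    "label": ex["label"],
                    "pair": f"{prem_lang}-{hyp_lang}",
                }
            )
    return combined


# One parallel example with placeholder text (hypothetical layout).
parallel = [
    {
        "premise": {"en": "<premise in English>", "my": "<premise in Myanmar>"},
        "hypothesis": {"en": "<hypothesis in English>", "my": "<hypothesis in Myanmar>"},
        "label": "entailment",
    }
]

train = cross_lingual_pairs(parallel)
print([row["pair"] for row in train])
```

Each source example appears four times in the combined set, always with the same label, so the model sees every premise and hypothesis language combination.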

	## Evaluation

	This model was evaluated on the myXNLI test set for Myanmar accuracy. We also report English accuracy on the XNLI test set.

	| Model | English accuracy (%) | Myanmar accuracy (%) |
	| ----- | ----- | ----- |
	| mDeBERTa-v3-base-myXNLI | 88.02 | 80.99 |
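
Accuracy here is the percentage of test pairs whose predicted NLI label matches the gold label. A minimal sketch of that computation follows; the predictions and references are toy values, not results from this model, and `accuracy` is a hypothetical helper name.

```python
def accuracy(predictions, references):
    """Percentage of predictions that exactly match the reference labels."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


# Toy labels for illustration only.
predicted = ["entailment", "neutral", "contradiction", "neutral"]
gold = ["entailment", "neutral", "neutral", "neutral"]

print(f"{accuracy(predicted, gold):.2f}")
```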

	## Citation

	<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

	[More Information Needed]

	## Model Card Contact

	Aung Kyaw Htet