	---
	license: cc-by-nd-4.0
	language:
	- en
	- nyi
	metrics:
	- bleu
	base_model:
	- Helsinki-NLP/opus-mt-en-hi
	pipeline_tag: translation
	library_name: transformers
	tags:
	- english
	- nyishi
	- nmt
	- translation
	- nlp
	---
	# Model Card for eng-nyi-nmt

	The eng-nyi-nmt model is a neural machine translation (NMT) model fine-tuned on EnNyiCorp, a parallel corpus of English and Nyishi sentence pairs that is still under development. Nyishi is a low-resource language spoken in Arunachal Pradesh, India, and digital resources and linguistic datasets for it are scarce. This model aims to support translation into Nyishi and to help preserve and promote the language in digital spaces.

	To develop eng-nyi-nmt, the pre-trained model Helsinki-NLP/opus-mt-en-hi (English-to-Hindi) was used as the starting point, chosen for the structural similarities between Hindi and Nyishi. Transfer learning from this model allowed it to be adapted to Nyishi efficiently despite the limited language data.
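
The transfer-learning setup described above might look roughly like the following sketch. It is not the authors' training script: the column names `en` and `nyi`, the function names, the output directory, and every hyperparameter are assumptions chosen for illustration, and the base model's tokenizer is reused unchanged, which the card does not confirm.

```python
from typing import Dict, List


def tokenize_pairs(tokenizer, batch: Dict[str, List[str]], max_length: int = 128):
    """Tokenize English sources and Nyishi targets; target ids end up under "labels"."""
    return tokenizer(
        batch["en"],                # hypothetical source column
        text_target=batch["nyi"],   # hypothetical target column
        max_length=max_length,
        truncation=True,
    )


def fine_tune(pairs: Dict[str, List[str]], output_dir: str = "eng-nyi-nmt-sketch") -> None:
    """Fine-tune the English-to-Hindi base model on English-Nyishi sentence pairs."""
    # Heavy imports are kept local so tokenize_pairs can be used without them.
    from datasets import Dataset
    from transformers import (
        AutoModelForSeq2SeqLM,
        AutoTokenizer,
        DataCollatorForSeq2Seq,
        Seq2SeqTrainer,
        Seq2SeqTrainingArguments,
    )

    base = "Helsinki-NLP/opus-mt-en-hi"
    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForSeq2SeqLM.from_pretrained(base)

    # Convert the raw text columns into input_ids / attention_mask / labels.
    dataset = Dataset.from_dict(pairs).map(
        lambda batch: tokenize_pairs(tokenizer, batch),
        batched=True,
        remove_columns=["en", "nyi"],
    )

    # Illustrative hyperparameters only; the card does not report the real ones.
    args = Seq2SeqTrainingArguments(
        output_dir=output_dir,
        learning_rate=2e-5,
        per_device_train_batch_size=16,
        num_train_epochs=3,
    )
    trainer = Seq2SeqTrainer(
        model=model,
        args=args,
        train_dataset=dataset,
        # Pads inputs and labels per batch; label padding is ignored by the loss.
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
    trainer.train()
    trainer.save_model(output_dir)
    tokenizer.save_pretrained(output_dir)
```

Calling `fine_tune({"en": [...], "nyi": [...]})` with two equally long lists of aligned sentences would start training. It requires the `transformers`, `datasets`, `torch`, and `sentencepiece` packages.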

	## Model Details

	### Model Description
	- Developed by: Tungon Dugi and Nabam Kakum
	- Affiliation: National Institute of Technology Arunachal Pradesh, India
	- Email: [email protected] or [email protected]
	- Model type: Translation
	- Language(s) (NLP): English (en) and Nyishi (nyi)
	- Finetuned from model: Helsinki-NLP/opus-mt-en-hi

	### Uses
	#### Direct Use
	This model can be used for translation and text-to-text generation between English and Nyishi.

	### How to Get Started with the Model

	Use the code below to get started with the model:

	```python
	from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

	# Download (or load from the local cache) the fine-tuned tokenizer and model
	tokenizer = AutoTokenizer.from_pretrained("repleeka/eng-nyi-nmt")
	model = AutoModelForSeq2SeqLM.from_pretrained("repleeka/eng-nyi-nmt")
	```
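
Once the tokenizer and model are loaded, translating a list of English sentences could look like the sketch below. The helper names `batched` and `translate`, the batch size, and the `max_new_tokens` limit are assumptions made for this example and are not part of the model's own API.

```python
from typing import Iterator, List


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def translate(sentences: List[str], batch_size: int = 8, max_new_tokens: int = 128) -> List[str]:
    """Translate English sentences with repleeka/eng-nyi-nmt, one batch at a time."""
    # Imported here so the batching helper works without transformers installed.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("repleeka/eng-nyi-nmt")
    model = AutoModelForSeq2SeqLM.from_pretrained("repleeka/eng-nyi-nmt")
    model.eval()

    translations: List[str] = []
    for batch in batched(sentences, batch_size):
        # Pad to the longest sentence in the batch and truncate overlong inputs.
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        with torch.no_grad():
            generated = model.generate(**inputs, max_new_tokens=max_new_tokens)
        translations.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return translations
```

For example, `translate(["Good morning.", "Where is the market?"])` returns a list with one output string per input sentence. The first call downloads the model weights, and the tokenizer needs the `sentencepiece` package.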

	## Training Details
	### Training Data
	The model was trained on the EnNyiCorp dataset, which comprises aligned sentence pairs in English and Nyishi. The dataset was curated to support machine translation for low-resource languages, with a focus on preserving and promoting the Nyishi language in digital spaces.

	### Evaluation
	The model was evaluated on translation quality, measured by BLEU score, and on evaluation runtime.

	| Metric | Value |
	|--------------------|-------------------|
	| BLEU Score | 0.1468 |
	| Evaluation Runtime | 1237.5341 seconds |

	A BLEU score of 0.1468 indicates a baseline level of translation quality for English-to-Nyishi, which is expected given the limited data resources. Further refinement is needed, but the result is an encouraging step toward accurate translations.
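
To make the metric concrete, here is a simplified corpus-level BLEU in plain Python that returns a value between 0 and 1. The card does not say which BLEU implementation or tokenization produced the reported score, so this sketch assumes whitespace tokenization, a single reference per sentence, uniform weights over 1-grams to 4-grams, and no smoothing. Published evaluations normally rely on an established library instead.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous n-gram in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Unsmoothed corpus BLEU on a 0-1 scale with one reference per hypothesis."""
    if len(hypotheses) != len(references):
        raise ValueError("hypotheses and references must have the same length")

    matches = [0] * max_n  # clipped n-gram matches, summed over the corpus
    totals = [0] * max_n   # hypothesis n-grams, summed over the corpus
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp_tokens, n)
            ref_ngrams = ngram_counts(ref_tokens, n)
            # Counter intersection keeps the minimum count per n-gram (clipping).
            matches[n - 1] += sum((hyp_ngrams & ref_ngrams).values())
            totals[n - 1] += sum(hyp_ngrams.values())

    # Without smoothing, a single zero precision makes the geometric mean zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # The brevity penalty lowers the score when the output is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)
```

`corpus_bleu(model_outputs, reference_translations)` takes two equally long lists of strings. Scores from this sketch are not directly comparable to the figure in the table unless the tokenization and BLEU variant match.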

	### Summary
	The eng-nyi-nmt model is at an early stage of development and offers initial translation capabilities between English and Nyishi. Expanding the dataset and the available training resources is crucial for improving generalization and translation accuracy to the point where the model can serve practical applications in this extremely low-resource language.