---
license: cc-by-nd-4.0
language:
- en
- nyi
metrics:
- bleu
base_model:
- Helsinki-NLP/opus-mt-en-hi
pipeline_tag: translation
library_name: transformers
tags:
- english
- nyishi
- nmt
- translation
- nlp
---

# Model Card for eng-nyi-nmt

The **eng-nyi-nmt** model is a neural machine translation (NMT) model fine-tuned on **EnNyiCorp**, a parallel corpus of English-Nyishi sentence pairs that is still under development. Nyishi is a low-resource language spoken in Arunachal Pradesh, India, with few digital resources or linguistic datasets. This model aims to support Nyishi translation and to help preserve and promote the language in digital spaces.

**eng-nyi-nmt** starts from the pre-trained English-to-Hindi model **Helsinki-NLP/opus-mt-en-hi**, chosen for the structural similarities between Hindi and Nyishi. Transfer learning from this checkpoint made it possible to adapt the model to Nyishi efficiently despite the limited training data.

## Model Details

### Model Description

- **Developed by:** Tungon Dugi and Nabam Kakum
- **Affiliation:** National Institute of Technology Arunachal Pradesh, India
- **Email:** [email protected] or [email protected]
- **Model type:** Translation
- **Language(s) (NLP):** English (en) and Nyishi (nyi)
- **Finetuned from model:** Helsinki-NLP/opus-mt-en-hi

### Uses

#### Direct Use

This model can be used for translation and text-to-text generation between English and Nyishi.

### How to Get Started with the Model

Use the code below to get started with the model:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Download (or load from the local cache) the tokenizer and the fine-tuned weights.
tokenizer = AutoTokenizer.from_pretrained("repleeka/eng-nyi-nmt")
model = AutoModelForSeq2SeqLM.from_pretrained("repleeka/eng-nyi-nmt")
```
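
Loading the tokenizer and model does not translate anything yet. The sketch below shows one way to run translation in batches with the standard `generate` API. The helper names `batched` and `translate` and the values of `batch_size`, `max_new_tokens`, and `num_beams` are illustrative assumptions, not settings documented for this model.

```python
from typing import Iterator, List, Sequence

MODEL_ID = "repleeka/eng-nyi-nmt"  # repository name used in this card


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def translate(sentences: Sequence[str], batch_size: int = 8,
              max_new_tokens: int = 128) -> List[str]:
    """Translate English sentences with the fine-tuned model (illustrative helper)."""
    # Imported lazily so that defining this helper does not load the heavy libraries.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    model.eval()

    translations: List[str] = []
    for batch in batched(sentences, batch_size):
        # Pad to the longest sentence in the batch and truncate over-long inputs.
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        with torch.no_grad():
            generated = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=4)
        translations.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return translations
```

Calling `translate(["How are you?"])` returns a list containing one output string. The tokenizer for this model family typically requires the `sentencepiece` package to be installed alongside `transformers` and `torch`.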

## Training Details

### Training Data

The model was trained on the **EnNyiCorp** dataset, which consists of aligned English-Nyishi sentence pairs. The dataset was curated to support machine translation for low-resource languages, with a focus on preserving and promoting Nyishi in digital spaces.
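
As a rough illustration of how aligned sentence pairs can be used to adapt the base checkpoint, the sketch below performs a single gradient update. It is not the authors' training script: the record layout (`"en"` / `"nyi"` keys), the placeholder sentences, the helper names, and the learning rate are all assumptions, and the card does not state the actual hyperparameters.

```python
from typing import Dict, List, Tuple

BASE_MODEL = "Helsinki-NLP/opus-mt-en-hi"  # base checkpoint named in this card

# Hypothetical placeholder records: replace with real aligned English-Nyishi pairs.
EXAMPLE_PAIRS = [
    {"en": "<English sentence 1>", "nyi": "<Nyishi sentence 1>"},
    {"en": "<English sentence 2>", "nyi": "<Nyishi sentence 2>"},
]


def split_columns(pairs: List[Dict[str, str]]) -> Tuple[List[str], List[str]]:
    """Turn a list of {"en": ..., "nyi": ...} records into two parallel lists."""
    sources = [pair["en"] for pair in pairs]
    targets = [pair["nyi"] for pair in pairs]
    return sources, targets


def fine_tune_step(pairs: List[Dict[str, str]], learning_rate: float = 2e-5) -> float:
    """Run one gradient update on `pairs` and return the training loss."""
    # Imported lazily so that defining this helper does not load the heavy libraries.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    model = AutoModelForSeq2SeqLM.from_pretrained(BASE_MODEL)
    optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)

    sources, targets = split_columns(pairs)
    # `text_target` tokenizes the target side and stores it under "labels".
    batch = tokenizer(sources, text_target=targets, return_tensors="pt",
                      padding=True, truncation=True)
    # Replace padding in the labels with -100 so it is ignored by the loss.
    batch["labels"][batch["labels"] == tokenizer.pad_token_id] = -100

    model.train()
    loss = model(**batch).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

A real run would loop over many batches for several epochs and hold out part of the corpus for evaluation. This sketch only shows the shape of a single update.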

### Evaluation

The model was evaluated for translation quality, measured by BLEU score, and for evaluation runtime.

| Metric                 | Value             |
|------------------------|-------------------|
| **BLEU Score**         | 0.1468            |
| **Evaluation Runtime** | 1237.5341 seconds |

A BLEU score of 0.1468 reflects a baseline level of English-to-Nyishi translation quality, which is expected given the limited training data. Further refinement is needed, but the result is an encouraging first step toward accurate translations.
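
For readers unfamiliar with the metric, the sketch below shows what BLEU measures: clipped n-gram precision for n = 1 to 4, combined by a geometric mean and scaled by a brevity penalty. It is a simplified pure-Python version that assumes whitespace tokenization, one reference per sentence, and no smoothing. The function name `simple_bleu` is hypothetical, and this is not the script used to produce the score reported above, so its numbers are not directly comparable.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Corpus-level BLEU on a 0-1 scale, one reference per hypothesis."""
    matched = [0] * max_n  # clipped n-gram matches, per n-gram order
    total = [0] * max_n    # n-grams produced by the system, per n-gram order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Counter intersection clips each count by its count in the reference.
            matched[n - 1] += sum((hyp_counts & ref_counts).values())
            total[n - 1] += sum(hyp_counts.values())

    if hyp_len == 0 or min(matched) == 0:
        return 0.0  # without smoothing, any zero precision gives a zero score

    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Penalize system output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)
```

An output identical to its reference scores 1.0 and an output sharing no words with it scores 0.0. Established tools apply their own tokenization and smoothing, so they should be preferred for reporting results.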

### Summary

The **eng-nyi-nmt** model is at an early stage of development and offers initial translation capabilities between English and Nyishi. Expanding the dataset and the training resources is crucial for improving generalization and translation accuracy in practical applications. Continued refinement is needed to serve this extremely low-resource language.