---
license: mit
language:
- en
- eu
tags:
- text2text-generation
- open-nmt
- pytorch
---
# Itzune v1.0 EN -> EU machine translation Argos model
This model was trained with the [argostrain](https://github.com/argosopentech/argos-train) training scripts on 11,542,706 English-Basque parallel strings (counted before cleaning) extracted from datasets obtained directly from the [Opus project](https://opus.nlpl.eu/).
## Model description
- **Developed by:** Basque community
- **Model type:** translation
- **Model version:** v1.0
- **Source Language:** English
- **Target Language:** Basque
- **License:** MIT
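
As an illustration of how a model like this is typically used, here is a minimal sketch that assumes the `argostranslate` Python package is installed and that the model has been downloaded as a local `.argosmodel` file. The file name `translate-en_eu-1_0.argosmodel` and the helper names `install_model` and `translate_en_to_eu` are hypothetical and not taken from this repository; adjust them to the file you actually have.

```python
from pathlib import Path

FROM_CODE = "en"  # source language: English
TO_CODE = "eu"    # target language: Basque

# Hypothetical file name; replace it with the .argosmodel file you downloaded.
MODEL_PATH = Path("translate-en_eu-1_0.argosmodel")


def install_model(model_path: Path = MODEL_PATH) -> None:
    """Install a local .argosmodel package into Argos Translate."""
    # Imported lazily so this module loads even without argostranslate installed.
    import argostranslate.package

    argostranslate.package.install_from_path(model_path)


def translate_en_to_eu(text: str) -> str:
    """Translate English text to Basque using the installed model."""
    import argostranslate.translate

    return argostranslate.translate.translate(text, FROM_CODE, TO_CODE)
```

After calling `install_model()` once, `translate_en_to_eu("Hello, world!")` should return the Basque translation produced by the installed model.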
## Training data
The English-Basque parallel sentences were collected from the following datasets:
| Dataset | Sentences before cleaning |
|----------------------|--------------------------:|
| CCMatrix v1 | 7,788,871 |
| OpenSubtitles v2018 | 805,780 |
| XLEnt v1.2 | 800,631 |
| GNOME v1 | 652,298 |
| HPLT v1.1 | 610,694 |
| EhuHac v1 | 585,210 |
| WikiMatrix v1 | 119,480 |
| KDE4 v2 | 100,160 |
| wikimedia v20230407 | 60,990 |
| bible-uedin v1 | 15,893 |
| Tatoeba v2023-04-12 | 2,070 |
| Wiktionary | 629 |
| **Total** | **11,542,706** |
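
The per-dataset counts above can be cross-checked against the stated total with a few lines of Python. This is only a sanity check on the table's figures; the variable names are illustrative.

```python
# Sentence counts before cleaning, copied from the table above.
SENTENCES_BEFORE_CLEANING = {
    "CCMatrix v1": 7_788_871,
    "OpenSubtitles v2018": 805_780,
    "XLEnt v1.2": 800_631,
    "GNOME v1": 652_298,
    "HPLT v1.1": 610_694,
    "EhuHac v1": 585_210,
    "WikiMatrix v1": 119_480,
    "KDE4 v2": 100_160,
    "wikimedia v20230407": 60_990,
    "bible-uedin v1": 15_893,
    "Tatoeba v2023-04-12": 2_070,
    "Wiktionary": 629,
}

total = sum(SENTENCES_BEFORE_CLEANING.values())

# Fraction of the corpus contributed by each dataset.
shares = {name: count / total for name, count in SENTENCES_BEFORE_CLEANING.items()}

print(f"Total: {total:,}")
for name, share in shares.items():
    print(f"{name:<22} {share:6.2%}")
```

The sum matches the total in the table, and CCMatrix alone accounts for roughly two thirds of the sentences before cleaning.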