---
license: mit
language:
- en
- eu
tags:
- text2text-generation
- open-nmt
- pytorch
---

# Itzune v1.0 EN -> EU machine translation Argos model

This model was trained with the [argostrain](https://github.com/argosopentech/argos-train) training scripts on 11,542,706 English-Basque parallel strings extracted from datasets obtained directly from the [OPUS project](https://opus.nlpl.eu/).

## Model description

- **Developed by:** Basque community
- **Model type:** translation
- **Model version:** v1.0
- **Source Language:** English
- **Target Language:** Basque
- **License:** MIT
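
A minimal usage sketch, assuming the model is available locally as an Argos Translate package file (`.argosmodel`, the format that argostrain produces) and that the `argostranslate` Python package is installed. The helper name `translate_en_eu` and the `model_path` argument are illustrative and not part of this model card; the language codes follow the metadata above.

```python
from pathlib import Path

# Language codes taken from the model card metadata (English -> Basque).
FROM_CODE = "en"
TO_CODE = "eu"


def translate_en_eu(text: str, model_path: Path) -> str:
    """Install a local Argos package file and translate English text to Basque.

    `model_path` is a hypothetical location of the downloaded .argosmodel file.
    """
    # Imported lazily so this module loads even where argostranslate is absent.
    import argostranslate.package
    import argostranslate.translate

    # Register the package with the local Argos Translate installation.
    argostranslate.package.install_from_path(model_path)

    # Translate using the installed en -> eu package.
    return argostranslate.translate.translate(text, FROM_CODE, TO_CODE)
```

With the package file downloaded, a call such as `translate_en_eu("Good morning!", Path("path/to/model.argosmodel"))` should return the Basque translation as a string; the path shown here is a placeholder.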

## Training Data

The English-Basque parallel sentences were collected from the following datasets:

| Dataset              | Sentences before cleaning |
|----------------------|--------------------------:|
| CCMatrix v1          |                 7,788,871 |
| OpenSubtitles v2018  |                   805,780 |
| XLEnt v1.2           |                   800,631 |
| GNOME v1             |                   652,298 |
| HPLT v1.1            |                   610,694 |
| EhuHac v1            |                   585,210 |
| WikiMatrix v1        |                   119,480 |
| KDE4 v2              |                   100,160 |
| wikimedia v20230407  |                    60,990 |
| bible-uedin v1       |                    15,893 |
| Tatoeba v2023-04-12  |                     2,070 |
| Wiktionary           |                       629 |
| **Total**            |            **11,542,706** |
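
The total can be cross-checked from the per-dataset counts. The short sketch below copies the figures from the table, sums them, and prints each dataset's share of the corpus before cleaning; the variable names are illustrative only.

```python
# Sentence counts before cleaning, copied from the table above.
counts = {
    "CCMatrix v1": 7_788_871,
    "OpenSubtitles v2018": 805_780,
    "XLEnt v1.2": 800_631,
    "GNOME v1": 652_298,
    "HPLT v1.1": 610_694,
    "EhuHac v1": 585_210,
    "WikiMatrix v1": 119_480,
    "KDE4 v2": 100_160,
    "wikimedia v20230407": 60_990,
    "bible-uedin v1": 15_893,
    "Tatoeba v2023-04-12": 2_070,
    "Wiktionary": 629,
}

total = sum(counts.values())

# Print datasets from largest to smallest with their share of the total.
for name, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<22}{n:>12,}{n / total:>8.1%}")

print(f"{'Total':<22}{total:>12,}")
```

The sum matches the 11,542,706 reported in the table, and CCMatrix alone accounts for roughly two thirds of the raw data.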