---
license: mit
language:
- en
- eu
tags:
- text2text-generation
- open-nmt
- pytorch
---

# Itzune v1.0 EN -> EU machine translation Argos model

This model was trained with the [argostrain](https://github.com/argosopentech/argos-train) training scripts on 11,542,706 English-Basque parallel sentences (counted before cleaning), extracted from datasets obtained directly from the [OPUS project](https://opus.nlpl.eu/).

## Model description


- **Developed by:** Basque community
- **Model type:** Translation
- **Model version:** v1.0
- **Source Language:** English
- **Target Language:** Basque
- **License:** MIT

## Training Data

The English-Basque parallel sentences were collected from the following datasets:

| Dataset              | Sentences before cleaning |
|----------------------|--------------------------:|
| CCMatrix v1          |                 7,788,871 |
| OpenSubtitles v2018  |                   805,780 |
| XLEnt v1.2           |                   800,631 |
| GNOME v1             |                   652,298 |
| HPLT v1.1            |                   610,694 |
| EhuHac v1            |                   585,210 |
| WikiMatrix v1        |                   119,480 |
| KDE4 v2              |                   100,160 |
| wikimedia v20230407  |                    60,990 |
| bible-uedin v1       |                    15,893 |
| Tatoeba v2023-04-12  |                     2,070 |
| Wiktionary           |                       629 |
| **Total**            |            **11,542,706** |
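
As a quick consistency check on the table, the per-dataset counts can be summed and expressed as shares of the corpus. This is a minimal Python sketch: the `COUNTS` dictionary simply transcribes the table, and `REPORTED_TOTAL` is the total stated in it; neither is part of the argostrain tooling.

```python
# Per-dataset sentence counts before cleaning, transcribed from the table.
COUNTS = {
    "CCMatrix v1": 7_788_871,
    "OpenSubtitles v2018": 805_780,
    "XLEnt v1.2": 800_631,
    "GNOME v1": 652_298,
    "HPLT v1.1": 610_694,
    "EhuHac v1": 585_210,
    "WikiMatrix v1": 119_480,
    "KDE4 v2": 100_160,
    "wikimedia v20230407": 60_990,
    "bible-uedin v1": 15_893,
    "Tatoeba v2023-04-12": 2_070,
    "Wiktionary": 629,
}

# Total reported in the table's last row.
REPORTED_TOTAL = 11_542_706

total = sum(COUNTS.values())
assert total == REPORTED_TOTAL, f"sum {total} != reported {REPORTED_TOTAL}"

# Print each dataset's contribution, largest first, as a percentage of the total.
for name, n in sorted(COUNTS.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<22} {n:>10,} {100 * n / total:6.2f}%")
```

The per-dataset counts add up exactly to the reported total, and the breakdown shows that CCMatrix alone accounts for roughly two thirds of the sentence pairs before cleaning.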