urtzai committed · Commit e5ba936 · verified · 1 Parent(s): 66df461

Update README.md

  1. README.md +10 -9
README.md CHANGED
@@ -11,7 +11,7 @@ tags:

  # Itzune EN -> EU machine translation argos model

- This model was trained using [argostrain](https://github.com/argosopentech/argos-train) training scripts with ~12M English to Basque parallel string datasets obtained directly from the [Opus project](https://opus.nlpl.eu/).
+ This model was trained using [argostrain](https://github.com/argosopentech/argos-train) training scripts with 11,542,706 English to Basque parallel strings extracted from datasets obtained directly from the [Opus project](https://opus.nlpl.eu/).

  ## Model description

@@ -23,21 +23,22 @@ This model was trained using [argostrain](https://github.com/argosopentech/argos
  - **Target Language:** Basque
  - **License:** MIT

- <!-- ## Training Data
+ ## Training Data

  The English-Basque parallel sentences were collected from the following datasets:

  | Dataset | Sentences before cleaning |
  |----------------------|--------------------------:|
  | CCMatrix v1 | 7,788,871 |
- | EhuHac v1 | 585,210 |
- | bible-uedin v1 | 15,893 |
+ | OpenSubtitles v2018 | 805,780 |
+ | XLEnt v1.2 | 800,631 |
  | GNOME v1 | 652,298 |
  | HPLT v1.1 | 610,694 |
- | KDE4 v2 | 100,160 |
- | OpenSubtitles v2018 | 805,780 |
- | Tatoeba v2023-04-12 | 2,070 |
+ | EhuHac v1 | 585,210 |
  | WikiMatrix v1 | 119,480 |
+ | KDE4 v2 | 100,160 |
  | wikimedia v20230407 | 60,990 |
- | XLEnt v1.2 | 800,631 |
- | **Total** | **15,653,108** | -->
+ | bible-uedin v1 | 15,893 |
+ | Tatoeba v2023-04-12 | 2,070 |
+ | Wiktionary | 629 |
+ | **Total** | **11,542,706** |
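
The total in the updated table can be cross-checked by summing the per-dataset rows. The following is a minimal sketch in Python; the counts are copied from the new README table, while the `counts` dictionary and variable names are illustrative and not part of the repository.

```python
# Sentence counts copied from the updated "Training Data" table
# (sentences before cleaning). The dictionary itself is illustrative.
counts = {
    "CCMatrix v1": 7_788_871,
    "OpenSubtitles v2018": 805_780,
    "XLEnt v1.2": 800_631,
    "GNOME v1": 652_298,
    "HPLT v1.1": 610_694,
    "EhuHac v1": 585_210,
    "WikiMatrix v1": 119_480,
    "KDE4 v2": 100_160,
    "wikimedia v20230407": 60_990,
    "bible-uedin v1": 15_893,
    "Tatoeba v2023-04-12": 2_070,
    "Wiktionary": 629,
}

# Sum of all rows in the new table.
total = sum(counts.values())

# Sum of the rows that were already listed in the old (commented-out) table,
# i.e. everything except the newly added Wiktionary row.
total_without_wiktionary = total - counts["Wiktionary"]

print(f"{total:,}")                     # 11,542,706
print(f"{total_without_wiktionary:,}")  # 11,542,077
```

The rows sum to 11,542,706, which agrees with both the new **Total** row and the figure quoted in the updated introduction. The rows shown in the old table sum to 11,542,077, so the previous total of 15,653,108 did not match its own rows.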