Committed by urtzai · Commit 66df461 · verified · 1 parent: 7e0b91c

Update README.md

Files changed (1)
  1. README.md +33 -1
README.md CHANGED
@@ -6,6 +6,38 @@ language:
  tags:
  - text2text-generation
  - open-nmt
+ - pytorch
  ---
 
- # EN -> EU argos model
+ # Itzune EN -> EU machine translation argos model
+
+ This model was trained with the [argostrain](https://github.com/argosopentech/argos-train) training scripts on roughly 12M English-Basque parallel sentences, taken directly from datasets published by the [OPUS project](https://opus.nlpl.eu/).
+
+ ## Model description
+
+
+ - **Developed by:** Basque community
+ - **Model type:** translation
+ - **Model version:** v0.1
+ - **Source Language:** English
+ - **Target Language:** Basque
+ - **License:** MIT
+
+ <!-- ## Training Data
+
+ The English-Basque parallel sentences were collected from the following datasets:
+
+ | Dataset              | Sentences before cleaning |
+ |----------------------|--------------------------:|
+ | CCMatrix v1          |                 7,788,871 |
+ | EhuHac v1            |                   585,210 |
+ | bible-uedin v1       |                    15,893 |
+ | GNOME v1             |                   652,298 |
+ | HPLT v1.1            |                   610,694 |
+ | KDE4 v2              |                   100,160 |
+ | OpenSubtitles v2018  |                   805,780 |
+ | Tatoeba v2023-04-12  |                     2,070 |
+ | WikiMatrix v1        |                   119,480 |
+ | wikimedia v20230407  |                    60,990 |
+ | XLEnt v1.2           |                   800,631 |
+ | **Total**            |            **15,653,108** | -->
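The per-dataset counts in the commented-out Training Data table can be re-added with a few lines of Python. This is a minimal sketch that uses only the eleven rows exactly as listed. The names `DATASET_COUNTS`, `STATED_TOTAL`, and `total_sentences` are illustrative helpers and are not part of argos-train. Summed this way, the listed rows come to 11,542,077, which is in line with the "~12M" figure in the model description. It does not match the 15,653,108 shown in the **Total** row.

```python
"""Recompute the total of the Training Data table (sentences before cleaning)."""
from __future__ import annotations

# Counts copied from the table rows; this dict is an illustrative helper,
# not something provided by argos-train or OPUS.
DATASET_COUNTS: dict[str, int] = {
    "CCMatrix v1": 7_788_871,
    "EhuHac v1": 585_210,
    "bible-uedin v1": 15_893,
    "GNOME v1": 652_298,
    "HPLT v1.1": 610_694,
    "KDE4 v2": 100_160,
    "OpenSubtitles v2018": 805_780,
    "Tatoeba v2023-04-12": 2_070,
    "WikiMatrix v1": 119_480,
    "wikimedia v20230407": 60_990,
    "XLEnt v1.2": 800_631,
}

# Value shown in the table's **Total** row.
STATED_TOTAL = 15_653_108


def total_sentences(counts: dict[str, int]) -> int:
    """Sum the per-dataset sentence counts."""
    return sum(counts.values())


if __name__ == "__main__":
    computed = total_sentences(DATASET_COUNTS)
    print(f"Datasets listed:    {len(DATASET_COUNTS)}")
    print(f"Sum of listed rows: {computed:,}")  # 11,542,077
    print(f"Stated total:       {STATED_TOTAL:,}")
    print(f"Difference:         {STATED_TOTAL - computed:,}")
```

The difference between the stated total and the sum of the listed rows is 4,111,031 sentences.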