Fairseq · Spanish · Catalan

fdelucaf committed · Commit 1ed5d4a (verified) · 1 Parent(s): 4886eea

Update README.md

Files changed (1): README.md (+5, -5)

README.md CHANGED
@@ -89,7 +89,7 @@ The model was trained on a combination of the following datasets:
 
 ### Data preparation
 
-All datasets are concatenated and filtered using the [mBERT Gencata parallel filter](https://huggingface.co/projecte-aina/mbert-base-gencata)
+All datasets were concatenated and filtered using the [mBERT Gencata parallel filter](https://huggingface.co/projecte-aina/mbert-base-gencata)
 and cleaned using the clean-corpus-n.pl script from [moses](https://github.com/moses-smt/mosesdecoder), allowing sentences between 5 and 150 words.
 
 Before training, the punctuation was normalized using a modified version of the join-single-file.py script from
@@ -132,7 +132,7 @@ Weights were saved every 1000 updates and reported results are the average of th
 
 ## Evaluation
 
-### Variable and metrics
+### Variables and metrics
 
 We use the BLEU score for evaluation on following test sets:
 [Flores-101](https://github.com/facebookresearch/flores),
@@ -168,14 +168,14 @@ Language Technologies Unit (LangTech) at the Barcelona Supercomputing Center.
 For further information, please send an email to [email protected].
 
 ### Copyright
-Copyright Language Technologies Unit at Barcelona Supercomputing Center (2023).
+Language Technologies Unit at Barcelona Supercomputing Center (2023).
 
 
 ### Licensing Information
-This work is licensed under a [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).
+This work is licensed under an [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).
 
 ### Funding
-This work has been promoted and financed by the Generalitat de Catalunya through the [Aina project] (https://projecteaina.cat/).
+This work has been promoted and financed by the Generalitat de Catalunya through the [Aina project](https://projecteaina.cat/).
 
 ## Disclaimer
 
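For reference, the Moses cleaning step mentioned in the data-preparation hunk (keeping sentences between 5 and 150 words) corresponds to the clean-corpus-n script that ships with mosesdecoder under scripts/training/. The sketch below is a minimal illustration, not the project's actual pipeline: the corpus prefix, output prefix, and es/ca language codes are assumed placeholders; only the 5–150 length bounds come from the README text.

```python
import subprocess

# Minimal sketch of the length-filtering step described in the diff above.
# Assumptions (not from the model card): the mosesdecoder checkout location,
# corpus file prefixes, and the es/ca language codes. Only the 5-150
# sentence-length bounds are taken from the README.
MOSES_CLEAN = "mosesdecoder/scripts/training/clean-corpus-n.perl"

def clean_parallel_corpus(prefix="corpus", src="es", tgt="ca",
                          out_prefix="corpus.clean",
                          min_len=5, max_len=150):
    """Keep sentence pairs whose sides have between min_len and max_len tokens.

    Reads <prefix>.<src> and <prefix>.<tgt>; writes <out_prefix>.<src>/<tgt>.
    """
    subprocess.run(
        ["perl", MOSES_CLEAN, prefix, src, tgt, out_prefix,
         str(min_len), str(max_len)],
        check=True,
    )

if __name__ == "__main__":
    clean_parallel_corpus()
```

Besides the length bounds, the Moses script also discards empty lines and pairs with an extreme length ratio; the mBERT Gencata parallel filter referenced in the diff is a separate, model-based filter and is not reproduced here.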