projecte-aina/aina-translator-es-ca

fdelucaf committed on Feb 20, 2024

Commit 1ed5d4a · verified · 1 Parent(s): 4886eea

Update README.md

Files changed (1)

README.md +5 -5

@@ -89,7 +89,7 @@ The model was trained on a combination of the following datasets:
 ### Data preparation
- All datasets are concatenated and filtered using the [mBERT Gencata parallel filter](https://huggingface.co/projecte-aina/mbert-base-gencata)
+ All datasets were concatenated and filtered using the [mBERT Gencata parallel filter](https://huggingface.co/projecte-aina/mbert-base-gencata)
  and cleaned using the clean-corpus-n.pl script from [moses](https://github.com/moses-smt/mosesdecoder), allowing sentences between 5 and 150 words.
  Before training, the punctuation was normalized using a modified version of the join-single-file.py script from
@@ -132,7 +132,7 @@ Weights were saved every 1000 updates and reported results are the average of th
 ## Evaluation
-### Variable and metrics
+### Variables and metrics
 We use the BLEU score for evaluation on following test sets:
 [Flores-101](https://github.com/facebookresearch/flores),
@@ -168,14 +168,14 @@ Language Technologies Unit (LangTech) at the Barcelona Supercomputing Center.
 For further information, please send an email to [email protected].
 ### Copyright
-Copyright Language Technologies Unit at Barcelona Supercomputing Center (2023).
+Language Technologies Unit at Barcelona Supercomputing Center (2023).
 ### Licensing Information
-This work is licensed under a [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).
+This work is licensed under an [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).
 ### Funding
-This work has been promoted and financed by the Generalitat de Catalunya through the [Aina project] (https://projecteaina.cat/).
+This work has been promoted and financed by the Generalitat de Catalunya through the [Aina project](https://projecteaina.cat/).
 ## Disclaimer
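
Note on the Data preparation hunk: the README says the corpus was cleaned with clean-corpus-n.pl, allowing sentences between 5 and 150 words. The Python sketch below only illustrates that length constraint. It is not the Moses script. The function names `keep_pair` and `filter_pairs` are hypothetical, and whitespace tokenization and inclusive bounds are assumptions. The real script may apply further checks that are omitted here.

```python
from typing import Iterable, List, Tuple


def keep_pair(src: str, tgt: str, min_words: int = 5, max_words: int = 150) -> bool:
    """Return True when both sides contain between min_words and max_words
    whitespace-separated tokens (bounds assumed inclusive)."""
    return all(min_words <= len(side.split()) <= max_words for side in (src, tgt))


def filter_pairs(
    pairs: Iterable[Tuple[str, str]],
    min_words: int = 5,
    max_words: int = 150,
) -> List[Tuple[str, str]]:
    """Keep only the sentence pairs that pass the length check on both sides."""
    return [(s, t) for s, t in pairs if keep_pair(s, t, min_words, max_words)]


if __name__ == "__main__":
    sample = [
        ("uno dos tres cuatro cinco", "un dos tres quatre cinc"),  # 5 words each: kept
        ("hola", "hola"),                                          # too short: dropped
    ]
    for src, tgt in filter_pairs(sample):
        print(src, "|||", tgt)
```

A pair is dropped as soon as either side falls outside the range, so the source and target files stay aligned line by line.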