readme: remove information about monolingual language models
README.md CHANGED
@@ -191,90 +191,6 @@ The following plot shows the pretraining loss curve:

![Training loss curve](stats/figures/pretraining_loss_historic-multilingual.png)

## English model

The English BERT model was trained on texts from the British Library corpus with the Hugging Face
JAX/FLAX implementation for 10 epochs (approx. 1M steps) on a v3-8 TPU, using the following command:

```bash
python3 run_mlm_flax.py --model_type bert \
    --config_name /mnt/datasets/bert-base-historic-english-cased/ \
    --tokenizer_name /mnt/datasets/bert-base-historic-english-cased/ \
    --train_file /mnt/datasets/bl-corpus/bl_1800-1900_extracted.txt \
    --validation_file /mnt/datasets/bl-corpus/english_validation.txt \
    --max_seq_length 512 \
    --per_device_train_batch_size 16 \
    --learning_rate 1e-4 \
    --num_train_epochs 10 \
    --preprocessing_num_workers 96 \
    --output_dir /mnt/datasets/bert-base-historic-english-cased-512-noadafactor-10e \
    --save_steps 2500 \
    --eval_steps 2500 \
    --warmup_steps 10000 \
    --line_by_line \
    --pad_to_max_length
```

The following plot shows the pretraining loss curve:

![Training loss curve](stats/figures/pretraining_loss_historic_english.png)
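
The checkpoint written to `--output_dir` can then be used for masked-token prediction directly from Flax. Below is a minimal sketch, assuming the tokenizer files were saved next to the Flax weights; the local path and the example sentence are illustrative only:

```python
import jax.numpy as jnp
from transformers import AutoTokenizer, FlaxBertForMaskedLM

# Illustrative path: the --output_dir used in the training command above.
model_dir = "/mnt/datasets/bert-base-historic-english-cased-512-noadafactor-10e"

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = FlaxBertForMaskedLM.from_pretrained(model_dir)

text = f"The Lord Mayor opened the new {tokenizer.mask_token} yesterday."
inputs = tokenizer(text, return_tensors="np")
logits = model(**inputs).logits

# Locate the [MASK] position and print its five most likely replacements.
mask_index = int(jnp.argmax(inputs["input_ids"][0] == tokenizer.mask_token_id))
top_ids = jnp.argsort(-logits[0, mask_index])[:5].tolist()
print(tokenizer.convert_ids_to_tokens(top_ids))
```

Inspecting the top predictions on a few period-typical sentences like this is a quick sanity check that checkpoint, tokenizer and casing line up.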

## Finnish model

The Finnish BERT model was trained on texts from the Finnish part of Europeana with the Hugging Face
JAX/FLAX implementation for 40 epochs (approx. 1M steps) on a v3-8 TPU, using the following command:

```bash
python3 run_mlm_flax.py --model_type bert \
    --config_name /mnt/datasets/bert-base-finnish-europeana-cased/ \
    --tokenizer_name /mnt/datasets/bert-base-finnish-europeana-cased/ \
    --train_file /mnt/datasets/hlms/extracted_content_Finnish_0.6.txt \
    --validation_file /mnt/datasets/hlms/finnish_validation.txt \
    --max_seq_length 512 \
    --per_device_train_batch_size 16 \
    --learning_rate 1e-4 \
    --num_train_epochs 40 \
    --preprocessing_num_workers 96 \
    --output_dir /mnt/datasets/bert-base-finnish-europeana-cased-512-dupe1-noadafactor-40e \
    --save_steps 2500 \
    --eval_steps 2500 \
    --warmup_steps 10000 \
    --line_by_line \
    --pad_to_max_length
```

The following plot shows the pretraining loss curve:

![Training loss curve](stats/figures/pretraining_loss_finnish_europeana.png)
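
All of these commands pass `--line_by_line`, so every line of the train and validation files is treated as its own sequence. As an illustration only (the actual validation files used here may have been produced differently), a small held-out split could be carved out of the extracted corpus like this:

```python
# Hypothetical train/validation split for run_mlm_flax.py; the real
# *_validation.txt files may have been produced by a different procedure.
import random

corpus_path = "extracted_content_Finnish_0.6.txt"  # one document per line
random.seed(42)

with open(corpus_path, encoding="utf-8") as f:
    lines = [line for line in f if line.strip()]

random.shuffle(lines)
n_valid = max(1, len(lines) // 1000)  # hold out roughly 0.1% of the lines

with open("finnish_validation.txt", "w", encoding="utf-8") as f:
    f.writelines(lines[:n_valid])

with open("finnish_train.txt", "w", encoding="utf-8") as f:
    f.writelines(lines[n_valid:])
```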

## Swedish model

The Swedish BERT model was trained on texts from the Swedish part of Europeana with the Hugging Face
JAX/FLAX implementation for 40 epochs (approx. 660K steps) on a v3-8 TPU, using the following command:

```bash
python3 run_mlm_flax.py --model_type bert \
    --config_name /mnt/datasets/bert-base-swedish-europeana-cased/ \
    --tokenizer_name /mnt/datasets/bert-base-swedish-europeana-cased/ \
    --train_file /mnt/datasets/hlms/extracted_content_Swedish_0.6.txt \
    --validation_file /mnt/datasets/hlms/swedish_validation.txt \
    --max_seq_length 512 \
    --per_device_train_batch_size 16 \
    --learning_rate 1e-4 \
    --num_train_epochs 40 \
    --preprocessing_num_workers 96 \
    --output_dir /mnt/datasets/bert-base-swedish-europeana-cased-512-dupe1-noadafactor-40e \
    --save_steps 2500 \
    --eval_steps 2500 \
    --warmup_steps 10000 \
    --line_by_line \
    --pad_to_max_length
```

The following plot shows the pretraining loss curve:

![Training loss curve](stats/figures/pretraining_loss_swedish_europeana.png)
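
`run_mlm_flax.py` stores the weights in Flax format (`flax_model.msgpack`). If a PyTorch copy is more convenient downstream, it can be exported via the `from_flax` loading path in `transformers`; a minimal sketch, with the input path taken from the command above and an arbitrarily named export directory:

```python
# Hypothetical export of the Flax checkpoint to PyTorch weights
# (requires both flax and torch to be installed).
from transformers import AutoTokenizer, BertForMaskedLM

flax_dir = "/mnt/datasets/bert-base-swedish-europeana-cased-512-dupe1-noadafactor-40e"
export_dir = "bert-base-swedish-europeana-cased-pytorch"  # arbitrary output directory

model = BertForMaskedLM.from_pretrained(flax_dir, from_flax=True)  # reads flax_model.msgpack
tokenizer = AutoTokenizer.from_pretrained(flax_dir)

model.save_pretrained(export_dir)      # writes the PyTorch weights and config.json
tokenizer.save_pretrained(export_dir)  # copies the tokenizer files
```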
# Acknowledgments

Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC) program, previously known as