update to new dataset version
- README.md +39 -19
- all_results.json +6 -6
- config.json +1 -1
- extract_sents.py +45 -0
- pytorch_model.bin +2 -2
- train_results.json +6 -6
- trainer_state.json +93 -147
- training_args.bin +2 -2
README.md
CHANGED
@@ -40,6 +40,11 @@ The training dataset consists of:
 
 These are obtained from the [OPUS](https://opus.nlpl.eu/) base (Tiedemann, 2012) and filtered using [OpusFilter](https://helsinki-nlp.github.io/OpusFilter) (Aulamo et al., 2020); see [`dl_opus.yaml`](dl_opus.yaml) for the details. The filtering is slightly non-deterministic due to the retraining of a statistical alignment model, but in my experience, different runs tend to give extremely similar results. Do not hesitate to reach out if you experience difficulties in using this to collect data.
 
+In addition to these, the training dataset also includes parallel br/fr sentences, provided as
+glosses in the [Arbres](https://arbres.iker.cnrs.fr) wiki (Jouitteau, 2022), obtained from their
+[ongoing port](https://github.com/Autogramm/Breton/commit/45ac2c444a979b7ee41e5f24a3bfd1ec39f09d7d)
+to Universal Dependencies in the Autogramm project.
+
 ## Training procedure
 
 The training hyperparameters are those suggested by Adelani et al. (2022) in their [code release](https://github.com/masakhane-io/lafand-mt), which gave their best results for machine translation of several African languages.

@@ -48,42 +53,57 @@ More specifically, we use the [example training script](https://github.com/huggi
 
 ```bash
 python run_translation.py \
+    --model_name_or_path facebook/m2m100_418M \
+    --do_train \
+    --train_file {path_to_training_data} \
+    --source_lang br \
+    --target_lang fr \
+    --output_dir {path_to_model} \
+    --per_device_train_batch_size=8 \
+    --overwrite_output_dir \
+    --forced_bos_token fr \
+    --save_steps 4096 \
+    --fp16 \
+    --num_train_epochs 4
+
 ```
 
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
+
 - learning_rate: 5e-05
 - train_batch_size: 8
 - eval_batch_size: 8
 - seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
-- num_epochs: 3.0
+- num_epochs: 4.0
 
 ### Framework versions
 
-- Transformers 4.
+- Transformers 4.24.0
 - Pytorch 1.12.1+cu116
 - Datasets 2.6.1
 - Tokenizers 0.13.1
 
 ## References
 
-- Adelani, David, Jesujoba Alabi, Angela Fan, Julia Kreutzer, Xiaoyu Shen, Machel Reid, Dana Ruiter,
+- Adelani, David, Jesujoba Alabi, Angela Fan, Julia Kreutzer, Xiaoyu Shen, Machel Reid, Dana Ruiter, et al.
+  2022. "A Few Thousand Translations Go a Long Way! Leveraging Pre-trained Models for African News
+  Translation". In Proceedings of the 2022 Conference of the North American Chapter of the Association for
+  Computational Linguistics: Human Language Technologies, pages 3053–3070, Seattle, United States.
+  Association for Computational Linguistics. <https://doi.org/10.18653/v1/2022.naacl-main.223>
+- Aulamo, Mikko, Sami Virpioja, and Jörg Tiedemann. 2020. "OpusFilter: A Configurable Parallel Corpus
+  Filtering Toolbox". In Proceedings of the 58th Annual Meeting of the Association for Computational
+  Linguistics: System Demonstrations, pages 150–156, Online. Association for Computational Linguistics.
+- Tiedemann, Jörg. 2012. "Parallel Data, Tools and Interfaces in OPUS". In Proceedings of the 8th
+  International Conference on Language Resources and Evaluation (LREC 2012).
+- Jouitteau, Mélanie (ed.). 2009–2022. ARBRES, a wiki grammar of Breton dialects and a resource centre for
+  their formal linguistic study. IKER, CNRS, <http://arbres.iker.cnrs.fr>. Creative Commons BY-NC-SA
+  licence.
+- Tyers, Francis M. 2009. "Rule-based Augmentation of Training Data in Breton-French Statistical Machine
+  Translation". In Proceedings of the 13th Annual Conference of the European Association for Machine
+  Translation (EAMT09), pages 213–218, Barcelona, Spain.
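Since the command above fine-tunes [facebook/m2m100_418M](https://huggingface.co/facebook/m2m100_418M) with French as the forced target language, the resulting checkpoint can be used through the standard M2M100 generation API. A minimal inference sketch, assuming the fine-tuned model was saved to `./m2m100-br-fr` (a placeholder for the `{path_to_model}` above):

```python
# Minimal inference sketch; "./m2m100-br-fr" is a placeholder for the
# --output_dir used in the training command above.
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model = M2M100ForConditionalGeneration.from_pretrained("./m2m100-br-fr")
tokenizer = M2M100Tokenizer.from_pretrained("./m2m100-br-fr")

tokenizer.src_lang = "br"  # Breton input, matching --source_lang
inputs = tokenizer("Demat d'an holl !", return_tensors="pt")
# Force French as the output language, matching --forced_bos_token fr:
# M2M100 selects the target language through the first generated token.
generated = model.generate(**inputs, forced_bos_token_id=tokenizer.get_lang_id("fr"))
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```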
all_results.json
CHANGED
@@ -1,8 +1,8 @@
 {
-    "epoch": 3.0,
-    "train_loss": 1.4955534830489579,
-    "train_runtime": 15709.331,
-    "train_samples":
-    "train_samples_per_second": 9.34,
-    "train_steps_per_second": 1.168
+    "epoch": 4.0,
+    "train_loss": 1.4005291703168083,
+    "train_runtime": 11994.4751,
+    "train_samples": 54393,
+    "train_samples_per_second": 18.139,
+    "train_steps_per_second": 1.134
 }
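The metrics above are internally consistent, assuming (as is my reading of how the Trainer computes them) that `train_samples_per_second` divides the total number of processed samples, i.e. `train_samples` times the number of epochs, by `train_runtime`:

```python
# Sanity check of the reported training throughput. The formulas are an
# assumption about how the Trainer computes these fields; the numbers line up.
samples, epochs, runtime, steps = 54393, 4.0, 11994.4751, 13600
print(samples * epochs / runtime)  # ~18.139 (train_samples_per_second)
print(steps / runtime)             # ~1.134  (train_steps_per_second)
```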
config.json
CHANGED
@@ -32,7 +32,7 @@
   "pad_token_id": 1,
   "scale_embedding": true,
   "torch_dtype": "float32",
-  "transformers_version": "4.
+  "transformers_version": "4.24.0",
   "use_cache": true,
   "vocab_size": 128112
 }
extract_sents.py
ADDED
@@ -0,0 +1,45 @@
+from typing import TextIO
+import re
+
+import click
+import conllu
+import jsonlines
+
+
+@click.command(help="Extract a parallel corpus from a CoNLL-U file with translations")
+@click.argument("conllu_path", type=click.File("r"))
+@click.argument("output_path", type=click.File("w"), default="-")
+@click.option("--main-langcode", default="br", show_default=True)
+@click.option("--require-langcode", multiple=True, show_default=True)
+def main(
+    conllu_path: TextIO,
+    main_langcode: str,
+    output_path: TextIO,
+    require_langcode: list[str],
+):
+    with jsonlines.Writer(output_path) as out_stream:
+        for tokenlist in conllu.parse_incr(conllu_path):
+            if m := re.match(r"'?(?P<content>[^/]+?)'?$", tokenlist.metadata["text"]):
+                main_text = m.group("content")
+            else:
+                continue
+            translations = {
+                km.group("langcode"): kv.group("content")
+                for k, v in tokenlist.metadata.items()
+                if (km := re.match(r"text_(?P<langcode>.*)", k))
+                and (kv := re.match(r"'?(?P<content>[^/]+?)'?$", v))
+            }
+            if not all(l in translations for l in require_langcode):
+                continue
+            out_stream.write(
+                {
+                    "translation": {
+                        main_langcode: main_text,
+                        **translations,
+                    }
+                }
+            )
+
+
+if __name__ == "__main__":
+    main()
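The new script converts the sentence-level `text` / `text_{langcode}` metadata comments of a CoNLL-U file into the JSON-lines `{"translation": {...}}` records that `run_translation.py` accepts as `--train_file`, stripping optional surrounding single quotes and skipping texts that contain a slash. A toy illustration of that input contract (the sample sentence is invented here, not taken from the Arbres data):

```python
# Toy demonstration of the metadata convention that extract_sents.py reads.
# The sentence below is invented for illustration; real input comes from the
# Autogramm port of the Arbres glosses.
import conllu

SAMPLE = """\
# text = 'Setu an ti.'
# text_fr = 'Voici la maison.'
1\tSetu\tsetu\tADV\t_\t_\t0\troot\t_\t_
2\tan\tan\tDET\t_\t_\t3\tdet\t_\t_
3\tti\tti\tNOUN\t_\t_\t1\tobj\t_\t_
"""

sent = conllu.parse(SAMPLE)[0]
print(sent.metadata["text"])     # 'Setu an ti.' (quotes stripped by the script)
print(sent.metadata["text_fr"])  # 'Voici la maison.'
# extract_sents.py would emit:
# {"translation": {"br": "Setu an ti.", "fr": "Voici la maison."}}
```

On the real file, an invocation like `python extract_sents.py arbres.conllu arbres.jsonl --require-langcode fr` (file names hypothetical) would keep only the sentences that do have a French translation.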
pytorch_model.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:c1c8ae0171f992187869f7f6979a8762112705b6caa0404548a13cf039f8a5f1
+size 1935795713
train_results.json
CHANGED
@@ -1,8 +1,8 @@
 {
-    "epoch": 3.0,
-    "train_loss": 1.4955534830489579,
-    "train_runtime": 15709.331,
-    "train_samples":
-    "train_samples_per_second": 9.34,
-    "train_steps_per_second": 1.168
+    "epoch": 4.0,
+    "train_loss": 1.4005291703168083,
+    "train_runtime": 11994.4751,
+    "train_samples": 54393,
+    "train_samples_per_second": 18.139,
+    "train_steps_per_second": 1.134
 }
trainer_state.json
CHANGED
@@ -1,241 +1,187 @@
 {
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 3.0,
-  "global_step": 18342,
+  "epoch": 4.0,
+  "global_step": 13600,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
   "log_history": [
     {
-      "epoch": 0.
-      "learning_rate": 4.
-      "loss": 2.
+      "epoch": 0.15,
+      "learning_rate": 4.816176470588236e-05,
+      "loss": 2.6313,
       "step": 500
     },
     {
-      "epoch": 0.
-      "learning_rate": 4.
-      "loss": 2.
+      "epoch": 0.29,
+      "learning_rate": 4.632352941176471e-05,
+      "loss": 2.2069,
       "step": 1000
     },
     {
-      "epoch": 0.
-      "learning_rate": 4.
-      "loss": 2.
+      "epoch": 0.44,
+      "learning_rate": 4.448529411764706e-05,
+      "loss": 2.035,
       "step": 1500
     },
     {
-      "epoch": 0.
-      "learning_rate": 4.
-      "loss":
+      "epoch": 0.59,
+      "learning_rate": 4.2647058823529415e-05,
+      "loss": 1.9491,
       "step": 2000
     },
     {
-      "epoch": 0.
-      "learning_rate": 4.
-      "loss":
+      "epoch": 0.74,
+      "learning_rate": 4.08125e-05,
+      "loss": 1.8742,
       "step": 2500
     },
     {
-      "epoch": 0.
-      "learning_rate":
-      "loss": 1.
+      "epoch": 0.88,
+      "learning_rate": 3.897426470588236e-05,
+      "loss": 1.8387,
       "step": 3000
     },
     {
-      "epoch":
-      "learning_rate":
-      "loss": 1.
+      "epoch": 1.03,
+      "learning_rate": 3.713602941176471e-05,
+      "loss": 1.6941,
       "step": 3500
     },
     {
-      "epoch":
-      "learning_rate": 3.
-      "loss": 1.
+      "epoch": 1.18,
+      "learning_rate": 3.529779411764706e-05,
+      "loss": 1.5224,
       "step": 4000
     },
     {
-      "epoch":
-      "learning_rate": 3.
-      "loss": 1.
+      "epoch": 1.32,
+      "learning_rate": 3.3459558823529415e-05,
+      "loss": 1.4897,
       "step": 4500
     },
     {
-      "epoch":
-      "learning_rate": 3.
-      "loss": 1.
+      "epoch": 1.47,
+      "learning_rate": 3.1621323529411765e-05,
+      "loss": 1.4445,
       "step": 5000
     },
     {
-      "epoch":
-      "learning_rate":
-      "loss": 1.
+      "epoch": 1.62,
+      "learning_rate": 2.978308823529412e-05,
+      "loss": 1.4593,
       "step": 5500
     },
     {
-      "epoch":
-      "learning_rate":
-      "loss": 1.
+      "epoch": 1.76,
+      "learning_rate": 2.7944852941176468e-05,
+      "loss": 1.4251,
       "step": 6000
     },
     {
-      "epoch": 1.
-      "learning_rate":
-      "loss": 1.
+      "epoch": 1.91,
+      "learning_rate": 2.6113970588235297e-05,
+      "loss": 1.39,
       "step": 6500
     },
     {
-      "epoch":
-      "learning_rate":
-      "loss": 1.
+      "epoch": 2.06,
+      "learning_rate": 2.427573529411765e-05,
+      "loss": 1.2959,
       "step": 7000
     },
     {
-      "epoch":
-      "learning_rate": 2.
-      "loss": 1.
+      "epoch": 2.21,
+      "learning_rate": 2.24375e-05,
+      "loss": 1.1621,
       "step": 7500
     },
     {
-      "epoch":
-      "learning_rate": 2.
-      "loss": 1.
+      "epoch": 2.35,
+      "learning_rate": 2.0599264705882353e-05,
+      "loss": 1.1374,
       "step": 8000
     },
     {
-      "epoch":
-      "learning_rate":
-      "loss": 1.
+      "epoch": 2.5,
+      "learning_rate": 1.876102941176471e-05,
+      "loss": 1.1649,
       "step": 8500
     },
     {
-      "epoch":
-      "learning_rate":
-      "loss": 1.
+      "epoch": 2.65,
+      "learning_rate": 1.6926470588235294e-05,
+      "loss": 1.1513,
       "step": 9000
     },
     {
-      "epoch":
-      "learning_rate":
-      "loss": 1.
+      "epoch": 2.79,
+      "learning_rate": 1.5088235294117647e-05,
+      "loss": 1.1463,
       "step": 9500
     },
     {
-      "epoch":
-      "learning_rate":
-      "loss": 1.
+      "epoch": 2.94,
+      "learning_rate": 1.3250000000000002e-05,
+      "loss": 1.1466,
       "step": 10000
     },
     {
-      "epoch":
-      "learning_rate":
-      "loss": 1.
+      "epoch": 3.09,
+      "learning_rate": 1.1411764705882353e-05,
+      "loss": 1.0411,
       "step": 10500
     },
     {
-      "epoch":
-      "learning_rate":
-      "loss":
+      "epoch": 3.24,
+      "learning_rate": 9.573529411764706e-06,
+      "loss": 0.9581,
       "step": 11000
     },
     {
-      "epoch":
-      "learning_rate":
-      "loss":
+      "epoch": 3.38,
+      "learning_rate": 7.735294117647058e-06,
+      "loss": 0.9514,
       "step": 11500
     },
     {
-      "epoch":
-      "learning_rate":
-      "loss":
+      "epoch": 3.53,
+      "learning_rate": 5.897058823529412e-06,
+      "loss": 0.9429,
       "step": 12000
     },
     {
-      "epoch":
-      "learning_rate":
-      "loss":
+      "epoch": 3.68,
+      "learning_rate": 4.058823529411765e-06,
+      "loss": 0.9676,
       "step": 12500
     },
     {
-      "epoch":
-      "learning_rate":
-      "loss":
+      "epoch": 3.82,
+      "learning_rate": 2.2205882352941175e-06,
+      "loss": 0.9324,
       "step": 13000
     },
     {
-      "epoch":
-      "learning_rate":
-      "loss":
+      "epoch": 3.97,
+      "learning_rate": 3.8235294117647064e-07,
+      "loss": 0.9555,
       "step": 13500
     },
     {
-      "epoch":
-      "learning_rate":
-      "loss":
-      "step": 14000
-    },
-    {
-      "epoch":
-      "learning_rate": 1.0473230836331916e-05,
-      "loss": 1.0918,
-      "step": 14500
-    },
-    {
-      "epoch": 2.45,
-      "learning_rate": 9.110238796205431e-06,
-      "loss": 1.0878,
-      "step": 15000
-    },
-    {
-      "epoch": 2.54,
-      "learning_rate": 7.747246756078944e-06,
-      "loss": 1.0506,
-      "step": 15500
-    },
-    {
-      "epoch": 2.62,
-      "learning_rate": 6.384254715952459e-06,
-      "loss": 1.0557,
-      "step": 16000
-    },
-    {
-      "epoch": 2.7,
-      "learning_rate": 5.021262675825973e-06,
-      "loss": 1.0325,
-      "step": 16500
-    },
-    {
-      "epoch": 2.78,
-      "learning_rate": 3.658270635699488e-06,
-      "loss": 1.0784,
-      "step": 17000
-    },
-    {
-      "epoch": 2.86,
-      "learning_rate": 2.295278595573002e-06,
-      "loss": 1.0239,
-      "step": 17500
-    },
-    {
-      "epoch": 2.94,
-      "learning_rate": 9.322865554465163e-07,
-      "loss": 1.0211,
-      "step": 18000
-    },
-    {
-      "epoch": 3.0,
-      "step": 18342,
-      "total_flos": 2.148457555862323e+16,
-      "train_loss": 1.4955534830489579,
-      "train_runtime": 15709.331,
-      "train_samples_per_second": 9.34,
-      "train_steps_per_second": 1.168
+      "epoch": 4.0,
+      "step": 13600,
+      "total_flos": 3.918346910230118e+16,
+      "train_loss": 1.4005291703168083,
+      "train_runtime": 11994.4751,
+      "train_samples_per_second": 18.139,
+      "train_steps_per_second": 1.134
     }
   ],
-  "max_steps": 18342,
-  "num_train_epochs": 3,
-  "total_flos": 2.148457555862323e+16,
+  "max_steps": 13600,
+  "num_train_epochs": 4,
+  "total_flos": 3.918346910230118e+16,
   "trial_name": null,
   "trial_params": null
 }
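The logged learning rates match the linear schedule from the README (5e-05 decayed to zero over `max_steps` 13600); for instance, the first entry:

```python
# Reproduce the first logged learning rate under a no-warmup linear schedule
# (the absence of warmup is inferred from the numbers, not from the args).
base_lr, max_steps, step = 5e-05, 13600, 500
print(base_lr * (max_steps - step) / max_steps)  # ~4.8162e-05, as logged
```

Likewise, the logged `epoch` 0.15 at step 500 is consistent with 13600 / 4 = 3400 optimizer steps per epoch (500 / 3400 ≈ 0.147).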
training_args.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:d7e19c4b52c1665d4e24c8332861794cb0354d00704d62e085c5f3112b7d82d7
+size 3579