Commit d845ecf by lgrobol (parent: 6871613)

doc updates
README.md CHANGED
@@ -66,54 +66,60 @@ The training hyperparameters are those suggested by Adelani et al. (2022) in the
  release](https://github.com/masakhane-io/lafand-mt), which gave their best results for machine
  translation of several African languages.
 
- More specifically, we use the [example training
- script](https://github.com/huggingface/transformers/blob/06886d5a684228a695b29645993b3be55190bd9c/examples/pytorch/translation/run_translation.py)
- provided by 🤗 Transformers for fine-tuning mBART with the following command
+ More specifically, we train this model with [zeldarose](https://github.com/LoicGrobol/zeldarose) with the following parameters
 
  ```bash
- python run_translation.py \
- --model_name_or_path facebook/m2m100_418M \
- --do_train \
- --train_file {path_to_training_data} \
- --source_lang br \
- --target_lang fr \
- --output_dir {path_to_model}\
- --per_device_train_batch_size=8 \
- --overwrite_output_dir \
- --forced_bos_token fr \
- --save_steps 4096 \
- --fp16 \
- --num_train_epochs 4
+ zeldarose transformer \
+ --config train_config.toml \
+ --tokenizer "facebook/m2m100_418M" --pretrained-model "facebook/m2m100_418M" \
+ --out-dir m2m100_418M+br-fr --model-name m2m100_418M+br-fr \
+ --strategy ddp --accelerator gpu --num-devices 4 --device-batch-size 2 --num-workers 8\
+ --max-epochs 16 --precision 16 --tf32-mode medium \
+ --val-data {val_path}.jsonl \
+ {train_path}.jsonl
+
  ```
 
  ### Training hyperparameters
 
  The following hyperparameters were used during training:
 
- - `learning_rate`: 5e-05
- - `train_batch_size`: 8
- - `eval_batch_size`: 8
- - `seed`: 42
- - `optimizer`: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - `lr_scheduler_type`: linear
- - `num_epochs`: 4.0
+ ```toml
+ [task]
+ change_ratio = 0.3
+ denoise_langs = []
+ poisson_lambda = 3.0
+ source_langs = ["br"]
+ target_langs = ["fr"]
+
+ [tuning]
+ batch_size = 16
+ betas = [0.9, 0.999]
+ epsilon = 1e-8
+ learning_rate = 5e-5
+ gradient_clipping = 1.0
+ lr_decay_steps = -1
+ warmup_steps = 1024
+ ```
 
  ### Framework versions
 
- - Transformers 4.24.0
- - Pytorch 1.13.0
- - Datasets 2.6.1
- - Tokenizers 0.13.1
+ - Transformers 4.26.1
+ - Pytorch 1.12.1
+ - Datasets 2.10.0
+ - Tokenizers 0.13.2
+ - Pytorch-lightning 1.9.3
+ - Zeldarose [c6456ead](https://github.com/LoicGrobol/spertiniite/commit/c6456ead3649c4e6ddfb4a5a74b40f344eded09f)
 
  ### Carbon emissions
 
  At this time, we estimate emissions of a rough 300 gCO<sub>2</sub> per fine-tuning run. So far, we
  account for
 
- - Fine-tuning the 2 released versions
- - 5 development runs
+ - Fine-tuning the 3 released versions
+ - 8 development runs
 
- So far, the equivalent carbon emissions for this model are approximately 2100 gCO<sub>2</sub>.
+ So far, the equivalent carbon emissions for this model are approximately 3300 gCO<sub>2</sub>.
 
  ## References
 
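The emissions total in the updated README is a flat per-run estimate (roughly 300 gCO<sub>2</sub>) multiplied by the number of runs accounted for: 3 released versions plus 8 development runs after this commit, against 2 plus 5 before it. A minimal Python sketch of that arithmetic, assuming, as the README does, that every run costs the same; `total_emissions` is a hypothetical helper, not part of the repository:

```python
# Rough per-run estimate stated in the README, in gCO2 per fine-tuning run.
PER_RUN_G_CO2 = 300


def total_emissions(released: int, development: int, per_run: int = PER_RUN_G_CO2) -> int:
    """Total gCO2 if every released and development run costs `per_run`."""
    return (released + development) * per_run


if __name__ == "__main__":
    print(total_emissions(2, 5))  # before this commit: 2100
    print(total_emissions(3, 8))  # after this commit: 3300
```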
added_tokens.json ADDED
@@ -0,0 +1,3 @@
+ {
+ "<mask>": 128112
+ }
all_results.json DELETED
@@ -1,8 +0,0 @@
- {
- "epoch": 4.0,
- "train_loss": 1.4005291703168083,
- "train_runtime": 11994.4751,
- "train_samples": 54393,
- "train_samples_per_second": 18.139,
- "train_steps_per_second": 1.134
- }
generation_config.json ADDED
@@ -0,0 +1,11 @@
+ {
+ "_from_model_config": true,
+ "bos_token_id": 0,
+ "decoder_start_token_id": 2,
+ "early_stopping": true,
+ "eos_token_id": 2,
+ "max_length": 200,
+ "num_beams": 5,
+ "pad_token_id": 1,
+ "transformers_version": "4.26.1"
+ }
train_config.toml ADDED
@@ -0,0 +1,19 @@
+ type = "mbart"
+
+ [task]
+ change_ratio = 0.3
+ denoise_langs = []
+ poisson_lambda = 3.0
+ source_langs = ["br"]
+ target_langs = ["fr"]
+
+ [tuning]
+ batch_size = 16
+ betas = [0.9, 0.999]
+ epsilon = 1e-8
+ learning_rate = 5e-5
+ gradient_clipping = 1.0
+ # Uncomment these for a more complex training setup
+ lr_decay_steps = -1
+ warmup_steps = 1024
+ # weight_decay = 1e-5
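The new training command uses `--num-devices 4` with `--device-batch-size 2`, i.e. 8 sequences per forward pass across all devices, while `train_config.toml` sets `batch_size = 16`. Assuming zeldarose treats `[tuning] batch_size` as the effective batch per optimizer step and reaches it through gradient accumulation (an assumption to check against the zeldarose source at the pinned commit), each optimizer step would accumulate 2 forward/backward passes. The sketch below only does that arithmetic; `accumulation_steps` is a hypothetical helper, not a zeldarose function:

```python
def accumulation_steps(total_batch: int, num_devices: int, device_batch: int) -> int:
    """Forward/backward passes per optimizer step needed to reach `total_batch`.

    Hypothetical helper: assumes `total_batch` is the effective batch size and
    must be an exact multiple of what all devices process in one pass.
    """
    per_pass = num_devices * device_batch
    if total_batch < per_pass or total_batch % per_pass != 0:
        raise ValueError(
            f"batch_size={total_batch} is not a positive multiple of "
            f"{num_devices} devices x {device_batch} sequences per device"
        )
    return total_batch // per_pass


if __name__ == "__main__":
    # batch_size from train_config.toml; device settings from the zeldarose command
    print(accumulation_steps(16, num_devices=4, device_batch=2))  # 2
```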
train_results.json DELETED
@@ -1,8 +0,0 @@
- {
- "epoch": 4.0,
- "train_loss": 1.4005291703168083,
- "train_runtime": 11994.4751,
- "train_samples": 54393,
- "train_samples_per_second": 18.139,
- "train_steps_per_second": 1.134
- }
trainer_state.json DELETED
@@ -1,187 +0,0 @@
- {
- "best_metric": null,
- "best_model_checkpoint": null,
- "epoch": 4.0,
- "global_step": 13600,
- "is_hyper_param_search": false,
- "is_local_process_zero": true,
- "is_world_process_zero": true,
- "log_history": [
- {
- "epoch": 0.15,
- "learning_rate": 4.816176470588236e-05,
- "loss": 2.6313,
- "step": 500
- },
- {
- "epoch": 0.29,
- "learning_rate": 4.632352941176471e-05,
- "loss": 2.2069,
- "step": 1000
- },
- {
- "epoch": 0.44,
- "learning_rate": 4.448529411764706e-05,
- "loss": 2.035,
- "step": 1500
- },
- {
- "epoch": 0.59,
- "learning_rate": 4.2647058823529415e-05,
- "loss": 1.9491,
- "step": 2000
- },
- {
- "epoch": 0.74,
- "learning_rate": 4.08125e-05,
- "loss": 1.8742,
- "step": 2500
- },
- {
- "epoch": 0.88,
- "learning_rate": 3.897426470588236e-05,
- "loss": 1.8387,
- "step": 3000
- },
- {
- "epoch": 1.03,
- "learning_rate": 3.713602941176471e-05,
- "loss": 1.6941,
- "step": 3500
- },
- {
- "epoch": 1.18,
- "learning_rate": 3.529779411764706e-05,
- "loss": 1.5224,
- "step": 4000
- },
- {
- "epoch": 1.32,
- "learning_rate": 3.3459558823529415e-05,
- "loss": 1.4897,
- "step": 4500
- },
- {
- "epoch": 1.47,
- "learning_rate": 3.1621323529411765e-05,
- "loss": 1.4445,
- "step": 5000
- },
- {
- "epoch": 1.62,
- "learning_rate": 2.978308823529412e-05,
- "loss": 1.4593,
- "step": 5500
- },
- {
- "epoch": 1.76,
- "learning_rate": 2.7944852941176468e-05,
- "loss": 1.4251,
- "step": 6000
- },
- {
- "epoch": 1.91,
- "learning_rate": 2.6113970588235297e-05,
- "loss": 1.39,
- "step": 6500
- },
- {
- "epoch": 2.06,
- "learning_rate": 2.427573529411765e-05,
- "loss": 1.2959,
- "step": 7000
- },
- {
- "epoch": 2.21,
- "learning_rate": 2.24375e-05,
- "loss": 1.1621,
- "step": 7500
- },
- {
- "epoch": 2.35,
- "learning_rate": 2.0599264705882353e-05,
- "loss": 1.1374,
- "step": 8000
- },
- {
- "epoch": 2.5,
- "learning_rate": 1.876102941176471e-05,
- "loss": 1.1649,
- "step": 8500
- },
- {
- "epoch": 2.65,
- "learning_rate": 1.6926470588235294e-05,
- "loss": 1.1513,
- "step": 9000
- },
- {
- "epoch": 2.79,
- "learning_rate": 1.5088235294117647e-05,
- "loss": 1.1463,
- "step": 9500
- },
- {
- "epoch": 2.94,
- "learning_rate": 1.3250000000000002e-05,
- "loss": 1.1466,
- "step": 10000
- },
- {
- "epoch": 3.09,
- "learning_rate": 1.1411764705882353e-05,
- "loss": 1.0411,
- "step": 10500
- },
- {
- "epoch": 3.24,
- "learning_rate": 9.573529411764706e-06,
- "loss": 0.9581,
- "step": 11000
- },
- {
- "epoch": 3.38,
- "learning_rate": 7.735294117647058e-06,
- "loss": 0.9514,
- "step": 11500
- },
- {
- "epoch": 3.53,
- "learning_rate": 5.897058823529412e-06,
- "loss": 0.9429,
- "step": 12000
- },
- {
- "epoch": 3.68,
- "learning_rate": 4.058823529411765e-06,
- "loss": 0.9676,
- "step": 12500
- },
- {
- "epoch": 3.82,
- "learning_rate": 2.2205882352941175e-06,
- "loss": 0.9324,
- "step": 13000
- },
- {
- "epoch": 3.97,
- "learning_rate": 3.8235294117647064e-07,
- "loss": 0.9555,
- "step": 13500
- },
- {
- "epoch": 4.0,
- "step": 13600,
- "total_flos": 3.918346910230118e+16,
- "train_loss": 1.4005291703168083,
- "train_runtime": 11994.4751,
- "train_samples_per_second": 18.139,
- "train_steps_per_second": 1.134
- }
- ],
- "max_steps": 13600,
- "num_train_epochs": 4,
- "total_flos": 3.918346910230118e+16,
- "trial_name": null,
- "trial_params": null
- }
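The deleted `trainer_state.json` is consistent with the linear schedule listed in the old hyperparameters (`lr_scheduler_type: linear`, no warmup): over `max_steps = 13600`, the learning rate after *s* scheduler steps is 5e-5 × (13600 - *s*) / 13600. The step count matches 4 epochs of ⌈54393 / 16⌉ = 3400 steps, which suggests an effective batch of 16 sequences per optimizer step (for instance 2 devices × 8; the log does not say how it was reached, so treat that value as an assumption). A minimal Python sketch reproducing the first logged values under those assumptions:

```python
import math

# Figures copied from the deleted trainer_state.json / train_results.json.
TRAIN_SAMPLES = 54393
NUM_EPOCHS = 4
BASE_LR = 5e-5
# Assumption: 16 sequences per optimizer step; the logs do not state this directly.
EFFECTIVE_BATCH = 16


def total_steps(samples: int, batch: int, epochs: int) -> int:
    """Optimizer steps when each epoch ends with a possibly partial last batch."""
    return epochs * math.ceil(samples / batch)


def linear_lr(step: int, total: int, base_lr: float = BASE_LR) -> float:
    """Learning rate after `step` scheduler steps: linear decay to 0, no warmup."""
    return base_lr * max(0.0, (total - step) / total)


MAX_STEPS = total_steps(TRAIN_SAMPLES, EFFECTIVE_BATCH, NUM_EPOCHS)

if __name__ == "__main__":
    print(MAX_STEPS)  # 13600
    for step in (500, 1000, 2499):
        print(step, linear_lr(step, MAX_STEPS))
```

From step 2500 on, the logged rates sit very slightly above this line: 4.08125e-05 at step 2500 is what the formula gives for 2499 scheduler steps. That is consistent with a few optimizer steps, and their scheduler steps, being skipped by fp16 loss scaling, although the log itself does not record this.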
training_args.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:d7e19c4b52c1665d4e24c8332861794cb0354d00704d62e085c5f3112b7d82d7
- size 3579