Commit c3c064b
1 Parent(s): a90bd52
Typos update in README.md (#14)
- Typos update in README.md (5028a7bbb3dee9b3bb68e367631600d1dd9e91d3)
Co-authored-by: Praneet Pabolu <[email protected]>
README.md CHANGED
@@ -80,7 +80,7 @@ license: apache-2.0
 Flan-UL2 is an encoder decoder model based on the `T5` architecture. It uses the same configuration as the [`UL2 model`](https://huggingface.co/google/ul2) released earlier last year. It was fine tuned using the "Flan" prompt tuning
 and dataset collection.
 
-According
+According to the original [blog](https://www.yitay.net/blog/flan-ul2-20b) here are the notable improvements:
 - The original UL2 model was only trained with receptive field of 512, which made it non-ideal for N-shot prompting where N is large.
 - The Flan-UL2 checkpoint uses a receptive field of 2048 which makes it more usable for few-shot in-context learning.
 - The original UL2 model also had mode switch tokens that was rather mandatory to get good performance. However, they were a little cumbersome as this requires often some changes during inference or finetuning. In this update/change, we continue training UL2 20B for an additional 100k steps (with small batch) to forget “mode tokens” before applying Flan instruction tuning. This Flan-UL2 checkpoint does not require mode tokens anymore.
@@ -89,7 +89,7 @@ According ot the original [blog](https://www.yitay.net/blog/flan-ul2-20b) here a
 
 ## Converting from T5x to huggingface
 
-You can use the [`convert_t5x_checkpoint_to_pytorch.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/convert_t5x_checkpoint_to_pytorch.py) script and pass the argument `strict = False`. The final layer norm is missing from the original dictionnary, that is why we are passing the `
+You can use the [`convert_t5x_checkpoint_to_pytorch.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/convert_t5x_checkpoint_to_pytorch.py) script and pass the argument `strict = False`. The final layer norm is missing from the original dictionnary, that is why we are passing the `strict = False` argument.
 ```bash
 python convert_t5x_checkpoint_to_pytorch.py --t5x_checkpoint_path PATH_TO_T5X_CHECKPOINTS --config_file PATH_TO_CONFIG --pytorch_dump_path PATH_TO_SAVE
 ```
@@ -181,7 +181,7 @@ with a batch size of 1024. The sequence length is set to 512/512 for inputs and
 Dropout is set to 0 during pretraining. Pre-training took slightly more than one month for about 1 trillion
 tokens. The model has 32 encoder layers and 32 decoder layers, `dmodel` of 4096 and `df` of 16384.
 The dimension of each head is 256 for a total of 16 heads. Our model uses a model parallelism of 8.
-The same
+The same sentencepiece tokenizer as T5 of vocab size 32000 is used (click [here](https://huggingface.co/docs/transformers/v4.20.0/en/model_doc/t5#transformers.T5Tokenizer) for more information about the T5 tokenizer).
 
 UL-20B can be interpreted as a model that is quite similar to T5 but trained with a different objective and slightly different scaling knobs.
 UL-20B was trained using the [Jax](https://github.com/google/jax) and [T5X](https://github.com/google-research/t5x) infrastructure.
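
Note on the `strict = False` line touched by this commit: the README says the final layer norm is missing from the original checkpoint dictionary. A toy model of key matching may help show why a strict load would fail in that situation. This is a minimal sketch, not the conversion script and not the PyTorch implementation; `load_weights`, `fresh_model_state`, the key names, and the values are all hypothetical. It assumes only that a strict load rejects a checkpoint lacking a key the model expects, while a non-strict load skips that key and reports it.

```python
def load_weights(model_state, checkpoint, strict=True):
    """Copy checkpoint values into model_state and return the missing keys.

    Toy stand-in for a state-dict load; hypothetical helper, not a library API.
    """
    missing = [key for key in model_state if key not in checkpoint]
    unexpected = [key for key in checkpoint if key not in model_state]
    if strict and (missing or unexpected):
        raise KeyError(f"missing={missing}, unexpected={unexpected}")
    # Overwrite only the entries that exist on both sides.
    for key, value in checkpoint.items():
        if key in model_state:
            model_state[key] = value
    return missing


def fresh_model_state():
    # Hypothetical key names: the model expects a final layer norm.
    return {"block.0.weight": 0.0, "final_layer_norm.weight": 1.0}


# The checkpoint lacks the final layer norm entry.
checkpoint = {"block.0.weight": 0.5}

# Strict loading refuses the incomplete checkpoint.
try:
    load_weights(fresh_model_state(), checkpoint, strict=True)
    strict_failed = False
except KeyError:
    strict_failed = True

# Non-strict loading copies what it can and reports what was absent.
state = fresh_model_state()
missing_keys = load_weights(state, checkpoint, strict=False)
```

In the non-strict case the missing entry keeps its initial value, which matches the reason the README gives for passing `strict = False`.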