DeathReaper0965 committed on
Commit 5028a7b · 1 Parent(s): a90bd52

Typos update in README.md

Adjusted some minor typos in the README file.

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -80,7 +80,7 @@ license: apache-2.0
  Flan-UL2 is an encoder decoder model based on the `T5` architecture. It uses the same configuration as the [`UL2 model`](https://huggingface.co/google/ul2) released earlier last year. It was fine tuned using the "Flan" prompt tuning
  and dataset collection.
  
- According ot the original [blog](https://www.yitay.net/blog/flan-ul2-20b) here are the notable improvements:
+ According to the original [blog](https://www.yitay.net/blog/flan-ul2-20b) here are the notable improvements:
  - The original UL2 model was only trained with receptive field of 512, which made it non-ideal for N-shot prompting where N is large.
  - The Flan-UL2 checkpoint uses a receptive field of 2048 which makes it more usable for few-shot in-context learning.
  - The original UL2 model also had mode switch tokens that was rather mandatory to get good performance. However, they were a little cumbersome as this requires often some changes during inference or finetuning. In this update/change, we continue training UL2 20B for an additional 100k steps (with small batch) to forget “mode tokens” before applying Flan instruction tuning. This Flan-UL2 checkpoint does not require mode tokens anymore.
@@ -89,7 +89,7 @@ According ot the original [blog](https://www.yitay.net/blog/flan-ul2-20b) here a
  
  ## Converting from T5x to huggingface
  
- You can use the [`convert_t5x_checkpoint_to_pytorch.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/convert_t5x_checkpoint_to_pytorch.py) script and pass the argument `strict = False`. The final layer norm is missing from the original dictionnary, that is why we are passing the `stric=False` argument.
+ You can use the [`convert_t5x_checkpoint_to_pytorch.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/convert_t5x_checkpoint_to_pytorch.py) script and pass the argument `strict = False`. The final layer norm is missing from the original dictionnary, that is why we are passing the `strict = False` argument.
  ```bash
  python convert_t5x_checkpoint_to_pytorch.py --t5x_checkpoint_path PATH_TO_T5X_CHECKPOINTS --config_file PATH_TO_CONFIG --pytorch_dump_path PATH_TO_SAVE
  ```
@@ -181,7 +181,7 @@ with a batch size of 1024. The sequence length is set to 512/512 for inputs and
  Dropout is set to 0 during pretraining. Pre-training took slightly more than one month for about 1 trillion
  tokens. The model has 32 encoder layers and 32 decoder layers, `dmodel` of 4096 and `df` of 16384.
  The dimension of each head is 256 for a total of 16 heads. Our model uses a model parallelism of 8.
- The same same sentencepiece tokenizer as T5 of vocab size 32000 is used (click [here](https://huggingface.co/docs/transformers/v4.20.0/en/model_doc/t5#transformers.T5Tokenizer) for more information about the T5 tokenizer).
+ The same sentencepiece tokenizer as T5 of vocab size 32000 is used (click [here](https://huggingface.co/docs/transformers/v4.20.0/en/model_doc/t5#transformers.T5Tokenizer) for more information about the T5 tokenizer).
  
  UL-20B can be interpreted as a model that is quite similar to T5 but trained with a different objective and slightly different scaling knobs.
  UL-20B was trained using the [Jax](https://github.com/google/jax) and [T5X](https://github.com/google-research/t5x) infrastructure.
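
Reviewer note on the `strict = False` sentence in the conversion hunk: the README says the final layer norm is missing from the original dictionary, which is the situation non-strict loading in PyTorch is meant for. The sketch below is only an illustration of that mechanism, not the conversion script itself. `TinyBlock` and its layer names are hypothetical stand-ins, and it assumes `torch` is installed.

```python
import torch
from torch import nn


class TinyBlock(nn.Module):
    """Hypothetical stand-in for a model that ends with a layer norm."""

    def __init__(self, d_model: int = 8):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)
        self.final_layer_norm = nn.LayerNorm(d_model)


torch.manual_seed(0)
source = TinyBlock()

# Simulate a converted checkpoint whose dictionary lacks the final layer norm.
partial_state = {
    name: tensor
    for name, tensor in source.state_dict().items()
    if not name.startswith("final_layer_norm")
}

target = TinyBlock()

# With the default strict=True this call would raise a RuntimeError because
# of the missing keys. With strict=False the matching tensors are loaded and
# the mismatches are reported in the return value instead.
result = target.load_state_dict(partial_state, strict=False)

missing = sorted(result.missing_keys)
unexpected = list(result.unexpected_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```

Parameters listed under `missing_keys` keep their freshly initialized values, so it is worth checking that list after a non-strict load to confirm that only the expected entries were skipped.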