---
license: mit
language:
- en
tags:
- text generation
datasets:
- fhswf/TinyStoriesV2_cleaned
---
# BPE_GPT2_TinyStoriesV2_cleaned

BPE tokenizer model for the dataset 'fhswf/TinyStoriesV2_cleaned'.
Based on the GPT-Neo BPE tokenizer, but with a smaller vocabulary.
Trained on TinyStoriesV2.
- Vocab Size: 1024
  - 256 base chars
  - 1 extra token: `<|endoftext|>`
  - 767 merges
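
The vocabulary breakdown above (256 + 1 + 767 = 1024) can be illustrated with a minimal, standard-library-only sketch of byte-level BPE training. This is a toy illustration, not the code used to build this tokenizer: the names `num_merges`, `merge`, and `train_bpe` are hypothetical, the ID layout (special token placed right after the 256 byte IDs) is an assumption, and the regex pre-tokenization that GPT-2-style tokenizers apply before merging is omitted.

```python
from collections import Counter

BASE_VOCAB = 256                    # one token per possible byte value
SPECIAL_TOKENS = ["<|endoftext|>"]  # the single extra token
TARGET_VOCAB = 1024


def num_merges(target_vocab, base=BASE_VOCAB, n_special=len(SPECIAL_TOKENS)):
    """Number of merges so that base + special + merges == target_vocab."""
    return target_vocab - base - n_special


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, n_merges):
    """Toy byte-level BPE: repeatedly merge the most frequent adjacent pair."""
    ids = list(text.encode("utf-8"))
    merges = {}
    # Assumed layout: new merge IDs start after the byte IDs and special tokens.
    next_id = BASE_VOCAB + len(SPECIAL_TOKENS)
    for _ in range(n_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break  # nothing left to merge
        best = max(pairs, key=pairs.get)
        merges[best] = next_id
        ids = merge(ids, best, next_id)
        next_id += 1
    return merges, ids


print(num_merges(TARGET_VOCAB))  # 767
```

Calling `train_bpe(corpus, num_merges(TARGET_VOCAB))` on a sufficiently large corpus would learn up to 767 merge rules, giving 256 + 1 + 767 = 1024 vocabulary entries in total.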