---
license: mit
language:
- en
tags:
- text generation
datasets:
- fhswf/TinyStoriesV2_cleaned
---

# BPE_GPT2_TinyStoriesV2_cleaned

BPE tokenizer model for the dataset 'fhswf/TinyStoriesV2_cleaned'.

Based on the GPT-Neo BPE tokenizer, but with a smaller vocabulary. Trained on TinyStoriesV2.

- Vocab Size: 1024
  - 256 base characters
  - 1 extra token: `<|endoftext|>`
  - 767 merges
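
The vocabulary breakdown above can be checked with a short sketch. The helper names (`merge_budget`, `train_bpe`) are hypothetical and written only for illustration. The toy trainer is a simplified byte-level BPE: it skips the pre-tokenization and byte-to-unicode mapping that GPT-2 style tokenizers use, and it is not the procedure used to train this model.

```python
from collections import Counter

BASE_SYMBOLS = 256                    # one entry per possible byte value
SPECIAL_TOKENS = ["<|endoftext|>"]    # the single extra token
VOCAB_SIZE = 1024


def merge_budget(vocab_size, base=BASE_SYMBOLS, specials=SPECIAL_TOKENS):
    """Number of merge rules that fit into the vocabulary."""
    return vocab_size - base - len(specials)


def train_bpe(text, num_merges):
    """Toy byte-level BPE: repeatedly merge the most frequent adjacent pair."""
    seq = [bytes([b]) for b in text.encode("utf-8")]
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (left, right), count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so further merges would not compress
        merges.append((left, right))
        merged, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and seq[i] == left and seq[i + 1] == right:
                merged.append(left + right)  # replace the pair with one symbol
                i += 2
            else:
                merged.append(seq[i])
                i += 1
        seq = merged
    return merges, seq


print(merge_budget(VOCAB_SIZE))  # 767

sample = "the cat sat on the mat. the cat ran."
merges, tokens = train_bpe(sample, merge_budget(VOCAB_SIZE))
print(len(merges), len(tokens))
```

Each learned merge adds exactly one entry to the vocabulary, which is why 256 base characters, 1 extra token and 767 merges add up to 1024. On a short sample the toy trainer stops early, once no adjacent pair occurs more than once.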