---
license: mit
language:
- en
tags:
- text generation
datasets:
- fhswf/TinyStoriesV2_cleaned
---
# BPE_GPT2_TinyStoriesV2_cleaned
BPE tokenizer model for the dataset `fhswf/TinyStoriesV2_cleaned`.
Based on the GPT-Neo BPE tokenizer, but with a smaller vocabulary.
Trained on TinyStoriesV2.
- Vocab Size: 1024
- 256 base characters
- 1 extra token: `<|endoftext|>`
- 767 merges
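
The numbers above add up as 256 + 1 + 767 = 1024. The following is a minimal sketch of how such a vocabulary is composed, assuming a plain byte-level BPE. It is not the training script used for this tokenizer: `train_bpe` and `vocab_size` are hypothetical helper names, and the toy trainer skips the regex pre-tokenization that GPT-2 style tokenizers apply before merging.

```python
from collections import Counter

BASE_VOCAB = 256                      # one token per possible byte value
SPECIAL_TOKENS = ["<|endoftext|>"]    # the single extra token listed above


def train_bpe(text: str, num_merges: int) -> list[tuple[int, int]]:
    """Learn up to `num_merges` merges from `text` (toy byte-level BPE)."""
    ids = list(text.encode("utf-8"))
    merges = []
    # New token ids start after the byte tokens and the special tokens.
    next_id = BASE_VOCAB + len(SPECIAL_TOKENS)
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break  # fewer than two tokens left, nothing to merge
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merged, i = [], 0
        while i < len(ids):
            if i + 1 < len(ids) and (ids[i], ids[i + 1]) == best:
                merged.append(next_id)  # replace the pair with the new token
                i += 2
            else:
                merged.append(ids[i])
                i += 1
        ids = merged
        merges.append(best)
        next_id += 1
    return merges


def vocab_size(num_merges: int) -> int:
    """Vocabulary size = base bytes + special tokens + learned merges."""
    return BASE_VOCAB + len(SPECIAL_TOKENS) + num_merges


if __name__ == "__main__":
    print(vocab_size(767))  # 1024
```

Each merge adds exactly one token, so fixing the vocabulary size at 1024 leaves room for 767 merges once the 256 base characters and the `<|endoftext|>` token are accounted for.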