---
license: mit
language:
- en
tags:
- text generation
datasets:
- fhswf/TinyStoriesV2_cleaned
---
BPE Tokenizer for TinyStoriesV2
---
Based on the GPT-Neo BPE tokenizer, but with a smaller vocabulary.
Trained on TinyStoriesV2 (`fhswf/TinyStoriesV2_cleaned`).
- Vocab Size: 2048
- 256 Base chars
- 1 extra Token: <|endoftext|>
- 1791 merges
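
The vocabulary size follows from the three parts listed above: 256 base symbols, plus one new token per merge, plus the special token. The sketch below is a toy byte-level BPE trainer meant only to illustrate that bookkeeping. It is not the procedure used to train this tokenizer, it skips the pre-tokenization step that GPT-style tokenizers apply, and the names `train_bpe` and `vocab_size` are hypothetical.

```python
from collections import Counter


def train_bpe(text: str, num_merges: int):
    """Toy byte-level BPE: return the learned merges as ((left, right), new_id)."""
    ids = list(text.encode("utf-8"))  # start from raw bytes: ids 0..255
    merges = []
    next_id = 256  # the first merged token gets the first id after the base symbols
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break  # nothing left to merge
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append((best, next_id))
        # Replace every non-overlapping occurrence of the pair with the new id.
        out, i = [], 0
        while i < len(ids):
            if i + 1 < len(ids) and (ids[i], ids[i + 1]) == best:
                out.append(next_id)
                i += 2
            else:
                out.append(ids[i])
                i += 1
        ids = out
        next_id += 1
    return merges


def vocab_size(num_base: int, num_merges: int, num_special: int) -> int:
    """Each merge adds exactly one token on top of the base and special tokens."""
    return num_base + num_merges + num_special


# Toy run on a made-up sentence (assumption: illustrative data, not TinyStoriesV2).
toy_merges = train_bpe("once upon a time there was a little girl. " * 4, 10)
toy_vocab = vocab_size(256, len(toy_merges), 1)

# Numbers from this model card: 256 base chars, 1791 merges, 1 extra token.
card_vocab = vocab_size(256, 1791, 1)
print(toy_vocab, card_vocab)
```

With the figures from this card, the sum is 256 + 1791 + 1 = 2048, which matches the stated vocab size.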