---
license: mit
language:
- en
tags:
- text generation
datasets:
- fhswf/TinyStoriesV2_cleaned
---

# BPE_GPT2_TinyStoriesV2_cleaned
BPE Tokenizer Model for dataset 'fhswf/TinyStoriesV2_cleaned'

Based on the GPT-Neo BPE tokenizer, but with a smaller vocabulary.
Trained on TinyStoriesV2.

- Vocab Size: 1024 
- 256 base characters
- 1 extra token: `<|endoftext|>`
- 767 merges
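
The vocabulary breakdown above can be checked with a little arithmetic, and the role of the merges can be illustrated with a toy trainer. The following is a minimal sketch, not the code used to build this tokenizer: `train_bpe` is a hypothetical helper that works on raw UTF-8 bytes and skips the pre-tokenization step that GPT-2 style tokenizers apply, and the constant names are chosen here for illustration only.

```python
from collections import Counter

# Figures taken from the list above; the constant names are illustrative.
BASE_CHARS = 256
SPECIAL_TOKENS = ["<|endoftext|>"]
NUM_MERGES = 767

# Each merge adds exactly one new token to the vocabulary.
vocab_size = BASE_CHARS + len(SPECIAL_TOKENS) + NUM_MERGES


def train_bpe(text, num_merges):
    """Toy byte-pair trainer (hypothetical helper): learn up to `num_merges` merges."""
    ids = list(text.encode("utf-8"))  # ids 0..255 form the base vocabulary
    merges = {}                       # (left_id, right_id) -> new token id
    next_id = BASE_CHARS
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break  # nothing left to merge
        pair = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges[pair] = next_id
        # Replace every non-overlapping occurrence of the pair, left to right.
        merged, i = [], 0
        while i < len(ids):
            if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
                merged.append(next_id)
                i += 2
            else:
                merged.append(ids[i])
                i += 1
        ids = merged
        next_id += 1
    return merges, ids


if __name__ == "__main__":
    print(vocab_size)
    toy_merges, toy_ids = train_bpe("once upon a time, once upon a hill", 10)
    print(len(toy_merges), len(toy_ids))
```

Running the same loop for 767 merges over the training corpus would yield 256 + 767 learned and base tokens, and the single special token brings the total to 1024.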