teragron committed
Commit 58fef9c · Parent: 651da3a

Delete doc/stories260K.md

Files changed (1): doc/stories260K.md (+0, -58)
doc/stories260K.md DELETED
# stories260K

[Stories260K Hugging Face link](https://huggingface.co/karpathy/tinyllamas)

The 260K model is a tiny model used for testing, and was trained as follows:

```
python train.py \
    --out_dir="outmini" \
    --batch_size=128 \
    --max_seq_len=512 \
    --gradient_accumulation_steps=1 \
    --vocab_source="custom" \
    --vocab_size=512 \
    --dim=64 \
    --n_layers=5 \
    --n_heads=8 \
    --n_kv_heads=4 \
    --multiple_of=4 \
    --learning_rate=1e-3 \
    --dropout=0.05 \
    --weight_decay=0.01 \
    --max_iters=100000 \
    --beta2=0.99 \
    --warmup_iters=1000 \
    --eval_interval=2000 \
    --eval_iters=100 \
    --compile=True
```
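As a sanity check on the name, the flags above roughly pin down the parameter count. A back-of-the-envelope tally (assuming Llama-style blocks with tied input/output embeddings and train.py's SwiGLU hidden-size rounding; exact details may differ) lands right around 260K:

```python
# Rough parameter tally for the config above. Assumes tied token
# embedding / output head and llama2.c-style SwiGLU rounding.
dim, n_layers, n_heads, n_kv_heads, vocab = 64, 5, 8, 4, 512
multiple_of = 4

head_dim = dim // n_heads                 # 8
kv_dim = n_kv_heads * head_dim            # 32 (half of dim: 2x multiquery)

# SwiGLU FFN sizing: 4*dim, scaled by 2/3, rounded up to multiple_of
hidden = int(2 * (4 * dim) / 3)           # 170
hidden = multiple_of * ((hidden + multiple_of - 1) // multiple_of)  # 172

attn = dim * dim + 2 * dim * kv_dim + dim * dim   # wq, wk, wv, wo
ffn = 3 * dim * hidden                            # w1, w2, w3
norms = 2 * dim                                   # two RMSNorms per layer
per_layer = attn + ffn + norms

total = vocab * dim + n_layers * per_layer + dim  # embeddings + blocks + final norm
print(total)  # 260032, i.e. ~260K
```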

You'll notice that `n_kv_heads` is 4 while `n_heads` is 8, so two heads at a time share their key/value projections, i.e. this model is 2X multiquery. You'll also notice that we're using a custom tokenizer with 512 tokens. The model trained for roughly 10 minutes on my A100 and achieves a validation loss of 1.2968.
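Concretely, with `n_heads=8` and `n_kv_heads=4`, each key/value head serves `n_heads // n_kv_heads = 2` query heads. A minimal sketch of that mapping (illustrative only, not the actual llama2.c code):

```python
n_heads, n_kv_heads = 8, 4
n_rep = n_heads // n_kv_heads  # 2 query heads share each kv head

# query head h reads keys/values from kv head h // n_rep
kv_head_of = [h // n_rep for h in range(n_heads)]
print(kv_head_of)  # [0, 0, 1, 1, 2, 2, 3, 3]
```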

Sampling this model at temperature 0.0 (i.e. deterministic greedy argmax sampling) gives:

```
$ ./run stories260K/stories260K.bin -z stories260K/tok512.bin -t 0.0
Once upon a time, there was a little girl named Lily. She loved to play outside in the park. One day, she saw a big, red ball. She wanted to play with it, but it was too high.
Lily's mom said, "Lily, let's go to the park." Lily was sad and didn't know what to do. She said, "I want to play with your ball, but I can't find it."
Lily was sad and didn't know what to do. She said, "I'm sorry, Lily. I didn't know what to do."
Lily didn't want to help her mom, so she said, "I'm sorry, mom. I didn't know what to do." Her mom said, "Don't worry, Lily. We can help you.
```

You can reproduce the same in Python by running `sample.py`:

```
$ python sample.py --checkpoint=stories260K/stories260K.pt --tokenizer=stories260K/tok512.model --temperature=0.0 --max_new_tokens=257
```

I hardcoded max tokens to be 257 manually because the `sample.py` script doesn't currently terminate on the special BOS token the way the run.c script does. Sampling at temperature 1.0 with a top-p of 0.9 gives somewhat more reasonable samples:

```
$ ./run stories260K/stories260K.bin -z stories260K/tok512.bin -t 1.0 -p 0.9 -s 133742
Once upon a time, there was a little boy named Timmy. Timmy loved to play with his toys and eat sandwiches. One day, Timmy's mom told him it was time to rest for a while. Timmy's friend Billy came over and took him a down.
Timmy's mom saw that Timmy was sad, but Timmy said, "I didn't understand what is it! We need to find some leafs." Timmy thought about it and took a deep breath on a spoon. He hoped it was important to be kind and continued to find its image next time.
After they finished getting, Timmy's dad came up to his house and promised to help Timmy.
```
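The two sampling modes used above (greedy argmax at temperature 0.0, and nucleus/top-p sampling at temperature 1.0) can be sketched in plain Python. This is an illustrative implementation under the standard definitions, not the exact logic in run.c or sample.py:

```python
import math
import random

def sample_token(logits, temperature=1.0, top_p=0.9, rng=random):
    """Pick a token index from raw logits with temperature + top-p."""
    if temperature == 0.0:
        # greedy: deterministically take the largest logit
        return max(range(len(logits)), key=lambda i: logits[i])
    # softmax with temperature
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    probs = [e / z for e in exps]
    # top-p: keep the smallest set of tokens whose cumulative prob >= top_p
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # renormalize over the kept set and draw from it
    total = sum(probs[i] for i in kept)
    r = rng.random() * total
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]

print(sample_token([2.0, 1.0, 0.1], temperature=0.0))  # 0 (greedy argmax)
```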

Hey, you can't expect too much from a 260K parameter model. I'm even mildly shocked we get this far :D