---
datasets:
- HuggingFaceFW/fineweb
base_model:
- openai-community/gpt2
---

# NanoGPT Speedrun

Following https://github.com/KellerJordan/modded-nanogpt for fun and learning.

## Run Info

**baseline/**

- Run on Lightning AI cloud, using one L40S GPU
- Batch size set to 32
- VRAM usage: 26.95 GB (25698 MiB reported by `nvidia-smi`)
- ~4 seconds per step, 3200 steps total
- Checkpoint saved every 320 steps (these parameters are collected in the sketch below)
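
The run parameters above map onto a training loop roughly as follows. This is a minimal sketch, not the actual modded-nanogpt code; `train_step` and `save_checkpoint` are hypothetical stand-ins.

```python
BATCH_SIZE = 32     # sequences per optimizer step
TOTAL_STEPS = 3200  # at ~4 s/step, about 3.5 hours on one L40S
CKPT_EVERY = 320    # yields 10 checkpoints over the run


def train_step(batch_size: int) -> float:
    """Hypothetical stand-in for one forward/backward/optimizer step."""
    return 0.0  # would return the training loss


def save_checkpoint(step: int) -> None:
    """Hypothetical stand-in for writing model/optimizer state to disk."""


for step in range(1, TOTAL_STEPS + 1):
    loss = train_step(BATCH_SIZE)
    if step % CKPT_EVERY == 0:
        save_checkpoint(step)
```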

## Training loss

To empirically check the neural scaling law (training loss decaying roughly as a power law in the number of steps):

![baseline/analysis/loss_plot2.png](baseline/analysis/loss_plot2.png)

(Fitted line: `log y = -0.11 * log x + 0.9`, where `x` is the training step (0 to 3200) and `y` is the training loss.)
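
A straight line in log-log space corresponds to a power law, here roughly `y = e^0.9 * x^-0.11`. A minimal sketch of how such a fit can be reproduced with `np.polyfit` on log-log data; the loss curve below is synthetic, so substitute the actual logged `(step, loss)` pairs:

```python
import numpy as np

# Synthetic stand-in for the logged loss curve; replace with the real
# (step, loss) pairs. Step 0 is skipped since log(0) is undefined.
steps = np.arange(1, 3201)
losses = np.exp(0.9) * steps ** -0.11

# Fit log y = a * log x + b; the slope a is the power-law exponent.
a, b = np.polyfit(np.log(steps), np.log(losses), deg=1)
print(f"log y = {a:.2f} * log x + {b:.2f}")  # -> log y = -0.11 * log x + 0.90
```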

## Demo

Available at https://huggingface.co/spaces/lemonteaa/nanogpt-speedrun-demo

(WIP; a sketch of a minimal version is below.)
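
A Space like this can be a small Gradio app along the following lines. This is a sketch only, not the actual Space code: it assumes the trained checkpoint has been exported to a Hugging Face GPT-2-style model directory at the hypothetical path `./baseline/hf_model`.

```python
import gradio as gr
from transformers import pipeline

# Hypothetical path: assumes the NanoGPT checkpoint was exported to a
# Hugging Face GPT-2-style model directory.
generator = pipeline("text-generation", model="./baseline/hf_model")


def generate(prompt: str, max_new_tokens: float) -> str:
    out = generator(prompt, max_new_tokens=int(max_new_tokens))
    return out[0]["generated_text"]


gr.Interface(
    fn=generate,
    inputs=[
        gr.Textbox(label="Prompt"),
        gr.Slider(1, 256, value=64, step=1, label="Max new tokens"),
    ],
    outputs=gr.Textbox(label="Completion"),
    title="NanoGPT Speedrun Demo",
).launch()
```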