---
library_name: transformers
tags: []
---

This is a GPT-2 model trained in llm.c for 330K steps (with a batch size of ~1M tokens) on FineWeb-EDU.
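Since the card declares `library_name: transformers`, the checkpoint should load like any other GPT-2 model; below is a minimal usage sketch (the repo id, prompt, and sampling settings are placeholders, not values from this card):

```python
# Minimal sketch of sampling from the model with Hugging Face transformers.
# NOTE: "USER/REPO" is a placeholder; replace it with this repository's id.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "USER/REPO"  # placeholder, not the actual repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "The Eiffel Tower is located in"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    top_k=50,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```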
A lot more detailed information is here: https://github.com/karpathy/llm.c/discussions/677.

This model has a bit of a complicated history. I wanted to train it for 400K steps (i.e. `-x 400000`), but it became unstable later in training and exploded around step 330K. Because my computing quota was about to run out, I decided to rewind to the 300K checkpoint and, instead of going all the way to 400K, anneal the learning rate linearly down, ending at step 330K. This went without incident and produced this model.
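For concreteness, here is a small sketch of what that final schedule looks like; the learning-rate values are placeholder assumptions, not the run's actual settings:

```python
# Illustrative sketch of the shortened schedule described above: resume from
# the step-300K checkpoint and anneal the learning rate linearly so that it
# bottoms out at step 330K instead of the originally planned 400K.
# LR_AT_RESUME and MIN_LR are placeholders, not the run's actual values.

RESUME_STEP = 300_000   # checkpoint the run was rewound to
FINAL_STEP = 330_000    # end of the shortened run
LR_AT_RESUME = 1e-4     # placeholder
MIN_LR = 0.0            # placeholder

def lr_schedule(step: int) -> float:
    """Linear anneal from LR_AT_RESUME at step 300K to MIN_LR at step 330K."""
    if step <= RESUME_STEP:
        return LR_AT_RESUME
    if step >= FINAL_STEP:
        return MIN_LR
    frac = (step - RESUME_STEP) / (FINAL_STEP - RESUME_STEP)
    return LR_AT_RESUME + frac * (MIN_LR - LR_AT_RESUME)

for s in (300_000, 310_000, 320_000, 330_000):
    print(f"step {s}: lr = {lr_schedule(s):.2e}")
```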
This is the longest I've trained a GPT-2 model for, and it reaches a HellaSwag score of 62.7 by the end.