# Model description

This repo contains over 500 model checkpoints, ranging in size from 20M to 3.3B parameters and in FLOP budget from 2e17 to 1e21 FLOPs, across 6 different pretraining datasets.
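
To browse what is available, you can list the repository's files and collect the top-level directory names. This is a minimal sketch, assuming only the `huggingface_hub` client and the repo id used in the loading snippet below:

```python
from huggingface_hub import list_repo_files

# Every checkpoint lives in its own top-level subdirectory of the repo
files = list_repo_files("KempnerInstituteAI/loss-to-loss")
checkpoint_dirs = sorted({f.split("/")[0] for f in files if "/" in f})

print(f"{len(checkpoint_dirs)} checkpoints, e.g. {checkpoint_dirs[0]}")
```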

Each subdirectory name contains four parameters that identify the model in that subdirectory:

- Dataset: one of `fineweb-100b`, `fineweb-edu-100b`, `proof-pile-2`, `slimpajama-chunk1`, `smollm-corpus`, or `starcoder`
- N: the number of model parameters
- D: the number of training tokens
- C: the number of training FLOPs

For example, a model trained on `starcoder` with 1.1e08 parameters on 3.0e08 tokens for a total of 2.0e17 FLOPs would have the name `L2L_starcoder_N1.1e08_D3.0e08_C2.0e17/`.
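
Because the names are fully structured, the four parameters can be recovered programmatically. The helper below is a hypothetical sketch, not part of the repo; it also checks that the stated budget is consistent with the common C ≈ 6·N·D approximation (here 6 × 1.1e8 × 3.0e8 ≈ 2.0e17):

```python
def parse_checkpoint_name(name: str) -> dict:
    """Parse a name like 'L2L_starcoder_N1.1e08_D3.0e08_C2.0e17'."""
    # Dataset names use hyphens, so splitting on underscores is unambiguous
    _, dataset, n, d, c = name.rstrip("/").split("_")
    return {"dataset": dataset, "N": float(n[1:]), "D": float(d[1:]), "C": float(c[1:])}

info = parse_checkpoint_name("L2L_starcoder_N1.1e08_D3.0e08_C2.0e17")
# 6 * 1.1e8 * 3.0e8 = 1.98e17, within a few percent of C = 2.0e17
assert abs(6 * info["N"] * info["D"] / info["C"] - 1) < 0.05
```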

Full training details for the models can be found in the training repo or paper.

# How to load a model

First, follow the instructions to install our fork of the [OLMo](https://github.com/allenai/OLMo) package from here: https://github.com/KempnerInstitute/loss-to-loss-olmo/tree/main

With this installed, you can then use `huggingface_hub` and the fork's `HFMixinOLMo` class to download and load a model with the following snippet:

```python
from huggingface_hub import snapshot_download
from olmo.model import HFMixinOLMo

tmp_dir = "tmp"
model_name = "L2L_starcoder_N1.1e08_D3.0e08_C2.0e17"

# Download only this model's subdirectory from the hub
snapshot_download(
    repo_id="KempnerInstituteAI/loss-to-loss",
    allow_patterns=f"{model_name}/*",
    local_dir=tmp_dir,
)

# Load the checkpoint through the fork's HF-style mixin
model = HFMixinOLMo.from_pretrained(f"{tmp_dir}/{model_name}")
```
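
As a quick sanity check (a sketch assuming the loaded object behaves like a standard PyTorch `nn.Module`), you can confirm that the parameter count is close to the `N` encoded in the directory name:

```python
# Should print roughly 1.1e+08, matching N1.1e08 in the name
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:.2e} parameters")
```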

# Citation

If you use these models in your research, please cite this paper:

```bibtex
TODO
```