# Model description

This repo contains over 500 model checkpoints, ranging in size from 20M to 3.3B parameters and in FLOP budget from 2e17 to 1e21 FLOPs, across 6 different pretraining datasets.
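
To browse what is available, you can list the repository's files and collect the top-level directory names. This is a minimal sketch, assuming only the `huggingface_hub` client and the repo id used in the loading snippet below:

```python
from huggingface_hub import list_repo_files

# Every checkpoint lives in its own top-level subdirectory of the repo
files = list_repo_files("KempnerInstituteAI/loss-to-loss")
checkpoint_dirs = sorted({f.split("/")[0] for f in files if "/" in f})

print(f"{len(checkpoint_dirs)} checkpoints, e.g. {checkpoint_dirs[0]}")
```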

Each subdirectory name contains four parameters that identify the model in that subdirectory:

- Dataset: one of `fineweb-100b`, `fineweb-edu-100b`, `proof-pile-2`, `slimpajama-chunk1`, `smollm-corpus`, or `starcoder`
- N: the number of model parameters
- D: the number of training tokens
- C: the number of training FLOPs

For example, a model trained on `starcoder` with 1.1e08 parameters on 3.0e08 tokens for a total of 2.0e17 FLOPs would have the name `L2L_starcoder_N1.1e08_D3.0e08_C2.0e17/`.
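
Because the names are fully structured, the four parameters can be recovered programmatically. The helper below is a hypothetical sketch, not part of the repo; it also checks that the stated budget is consistent with the common C ≈ 6·N·D approximation (here 6 × 1.1e8 × 3.0e8 ≈ 2.0e17):

```python
def parse_checkpoint_name(name: str) -> dict:
    """Parse a name like 'L2L_starcoder_N1.1e08_D3.0e08_C2.0e17'."""
    # Dataset names use hyphens, so splitting on underscores is unambiguous
    _, dataset, n, d, c = name.rstrip("/").split("_")
    return {"dataset": dataset, "N": float(n[1:]), "D": float(d[1:]), "C": float(c[1:])}

info = parse_checkpoint_name("L2L_starcoder_N1.1e08_D3.0e08_C2.0e17")
# 6 * 1.1e8 * 3.0e8 = 1.98e17, within a few percent of C = 2.0e17
assert abs(6 * info["N"] * info["D"] / info["C"] - 1) < 0.05
```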

Full training details for the models can be found in the training repo or paper.

# How to load a model

First, follow the instructions to install our fork of the [OLMo](https://github.com/allenai/OLMo) package from here: https://github.com/KempnerInstitute/loss-to-loss-olmo/tree/main

With this installed, you can then use `huggingface_hub` and the fork's `HFMixinOLMo` class to download and load a model with the following snippet:

```python
from huggingface_hub import snapshot_download
from olmo.model import HFMixinOLMo

tmp_dir = "tmp"
model_name = "L2L_starcoder_N1.1e08_D3.0e08_C2.0e17"

# Download only this model's subdirectory from the hub
snapshot_download(
    repo_id="KempnerInstituteAI/loss-to-loss",
    allow_patterns=f"{model_name}/*",
    local_dir=tmp_dir,
)

# Load the checkpoint through the fork's HF-style mixin
model = HFMixinOLMo.from_pretrained(f"{tmp_dir}/{model_name}")
```
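
As a quick sanity check (a sketch assuming the loaded object behaves like a standard PyTorch `nn.Module`), you can confirm that the parameter count is close to the `N` encoded in the directory name:

```python
# Should print roughly 1.1e+08, matching N1.1e08 in the name
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:.2e} parameters")
```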

# Citation

If you use these models in your research, please cite this paper:

```bibtex
TODO
```