Create README.md
README.md
ADDED
@@ -0,0 +1,17 @@
+---
+license: apache-2.0
+datasets:
+- allenai/dolma
+---
+# Training runs comparing Mixture-of-Depths and BitNet
+[Wandb Report](https://api.wandb.ai/links/tulasiram/pw76q41i)
+
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6382255fcae34727b9cc149e/-ovvzj0ZvzuArH0cdOz8b.png)
+
+#### Four models trained for 100k steps on Dolma
+- OLMo-50M - 50M parameter model
+- OLMo-50M-bitlinear - 50M parameter BitNet model
+- OLMo-50M-mod - 50M parameter mixture-of-depths model
+- OLMo-50M-mod-bitlinear - 50M parameter mixture-of-depths BitNet model
+
+The repo contains zip files that include the training state and other files for each model. I am not the author of the mixture-of-depths implementation; it can be found [here](https://github.com/thepowerfuldeez/OLMo).