Update README.md
README.md
CHANGED
@@ -38,7 +38,7 @@ widget:
 - text: In the context of computer programming, an algorithm is
   example_title: Algorithm Definition
 ---
-# Mixsmol-4x400M-v0.1
+# Mixsmol-4x400M-v0.1 by Ontocord
 This is the first checkpoint (Epoch 1) of Mixsmol-4x400M-v0.1
 Note that this is an experiment in data mixing. Therefore, we only trained the model on 50B tokens (95% English and 5% Vietnamese) to test the following:
 - Reasoning capabilities through high-quality synthetic textbooks data pretraining
@@ -71,3 +71,6 @@ After verifying our hypothesis with this run, we will schedule a second run on b
 |truthfulqa_mc2|Yaml |none | 0|acc |0.3909|± |0.0148|
 |winogrande|Yaml |none | 5|acc |0.5107|± | 0.014|
 |gsm8k|Yaml |get-answer| 5|exact_match| 0|± | 0|
+
+## Contribution
+This work is a shared contribution between **Ontocord, BEE-spoke-data and VILM**