RylanSchaeffer
committed on
End of training
- README.md +21 -33
- model-00001-of-00002.safetensors +1 -1
- model-00002-of-00002.safetensors +1 -1
- training_args.bin +2 -2
README.md
CHANGED
@@ -6,19 +6,19 @@ tags:
 - sft
 - generated_from_trainer
 model-index:
-- name: collapse_gemma-2-
+- name: collapse_gemma-2-2b_hs2_replace_iter2_sftsd0
 results: []
 ---

 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->

-# collapse_gemma-2-
+# collapse_gemma-2-2b_hs2_replace_iter2_sftsd0

 This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.
-- Num Input Tokens Seen:
+- Loss: 1.4538
+- Num Input Tokens Seen: 4832464

 ## Model description

@@ -52,35 +52,23 @@ The following hyperparameters were used during training:

 | Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
 |:-------------:|:------:|:----:|:---------------:|:-----------------:|
-| No log | 0 | 0 | 1.
-| 1.
-| 1.
-| 1.
-|
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.1286 | 0.5949 | 85 | 1.5182 | 4763072 |
-| 0.1044 | 0.6299 | 90 | 1.4204 | 5043448 |
-| 0.19 | 0.6649 | 95 | 1.4679 | 5318848 |
-| 0.1548 | 0.6999 | 100 | 1.4739 | 5601360 |
-| 0.1394 | 0.7349 | 105 | 1.4093 | 5877032 |
-| 0.1386 | 0.7699 | 110 | 1.4460 | 6162712 |
-| 0.1775 | 0.8049 | 115 | 1.4499 | 6435944 |
-| 0.2135 | 0.8399 | 120 | 1.4051 | 6717936 |
-| 0.1515 | 0.8749 | 125 | 1.5017 | 6994336 |
-| 0.1906 | 0.9099 | 130 | 1.4869 | 7270544 |
-| 0.1433 | 0.9449 | 135 | 1.4074 | 7542248 |
-| 0.1096 | 0.9799 | 140 | 1.4848 | 7811456 |
+| No log | 0 | 0 | 1.3909 | 0 |
+| 1.6784 | 0.0591 | 5 | 1.2633 | 282096 |
+| 1.3537 | 0.1183 | 10 | 1.1871 | 571576 |
+| 1.0696 | 0.1774 | 15 | 1.2164 | 857160 |
+| 0.9162 | 0.2365 | 20 | 1.2391 | 1142344 |
+| 0.7598 | 0.2956 | 25 | 1.3479 | 1427536 |
+| 0.5372 | 0.3548 | 30 | 1.4227 | 1715736 |
+| 0.4796 | 0.4139 | 35 | 1.4737 | 2003760 |
+| 0.3889 | 0.4730 | 40 | 1.5021 | 2286384 |
+| 0.1994 | 0.5322 | 45 | 1.5032 | 2573248 |
+| 0.3391 | 0.5913 | 50 | 1.4714 | 2862104 |
+| 0.3297 | 0.6504 | 55 | 1.4358 | 3145472 |
+| 0.2038 | 0.7095 | 60 | 1.4488 | 3432144 |
+| 0.195 | 0.7687 | 65 | 1.4273 | 3724448 |
+| 0.1749 | 0.8278 | 70 | 1.4248 | 4016736 |
+| 0.1654 | 0.8869 | 75 | 1.4554 | 4305224 |
+| 0.1846 | 0.9460 | 80 | 1.4274 | 4595952 |


 ### Framework versions
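As an illustrative aside, the added (+) rows of the training results table in the README.md diff can be read programmatically. The sketch below parses those rows and reports the evaluation step with the lowest validation loss. The names `ROWS` and `parse_rows` are hypothetical helpers introduced here for illustration; they are not part of the commit or of any library.

```python
# Rows copied verbatim from the added (+) side of the README.md diff.
# Columns: training loss, epoch, step, validation loss, input tokens seen.
ROWS = """
| No log | 0 | 0 | 1.3909 | 0 |
| 1.6784 | 0.0591 | 5 | 1.2633 | 282096 |
| 1.3537 | 0.1183 | 10 | 1.1871 | 571576 |
| 1.0696 | 0.1774 | 15 | 1.2164 | 857160 |
| 0.9162 | 0.2365 | 20 | 1.2391 | 1142344 |
| 0.7598 | 0.2956 | 25 | 1.3479 | 1427536 |
| 0.5372 | 0.3548 | 30 | 1.4227 | 1715736 |
| 0.4796 | 0.4139 | 35 | 1.4737 | 2003760 |
| 0.3889 | 0.4730 | 40 | 1.5021 | 2286384 |
| 0.1994 | 0.5322 | 45 | 1.5032 | 2573248 |
| 0.3391 | 0.5913 | 50 | 1.4714 | 2862104 |
| 0.3297 | 0.6504 | 55 | 1.4358 | 3145472 |
| 0.2038 | 0.7095 | 60 | 1.4488 | 3432144 |
| 0.195 | 0.7687 | 65 | 1.4273 | 3724448 |
| 0.1749 | 0.8278 | 70 | 1.4248 | 4016736 |
| 0.1654 | 0.8869 | 75 | 1.4554 | 4305224 |
| 0.1846 | 0.9460 | 80 | 1.4274 | 4595952 |
"""


def parse_rows(text):
    """Turn pipe-delimited table rows into a list of dicts (hypothetical helper)."""
    rows = []
    for line in text.strip().splitlines():
        # Drop the outer pipes, then split the five cells.
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        train_loss, epoch, step, val_loss, tokens = cells
        rows.append({
            # The step-0 row has no training loss ("No log").
            "train_loss": None if train_loss == "No log" else float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
            "tokens": int(tokens),
        })
    return rows


rows = parse_rows(ROWS)
best = min(rows, key=lambda r: r["val_loss"])
last = rows[-1]
print(f"lowest validation loss {best['val_loss']} at step {best['step']}")
print(f"last logged step {last['step']}: validation loss {last['val_loss']}")
```

In these rows the training loss falls from 1.6784 to 0.1846, while the validation loss is lowest at step 10 (1.1871) and sits at 1.4274 by step 80. The commit itself does not comment on this pattern.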
model-00001-of-00002.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:c8923d0ce24fc19ff925f57ba737262a3b651f7af10da5b3c3708d73d6a013fc
 size 4988025760
model-00002-of-00002.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:8884483d7b8b59dc8c03fd2f12897e7e5088e654072e12c531ece63cc55a75ba
 size 240691728
training_args.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:99b64a8f734610c1930219e035f3328b78f014bc58cffac3230063c0fa0f529c
+size 5624
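The three binary files in this commit are stored as Git LFS pointers: a `version` line, an `oid sha256:` line, and a `size` line. As a minimal sketch, assuming the pointer text is available as a string and the real file has been downloaded locally, the following parses a pointer and checks a file against it. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names introduced here, not Git LFS or Hugging Face APIs.

```python
import hashlib

# New pointer for training_args.bin, copied from the diff.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:99b64a8f734610c1930219e035f3328b78f014bc58cffac3230063c0fa0f529c
size 5624
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a pointer into a dict (hypothetical helper)."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer):
    """Return True if the file at `path` has the size and hash the pointer records."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so multi-gigabyte shards do not need to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

A call such as `matches_pointer("training_args.bin", pointer)` would then confirm that a local copy corresponds to this commit; the path is an assumption about where the file was saved.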