RylanSchaeffer committed on
Commit ea5a1e6 · verified · 1 Parent(s): 47a303e

End of training

README.md CHANGED
@@ -6,19 +6,19 @@ tags:
  - sft
  - generated_from_trainer
  model-index:
- - name: collapse_gemma-2-2b_hs2_iter2_sftsd0
+ - name: collapse_gemma-2-2b_hs2_replace_iter2_sftsd0
  results: []
  ---
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->
 
- # collapse_gemma-2-2b_hs2_iter2_sftsd0
+ # collapse_gemma-2-2b_hs2_replace_iter2_sftsd0
 
  This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
  It achieves the following results on the evaluation set:
- - Loss: 1.5115
- - Num Input Tokens Seen: 7923536
+ - Loss: 1.4538
+ - Num Input Tokens Seen: 4832464
 
  ## Model description
 
@@ -52,35 +52,23 @@ The following hyperparameters were used during training:
 
  | Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
  |:-------------:|:------:|:----:|:---------------:|:-----------------:|
- | No log | 0 | 0 | 1.3956 | 0 |
- | 1.7735 | 0.0350 | 5 | 1.3066 | 280096 |
- | 1.5215 | 0.0700 | 10 | 1.1998 | 560120 |
- | 1.3668 | 0.1050 | 15 | 1.1725 | 838176 |
- | 1.0619 | 0.1400 | 20 | 1.1790 | 1120704 |
- | 0.917 | 0.1750 | 25 | 1.2479 | 1397888 |
- | 0.9118 | 0.2100 | 30 | 1.3144 | 1680224 |
- | 0.7159 | 0.2450 | 35 | 1.3931 | 1963544 |
- | 0.5111 | 0.2800 | 40 | 1.4439 | 2241792 |
- | 0.4749 | 0.3150 | 45 | 1.5136 | 2518608 |
- | 0.427 | 0.3500 | 50 | 1.5106 | 2799872 |
- | 0.3428 | 0.3850 | 55 | 1.5751 | 3084560 |
- | 0.3927 | 0.4199 | 60 | 1.4907 | 3368728 |
- | 0.2933 | 0.4549 | 65 | 1.5076 | 3648312 |
- | 0.249 | 0.4899 | 70 | 1.4746 | 3928200 |
- | 0.2253 | 0.5249 | 75 | 1.4913 | 4211080 |
- | 0.1422 | 0.5599 | 80 | 1.4445 | 4488088 |
- | 0.1286 | 0.5949 | 85 | 1.5182 | 4763072 |
- | 0.1044 | 0.6299 | 90 | 1.4204 | 5043448 |
- | 0.19 | 0.6649 | 95 | 1.4679 | 5318848 |
- | 0.1548 | 0.6999 | 100 | 1.4739 | 5601360 |
- | 0.1394 | 0.7349 | 105 | 1.4093 | 5877032 |
- | 0.1386 | 0.7699 | 110 | 1.4460 | 6162712 |
- | 0.1775 | 0.8049 | 115 | 1.4499 | 6435944 |
- | 0.2135 | 0.8399 | 120 | 1.4051 | 6717936 |
- | 0.1515 | 0.8749 | 125 | 1.5017 | 6994336 |
- | 0.1906 | 0.9099 | 130 | 1.4869 | 7270544 |
- | 0.1433 | 0.9449 | 135 | 1.4074 | 7542248 |
- | 0.1096 | 0.9799 | 140 | 1.4848 | 7811456 |
+ | No log | 0 | 0 | 1.3909 | 0 |
+ | 1.6784 | 0.0591 | 5 | 1.2633 | 282096 |
+ | 1.3537 | 0.1183 | 10 | 1.1871 | 571576 |
+ | 1.0696 | 0.1774 | 15 | 1.2164 | 857160 |
+ | 0.9162 | 0.2365 | 20 | 1.2391 | 1142344 |
+ | 0.7598 | 0.2956 | 25 | 1.3479 | 1427536 |
+ | 0.5372 | 0.3548 | 30 | 1.4227 | 1715736 |
+ | 0.4796 | 0.4139 | 35 | 1.4737 | 2003760 |
+ | 0.3889 | 0.4730 | 40 | 1.5021 | 2286384 |
+ | 0.1994 | 0.5322 | 45 | 1.5032 | 2573248 |
+ | 0.3391 | 0.5913 | 50 | 1.4714 | 2862104 |
+ | 0.3297 | 0.6504 | 55 | 1.4358 | 3145472 |
+ | 0.2038 | 0.7095 | 60 | 1.4488 | 3432144 |
+ | 0.195 | 0.7687 | 65 | 1.4273 | 3724448 |
+ | 0.1749 | 0.8278 | 70 | 1.4248 | 4016736 |
+ | 0.1654 | 0.8869 | 75 | 1.4554 | 4305224 |
+ | 0.1846 | 0.9460 | 80 | 1.4274 | 4595952 |
 
 
  ### Framework versions
model-00001-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:efd07ab3cc40ce9bd97b1836c678f8d3c8efe2ae48cefb0cf5560f5508716fa3
+ oid sha256:c8923d0ce24fc19ff925f57ba737262a3b651f7af10da5b3c3708d73d6a013fc
  size 4988025760
model-00002-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:c4d872fc598d532fb81b05d66ad3e33f3527de8f56b89024f97762d4fa512976
+ oid sha256:8884483d7b8b59dc8c03fd2f12897e7e5088e654072e12c531ece63cc55a75ba
  size 240691728
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:364f58ab83df95b858312c13228fb9cf63c5f59048e3967577ee0dc99c331f87
+ oid sha256:99b64a8f734610c1930219e035f3328b78f014bc58cffac3230063c0fa0f529c
+ size 5624
- size 5560
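
The three binary files in this commit are stored as Git LFS pointers, each with three `key value` lines (`version`, `oid`, `size`). As a minimal sketch of how to read those pointers, the snippet below parses the old and new `training_args.bin` pointers from this diff and compares them. The helper name `parse_lfs_pointer` is hypothetical and not part of any Git LFS tooling; the pointer text is copied from the diff above.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the `key value` lines of a Git LFS pointer (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is a key, one space, then the value.
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# Old and new pointers for training_args.bin, as shown in this commit.
old = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:364f58ab83df95b858312c13228fb9cf63c5f59048e3967577ee0dc99c331f87\n"
    "size 5560\n"
)
new = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:99b64a8f734610c1930219e035f3328b78f014bc58cffac3230063c0fa0f529c\n"
    "size 5624\n"
)

content_changed = old["digest"] != new["digest"]
size_delta = new["size"] - old["size"]
print(f"content changed: {content_changed}, size delta: {size_delta} bytes")
```

For the two `safetensors` shards the same comparison would report a changed digest with a size delta of zero, since only the `oid` lines differ in those hunks.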