mnoukhov committed
Commit 06f2966 · verified · 1 Parent(s): f8f49c9

Model save

Files changed (2)
  1. README.md +10 -7
  2. model.safetensors +1 -1
README.md CHANGED
@@ -17,7 +17,7 @@ should probably proofread and complete it, then remove this comment. -->
 
  This model is a fine-tuned version of [EleutherAI/pythia-160m-deduped](https://huggingface.co/EleutherAI/pythia-160m-deduped) on an unknown dataset.
  It achieves the following results on the evaluation set:
- - Loss: 3.3864
+ - Loss: 2.7993
 
  ## Model description
 
@@ -37,11 +37,14 @@ More information needed
 
  The following hyperparameters were used during training:
  - learning_rate: 1e-05
- - train_batch_size: 32
+ - train_batch_size: 16
  - eval_batch_size: 8
  - seed: 42
- - gradient_accumulation_steps: 4
+ - distributed_type: multi-GPU
+ - num_devices: 4
+ - gradient_accumulation_steps: 2
  - total_train_batch_size: 128
+ - total_eval_batch_size: 32
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: cosine
  - num_epochs: 1
@@ -50,10 +53,10 @@ The following hyperparameters were used during training:
 
  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:------:|:----:|:---------------:|
- | 4.2966 | 0.2007 | 183 | 3.5566 |
- | 3.5154 | 0.4013 | 366 | 3.4336 |
- | 3.4239 | 0.6020 | 549 | 3.4037 |
- | 3.3856 | 0.8026 | 732 | 3.3864 |
+ | 3.2275 | 0.2007 | 183 | 2.8647 |
+ | 2.8581 | 0.4013 | 366 | 2.8291 |
+ | 2.8213 | 0.6020 | 549 | 2.8076 |
+ | 2.799 | 0.8026 | 732 | 2.7993 |
 
 
  ### Framework versions
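
The hyperparameter hunk in README.md changes train_batch_size, gradient_accumulation_steps, and the device count, yet total_train_batch_size stays at 128. The sketch below checks that arithmetic, assuming the usual relation total = per-device batch size × number of devices × gradient accumulation steps. The helper name `effective_batch_size` is hypothetical, and the pre-commit run is assumed to have used a single device because the old README lists no num_devices.

```python
def effective_batch_size(per_device: int, num_devices: int = 1, grad_accum_steps: int = 1) -> int:
    """Examples consumed per optimizer step (or per eval step when grad_accum_steps is 1)."""
    return per_device * num_devices * grad_accum_steps


# Before the commit: no num_devices is listed, so a single device is ASSUMED.
old_train = effective_batch_size(per_device=32, num_devices=1, grad_accum_steps=4)

# After the commit: 4 devices, a smaller per-device batch, fewer accumulation steps.
new_train = effective_batch_size(per_device=16, num_devices=4, grad_accum_steps=2)

# Evaluation does not accumulate gradients, so only the device count multiplies.
new_eval = effective_batch_size(per_device=8, num_devices=4)

print(old_train, new_train, new_eval)  # 128 128 32
```

Under these assumptions both configurations feed 128 examples into each optimizer step, so the drop in loss is probably not explained by a change in effective batch size.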
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:5ef91cb4782828840161f6bbfe31866748c30e164f99194f0d55ced62cfc2a46
+ oid sha256:31660be27be25fe34fe32421fdab805203a15dd10e60fae06c945e1a4d6ec998
  size 649308728
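
The model.safetensors change touches only a Git LFS pointer: the oid differs while the size is identical, which fits retrained weights with the same shapes and dtypes. Below is a minimal sketch of checking a downloaded file against such a pointer. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, not part of any Git LFS or Hugging Face API, and the local file path in the usage note is an assumption.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict) -> bool:
    """Return True if the file has the size and SHA-256 digest named by the pointer."""
    if pointer["algo"] != "sha256" or path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.sha256()
    with path.open("rb") as f:
        # Read in 1 MiB chunks so a large weights file is never fully in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer contents copied from the new side of the diff above.
NEW_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:31660be27be25fe34fe32421fdab805203a15dd10e60fae06c945e1a4d6ec998
size 649308728
"""

pointer = parse_lfs_pointer(NEW_POINTER)
print(pointer["algo"], pointer["size"])  # sha256 649308728
```

With the weights downloaded locally, `matches_pointer(Path("model.safetensors"), pointer)` would report whether the file corresponds to this commit rather than its parent.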