mnoukhov/pythia160m-sft-tldr


Commit 06f2966 (verified) · 1 parent: f8f49c9

mnoukhov committed on Jul 3, 2024

Model save

Files changed (2)

README.md +10 -7
model.safetensors +1 -1

README.md CHANGED

@@ -17,7 +17,7 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [EleutherAI/pythia-160m-deduped](https://huggingface.co/EleutherAI/pythia-160m-deduped) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 3.3864
+- Loss: 2.7993
 ## Model description
@@ -37,11 +37,14 @@ More information needed
 The following hyperparameters were used during training:
 - learning_rate: 1e-05
-- train_batch_size: 32
+- train_batch_size: 16
 - eval_batch_size: 8
 - seed: 42
-- gradient_accumulation_steps: 4
+- distributed_type: multi-GPU
+- num_devices: 4
+- gradient_accumulation_steps: 2
 - total_train_batch_size: 128
+- total_eval_batch_size: 32
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - num_epochs: 1
@@ -50,10 +53,10 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch  | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| 4.2966        | 0.2007 | 183  | 3.5566          |
-| 3.5154        | 0.4013 | 366  | 3.4336          |
-| 3.4239        | 0.6020 | 549  | 3.4037          |
-| 3.3856        | 0.8026 | 732  | 3.3864          |
+| 3.2275        | 0.2007 | 183  | 2.8647          |
+| 2.8581        | 0.4013 | 366  | 2.8291          |
+| 2.8213        | 0.6020 | 549  | 2.8076          |
+| 2.799         | 0.8026 | 732  | 2.7993          |
 ### Framework versions
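
The hyperparameter change above keeps `total_train_batch_size` at 128 while moving from one device to four. A minimal sketch of that arithmetic follows, assuming the total is the product of per-device batch size, device count, and gradient accumulation steps, and assuming the earlier run used a single device (the old README lists no `num_devices`). The helper name `effective_batch_size` is hypothetical, not part of any library.

```python
def effective_batch_size(per_device: int, num_devices: int = 1, grad_accum_steps: int = 1) -> int:
    """Samples contributing to one optimizer step (assumed: simple product)."""
    return per_device * num_devices * grad_accum_steps


# Before this commit: train_batch_size=32, gradient_accumulation_steps=4,
# and (assumption) a single device.
old_total = effective_batch_size(32, num_devices=1, grad_accum_steps=4)

# After this commit: train_batch_size=16, num_devices=4, gradient_accumulation_steps=2.
new_total = effective_batch_size(16, num_devices=4, grad_accum_steps=2)

# Evaluation does not accumulate gradients: eval_batch_size=8 on 4 devices.
new_eval_total = effective_batch_size(8, num_devices=4)

print(old_total, new_total, new_eval_total)
```

Under these assumptions both configurations give 128 training samples per optimizer step, and the new `total_eval_batch_size` of 32 follows from 8 per device on 4 devices.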

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5ef91cb4782828840161f6bbfe31866748c30e164f99194f0d55ced62cfc2a46
+oid sha256:31660be27be25fe34fe32421fdab805203a15dd10e60fae06c945e1a4d6ec998
 size 649308728
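
The `model.safetensors` entry is a Git LFS pointer: the `oid` is the SHA-256 of the file contents and `size` is its length in bytes. As a rough sketch, a downloaded copy could be checked against the pointer as below. The function name `lfs_pointer` and the local file path are hypothetical; only `hashlib` from the standard library is used.

```python
import hashlib


def lfs_pointer(path: str, chunk_size: int = 1 << 20) -> str:
    """Build Git LFS pointer text for a local file (hypothetical helper)."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        # Read in chunks so large weight files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{digest.hexdigest()}\n"
        f"size {size}\n"
    )


if __name__ == "__main__":
    # Assumed local path; compare the output with the pointer in the diff.
    print(lfs_pointer("model.safetensors"))
```

Here the size is unchanged at 649308728 bytes while the hash differs, which is what one would expect if the tensor shapes and dtype stayed the same and only the weight values changed.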