Model save

Files changed (8)

README.md +68 -0
adapter_config.json +2 -2
all_results.json +9 -0
runs/Dec17_23-46-54_ellis-compute-02.cs.cornell.edu/events.out.tfevents.1734497240.ellis-compute-02.cs.cornell.edu.267747.0 +3 -0
tokenizer.json +1 -6
train_results.json +9 -0
trainer_state.json +0 -0
training_args.bin +1 -1

README.md ADDED

	@@ -0,0 +1,68 @@

+---
+base_model: barc0/cot-transduction-arc-heavy
+library_name: peft
+license: llama3.1
+tags:
+- trl
+- sft
+- generated_from_trainer
+model-index:
+- name: cot-trainset-ft-transduction-v3-lora-train
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# cot-trainset-ft-transduction-v3-lora-train
+This model is a fine-tuned version of [barc0/cot-transduction-arc-heavy](https://huggingface.co/barc0/cot-transduction-arc-heavy) on the None dataset.
+It achieves the following results on the evaluation set:
+- Loss: 0.1333
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 0.0002
+- train_batch_size: 2
+- eval_batch_size: 2
+- seed: 42
+- distributed_type: multi-GPU
+- num_devices: 4
+- gradient_accumulation_steps: 2
+- total_train_batch_size: 16
+- total_eval_batch_size: 8
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_ratio: 0.1
+- num_epochs: 2
+### Training results
+| Training Loss | Epoch  | Step | Validation Loss |
+|:-------------:|:------:|:----:|:---------------:|
+| 0.1018        | 0.9982 | 277  | 0.1238          |
+| 0.0822        | 1.9964 | 554  | 0.1333          |
+### Framework versions
+- PEFT 0.12.0
+- Transformers 4.45.0.dev0
+- Pytorch 2.4.0+cu121
+- Datasets 2.21.0
+- Tokenizers 0.19.1
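
The totals in the hyperparameter list follow from the per-device values, and the `cosine` scheduler with `lr_scheduler_warmup_ratio: 0.1` implies a learning-rate curve. The following is a minimal sketch of both, not the run's actual code. The function names are hypothetical, the 554 total steps are taken from the last row of the results table, and the linear-warmup-then-cosine formula with `ceil` rounding of the warmup length is an assumption about how the schedule was built.

```python
import math

# Values copied from the hyperparameter list in the model card.
PER_DEVICE_TRAIN_BATCH = 2
PER_DEVICE_EVAL_BATCH = 2
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 2
PEAK_LR = 2e-4
WARMUP_RATIO = 0.1
TOTAL_STEPS = 554  # assumption: the final step shown in the results table


def total_train_batch_size(per_device: int, devices: int, grad_accum: int) -> int:
    """Samples contributing to one optimizer update."""
    return per_device * devices * grad_accum


def total_eval_batch_size(per_device: int, devices: int) -> int:
    """Samples evaluated per forward pass across all devices."""
    return per_device * devices


def cosine_lr(step: int, total_steps: int, warmup_ratio: float, peak_lr: float) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero (assumed form)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(total_train_batch_size(PER_DEVICE_TRAIN_BATCH, NUM_DEVICES, GRAD_ACCUM_STEPS))
    print(total_eval_batch_size(PER_DEVICE_EVAL_BATCH, NUM_DEVICES))
    for s in (0, 28, 56, 277, 554):
        print(s, cosine_lr(s, TOTAL_STEPS, WARMUP_RATIO, PEAK_LR))
```

With the listed values, the two batch-size helpers reproduce the `total_train_batch_size` and `total_eval_batch_size` entries in the card.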

adapter_config.json CHANGED

@@ -21,10 +21,10 @@
   "revision": null,
   "target_modules": [
     "v_proj",
-    "q_proj",
+    "gate_proj",
     "o_proj",
     "k_proj",
-    "gate_proj",
+    "q_proj",
     "up_proj",
     "down_proj"
   ],

all_results.json ADDED

	@@ -0,0 +1,9 @@

+{
+    "epoch": 1.9982608695652173,
+    "total_flos": 466200922914816.0,
+    "train_loss": 0.0,
+    "train_runtime": 1.1199,
+    "train_samples": 4596,
+    "train_samples_per_second": 8208.056,
+    "train_steps_per_second": 512.557
+}
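
The throughput figures in `all_results.json` can be cross-checked against the reported runtime. The sketch below multiplies each rate by `train_runtime` to recover the sample and step counts those rates imply. The function name is hypothetical, and the reading of the result is an inference, not something the commit states.

```python
# Metrics copied from all_results.json in this commit.
RESULTS = {
    "epoch": 1.9982608695652173,
    "total_flos": 466200922914816.0,
    "train_loss": 0.0,
    "train_runtime": 1.1199,
    "train_samples": 4596,
    "train_samples_per_second": 8208.056,
    "train_steps_per_second": 512.557,
}


def implied_counts(results: dict) -> tuple[float, float]:
    """Return (samples, steps) implied by rate * runtime."""
    runtime = results["train_runtime"]
    samples = results["train_samples_per_second"] * runtime
    steps = results["train_steps_per_second"] * runtime
    return samples, steps


if __name__ == "__main__":
    samples, steps = implied_counts(RESULTS)
    print(f"implied samples: {samples:.1f}")
    print(f"implied steps:   {steps:.1f}")
    print(f"samples / train_samples: {samples / RESULTS['train_samples']:.3f}")
```

The implied sample count comes out at about twice `train_samples`, matching the two configured epochs, yet the whole run is reported as taking roughly one second with a `train_loss` of 0.0. One possible explanation is that these metrics were written by a run that resumed from an already finished checkpoint; treat the file with caution either way and rely on the per-epoch losses in the README table.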

runs/Dec17_23-46-54_ellis-compute-02.cs.cornell.edu/events.out.tfevents.1734497240.ellis-compute-02.cs.cornell.edu.267747.0 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e4628b45b86b7776b74f64cf2f43fbe45b709892b9657e4bdf212e7f6b724fce
+size 6316

tokenizer.json CHANGED

@@ -1,11 +1,6 @@
 {
   "version": "1.0",
-  "truncation": {
-    "direction": "Right",
-    "max_length": 8192,
-    "strategy": "LongestFirst",
-    "stride": 0
-  },
+  "truncation": null,
   "padding": null,
   "added_tokens": [
     {
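
The `tokenizer.json` hunk replaces a stored truncation block (right-side truncation at 8192 tokens) with `null`, so the serialized tokenizer no longer carries a truncation setting of its own. A minimal sketch of reading that field with the standard `json` module; the helper name is hypothetical, and the two headers are trimmed copies of the old and new file beginnings shown in the hunk:

```python
import json

# Trimmed copies of the top of tokenizer.json before and after this commit.
OLD_HEADER = json.loads(
    """
    {
      "version": "1.0",
      "truncation": {
        "direction": "Right",
        "max_length": 8192,
        "strategy": "LongestFirst",
        "stride": 0
      },
      "padding": null
    }
    """
)
NEW_HEADER = json.loads('{"version": "1.0", "truncation": null, "padding": null}')


def stored_max_length(tokenizer_json: dict):
    """Return the serialized truncation max_length, or None if truncation is unset."""
    truncation = tokenizer_json.get("truncation")
    if truncation is None:
        return None
    return truncation.get("max_length")


if __name__ == "__main__":
    print("before:", stored_max_length(OLD_HEADER))
    print("after: ", stored_max_length(NEW_HEADER))
```

Whether inputs are truncated at run time still depends on how the tokenizer is invoked by the calling code, which this commit does not show.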

train_results.json ADDED

	@@ -0,0 +1,9 @@

+{
+    "epoch": 1.9982608695652173,
+    "total_flos": 466200922914816.0,
+    "train_loss": 0.0,
+    "train_runtime": 1.1199,
+    "train_samples": 4596,
+    "train_samples_per_second": 8208.056,
+    "train_steps_per_second": 512.557
+}

trainer_state.json ADDED

The diff for this file is too large to render. See raw diff

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3a84114748ad62c0d4f9c9e06c3346b561079cd50af3f29b507fdb3264b25354
+oid sha256:db80d5bb89bdf57c8b24fbdf846f8359d7628d4388dfbf74995de60f55fcdc35
 size 7096
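
`training_args.bin` is stored through Git LFS, so the diff shows only the pointer file: a spec version, a SHA-256 object id, and a byte size. The sketch below parses the old and new pointers and compares them. The function name is illustrative, and the pointer texts are copied from the hunk.

```python
OLD_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:3a84114748ad62c0d4f9c9e06c3346b561079cd50af3f29b507fdb3264b25354
size 7096
"""

NEW_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:db80d5bb89bdf57c8b24fbdf846f8359d7628d4388dfbf74995de60f55fcdc35
size 7096
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if "size" in fields:
        fields["size"] = int(fields["size"])
    return fields


if __name__ == "__main__":
    old, new = parse_lfs_pointer(OLD_POINTER), parse_lfs_pointer(NEW_POINTER)
    print("same size:", old["size"] == new["size"])
    print("same oid: ", old["oid"] == new["oid"])
```

The size is unchanged at 7096 bytes while the object id differs, so the serialized training arguments changed in content but not in length. The pointer alone does not reveal which arguments changed.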