jeiku committed on
Commit fc4d844 · verified · 1 Parent(s): 8736d30

Model save

Files changed (1): README.md (+159, -0)

README.md ADDED

---
library_name: transformers
license: other
base_model: IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml
tags:
- axolotl
- generated_from_trainer
model-index:
- name: completion4B
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
base_model: IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

hub_model_id: jeiku/completion4B
hub_strategy: "all_checkpoints"
push_dataset_to_hub:
hf_use_auth_token: true

datasets:
  - path: Mielikki/Erebus-87k
    type: completion
    field: body

shuffle_merged_datasets: true
val_set_size: 0.0025
output_dir: ./outputs/out

adapter:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:

sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true

wandb_project: EXP4B
wandb_entity:
wandb_watch:
wandb_name: EXP4B
wandb_log_model:

gradient_accumulation_steps: 12
micro_batch_size: 3
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.00001
weight_decay: 0.05

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.1
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1

debug:
deepspeed: deepspeed_configs/zero3_bf16.json
fsdp:
fsdp_config:

special_tokens:
  pad_token: <|finetune_right_pad_id|>

```

</details><br>
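
The key choices in the config above are full-parameter training (the `adapter`/LoRA fields are left empty), an 8192-token packed sequence length, and axolotl's `completion` dataset type, which trains on the raw text of the named field without a prompt template. The sketch below shows roughly what that data path amounts to in plain `datasets`/`transformers` terms; it is illustrative only — the split name, the truncation call, and the omission of sample packing are assumptions, not part of the config:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Tokenizer of the base model named in the config; the pad token mirrors
# the `special_tokens` block (pad_token: <|finetune_right_pad_id|>).
tokenizer = AutoTokenizer.from_pretrained(
    "IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml"
)
tokenizer.pad_token = "<|finetune_right_pad_id|>"

# `type: completion` with `field: body` means the raw text of that column is
# tokenized as-is (no chat template). The "train" split name is assumed;
# axolotl additionally packs examples into 8192-token sequences, omitted here.
dataset = load_dataset("Mielikki/Erebus-87k", split="train")
tokenized = dataset.map(
    lambda example: tokenizer(example["body"], truncation=True, max_length=8192),
    remove_columns=dataset.column_names,
)
print(tokenized)
```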

# completion4B

This model is a fine-tuned version of [IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml](https://huggingface.co/IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml) on the Mielikki/Erebus-87k dataset (used as raw completion text from its `body` field; see the axolotl config above).
It achieves the following results on the evaluation set:
- Loss: 2.9360

## Model description

More information needed

## Intended uses & limitations

More information needed
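
In the absence of a fuller write-up, the model is best thought of as a raw text-completion model rather than a chat assistant, since the fine-tuning data was untemplated prose. The snippet below is a minimal, unofficial sketch of loading it with `transformers`; it assumes the weights pushed to `jeiku/completion4B` (the `hub_model_id` in the config) are accessible to you and that a bfloat16-capable GPU is available:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "jeiku/completion4B"  # hub_model_id from the training config
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.bfloat16,  # training ran in bf16
    device_map="auto",           # requires `accelerate`
)

# Plain continuation: feed a text prefix and let the model finish it.
prompt = "The lighthouse keeper had not spoken to another soul in three months, and"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```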

## Training and evaluation data

Training used the Mielikki/Erebus-87k dataset as untemplated completion text (the `body` field). Per the axolotl config, the merged data was shuffled and 0.25% of it (`val_set_size: 0.0025`) was held out as the evaluation set.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 3
- eval_batch_size: 3
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 12
- total_train_batch_size: 72
- total_eval_batch_size: 6
- optimizer: 8-bit AdamW (bitsandbytes `adamw_bnb_8bit`) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 34
- num_epochs: 1
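
The aggregate batch sizes above are derived quantities rather than independent settings; the quick check below (plain arithmetic, nothing model-specific) shows how they follow from the per-device values:

```python
micro_batch_size = 3             # per-device train/eval batch size
gradient_accumulation_steps = 12
num_devices = 2

# 3 * 12 * 2 = 72, the total_train_batch_size reported above
print(micro_batch_size * gradient_accumulation_steps * num_devices)

# Evaluation does not accumulate gradients, so 3 * 2 = 6
print(micro_batch_size * num_devices)
```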

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 2.5227        | 0.0029 | 1    | 2.9798          |
| 2.5027        | 0.2520 | 88   | 2.9501          |
| 2.481         | 0.5039 | 176  | 2.9398          |
| 2.4313        | 0.7559 | 264  | 2.9360          |
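
As a convenience (not a number reported by the trainer), the final validation loss can be read as a perplexity of roughly exp(2.9360) ≈ 18.8 on the held-out Erebus text:

```python
import math

final_val_loss = 2.9360  # last row of the table above
print(math.exp(final_val_loss))  # ~18.8
```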

### Framework versions

- Transformers 4.46.0.dev0
- PyTorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.20.0