See axolotl config

axolotl version: 0.4.1

adapter: lora
auto_find_batch_size: true
base_model: echarlaix/tiny-random-mistral
bf16: auto
chat_template: llama3
dataloader_num_workers: 12
dataset_prepared_path: null
datasets:
- data_files:
  - 82efb243c4acc5d5_train_data.json
  ds_type: json
  format: custom
  path: /workspace/input_data/82efb243c4acc5d5_train_data.json
  type:
    field_instruction: document_extracted
    field_output: answer
    format: '{instruction}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
debug: null
deepspeed: null
early_stopping_patience: 3
early_stopping_threshold: 0.001
eval_max_new_tokens: 128
eval_steps: 40
flash_attention: false
fp16: null
fsdp: null
fsdp_config: null
gradient_accumulation_steps: 2
gradient_checkpointing: false
group_by_length: false
hub_model_id: mrferr3t/171e70a9-ac4c-43a1-b27d-2d7fb42dc386
hub_repo: null
hub_strategy: checkpoint
hub_token: null
learning_rate: 0.0003
load_in_4bit: false
load_in_8bit: false
local_rank: null
logging_steps: 100
lora_alpha: 16
lora_dropout: 0.05
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 8
lora_target_linear: true
lr_scheduler: cosine
micro_batch_size: 32
mlflow_experiment_name: /tmp/82efb243c4acc5d5_train_data.json
model_type: AutoModelForCausalLM
num_epochs: 50
optimizer: adamw_bnb_8bit
output_dir: miner_id_24
pad_to_sequence_len: true
s2_attention: null
sample_packing: false
save_steps: 40
saves_per_epoch: 0
sequence_len: 512
special_tokens:
  pad_token: </s>
strict: false
tf32: false
tokenizer_type: AutoTokenizer
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.02
wandb_entity: null
wandb_mode: online
wandb_name: f291d08a-7bab-40e2-a056-41d22f449784
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: f291d08a-7bab-40e2-a056-41d22f449784
warmup_ratio: 0.05
weight_decay: 0.0
xformers_attention: null
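
The `datasets` entry in the config above maps each JSON record to a prompt and a target. As a rough illustration (a simplified sketch, not axolotl's actual implementation), the snippet below applies the `format: '{instruction}'` template to the `document_extracted` field and takes `answer` as the output. The helper `build_example` and the sample record are hypothetical and exist only for this example.

```python
# Values copied from the axolotl config above.
FIELD_INSTRUCTION = "document_extracted"
FIELD_OUTPUT = "answer"
PROMPT_FORMAT = "{instruction}"


def build_example(record: dict) -> dict:
    """Hypothetical helper: turn one JSON record into prompt/completion text."""
    prompt = PROMPT_FORMAT.format(instruction=record[FIELD_INSTRUCTION])
    completion = record[FIELD_OUTPUT]
    return {"prompt": prompt, "completion": completion}


# Made-up record, for illustration only (not taken from the training data).
sample = {
    "document_extracted": "Paris is the capital of France.",
    "answer": "Paris",
}
example = build_example(sample)
print(example)
```

Because the config sets `train_on_inputs: false`, the loss would be expected to cover only the completion tokens, with the prompt tokens masked out.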

171e70a9-ac4c-43a1-b27d-2d7fb42dc386

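This model is a LoRA fine-tune of echarlaix/tiny-random-mistral. The auto-generated card does not name the training dataset; the axolotl config above points to a custom JSON file (82efb243c4acc5d5_train_data.json). It achieves the following results on the evaluation set: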

Loss: 10.2150
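
To put the loss above in context, it can help to compare it with the cross-entropy of a model that predicts every token uniformly at random, which is ln(V) for a vocabulary of V tokens. The sketch below assumes V = 32,000, which is typical for Mistral tokenizers but is not stated anywhere in this card, so treat the comparison as indicative only.

```python
import math

VOCAB_SIZE = 32_000  # assumption: vocabulary size is not given in this card
EVAL_LOSS = 10.2150  # evaluation loss reported above

# Cross-entropy (in nats) of a uniform guess over the vocabulary.
uniform_loss = math.log(VOCAB_SIZE)

# Perplexity corresponding to the reported evaluation loss.
perplexity = math.exp(EVAL_LOSS)

print(f"uniform baseline loss: {uniform_loss:.4f}")
print(f"perplexity at eval loss: {perplexity:.0f}")
```

Under that assumption the reported loss sits only slightly below the uniform baseline, which is what one would expect from a tiny, randomly initialized base model meant for testing rather than real use.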

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 32
eval_batch_size: 32
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 64
optimizer: adamw_bnb_8bit with betas=(0.9,0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 1367
num_epochs: 50
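
The batch-size figures in this list are related by simple arithmetic, sketched below. The single-device assumption (`NUM_DEVICES = 1`) is not stated in the card; it is inferred from 32 x 2 = 64. The steps-per-epoch estimate uses the last logged row of the training results (step 2400 at epoch 4.3876), and the implied dataset size is approximate.

```python
MICRO_BATCH_SIZE = 32   # train_batch_size
GRAD_ACCUM_STEPS = 2    # gradient_accumulation_steps
NUM_DEVICES = 1         # assumption: device count is not stated in the card
WARMUP_STEPS = 1367     # lr_scheduler_warmup_steps

# Examples consumed per optimizer step.
effective_batch = MICRO_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES

# Estimate optimizer steps per epoch from the last logged row.
last_step, last_epoch = 2400, 4.3876
steps_per_epoch = last_step / last_epoch

# Rough size of the training split implied by the numbers above.
approx_train_examples = steps_per_epoch * effective_batch

# Share of the steps actually run that fell inside the warmup phase.
warmup_share = WARMUP_STEPS / last_step

print(effective_batch)
print(round(steps_per_epoch))
print(round(approx_train_examples))
print(f"{warmup_share:.0%}")
```

Since logging ends at step 2400, well short of the configured 50 epochs, more than half of the steps that were actually run fell inside the learning-rate warmup.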

Training results

Training Loss	Epoch	Step	Validation Loss
No log	0.0018	1	10.3778
No log	0.0731	40	10.3776
No log	0.1463	80	10.3770
20.7539	0.2194	120	10.3759
20.7539	0.2925	160	10.3737
20.7481	0.3656	200	10.3695
20.7481	0.4388	240	10.3590
20.7481	0.5119	280	10.3314
20.7006	0.5850	320	10.3121
20.7006	0.6581	360	10.3067
20.6188	0.7313	400	10.3020
20.6188	0.8044	440	10.2942
20.6188	0.8775	480	10.2896
20.5903	0.9506	520	10.2872
20.5903	1.0238	560	10.2845
20.573	1.0969	600	10.2791
20.573	1.1700	640	10.2740
20.573	1.2431	680	10.2694
20.5561	1.3163	720	10.2649
20.5561	1.3894	760	10.2608
20.5316	1.4625	800	10.2570
20.5316	1.5356	840	10.2524
20.5316	1.6088	880	10.2474
20.5134	1.6819	920	10.2440
20.5134	1.7550	960	10.2411
20.4978	1.8282	1000	10.2388
20.4978	1.9013	1040	10.2363
20.4978	1.9744	1080	10.2341
20.4883	2.0475	1120	10.2326
20.4883	2.1207	1160	10.2310
20.4799	2.1938	1200	10.2297
20.4799	2.2669	1240	10.2282
20.4799	2.3400	1280	10.2270
20.4725	2.4132	1320	10.2260
20.4725	2.4863	1360	10.2251
20.4671	2.5594	1400	10.2242
20.4671	2.6325	1440	10.2234
20.4671	2.7057	1480	10.2223
20.4601	2.7788	1520	10.2210
20.4601	2.8519	1560	10.2200
20.4566	2.9250	1600	10.2196
20.4566	2.9982	1640	10.2191
20.4566	3.0713	1680	10.2184
20.4548	3.1444	1720	10.2179
20.4548	3.2176	1760	10.2177
20.4496	3.2907	1800	10.2174
20.4496	3.3638	1840	10.2171
20.4496	3.4369	1880	10.2168
20.4522	3.5101	1920	10.2165
20.4522	3.5832	1960	10.2164
20.445	3.6563	2000	10.2161
20.445	3.7294	2040	10.2160
20.445	3.8026	2080	10.2159
20.4546	3.8757	2120	10.2157
20.4546	3.9488	2160	10.2154
20.4492	4.0219	2200	10.2153
20.4492	4.0951	2240	10.2153
20.4492	4.1682	2280	10.2150
20.4499	4.2413	2320	10.2150
20.4499	4.3144	2360	10.2150
20.4462	4.3876	2400	10.2150
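
The table ends at step 2400, far short of the configured 50 epochs, and the validation loss is flat at 10.2150 over the last evaluations. That pattern is consistent with the early-stopping patience of 3 in the config, although the card does not say why training ended. The sketch below is a simplified patience rule, not the trainer's actual callback: `first_stop_step` is a hypothetical helper, it works on the rounded values from the table, and it uses a minimum improvement of 0 rather than trying to reproduce how `early_stopping_threshold: 0.001` was applied in the real run.

```python
def first_stop_step(evals, patience, min_delta=0.0):
    """Return the step at which a simple patience rule would stop, or None.

    evals: iterable of (step, validation_loss) pairs in training order.
    """
    best = float("inf")
    bad_evals = 0
    for step, loss in evals:
        if best - loss > min_delta:
            # Improvement: remember the new best and reset the counter.
            best = loss
            bad_evals = 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return step
    return None


# Last eight evaluations, copied from the table above (rounded values).
tail = [
    (2120, 10.2157), (2160, 10.2154), (2200, 10.2153), (2240, 10.2153),
    (2280, 10.2150), (2320, 10.2150), (2360, 10.2150), (2400, 10.2150),
]

stop_step = first_stop_step(tail, patience=3)
print(stop_step)
```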

Framework versions

PEFT 0.13.2
Transformers 4.46.0
Pytorch 2.3.1+cu121
Datasets 3.0.1
Tokenizers 0.20.1
