Abhishek Mishra committed
Commit d4a88e4 · unverified · 1 Parent(s): 2d60ba3

Adding qlora config for Mistral (#675)


* Adding qlora config for Mistral

Contains a fix for the Mistral Flash Attention (FA) issue: ValueError: You are attempting to perform batched generation with padding_side='right' this may lead to unexpected behaviour for Flash Attention version of Mistral. Make sure to call tokenizer.padding_side = 'left' before tokenizing the input.

The fix for now is to set sample_packing: true and pad_to_sequence_len: true in the config; the tokenizer-side workaround that the error message itself suggests is sketched after these notes.

* Renamed to qlora.yml
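
For reference, the ValueError quoted above points at an alternative, tokenizer-side workaround for batched generation. A minimal sketch of that alternative, assuming a stock transformers tokenizer (this snippet is not part of the commit):

```python
from transformers import AutoTokenizer

# Left-pad batched generation inputs, as the ValueError suggests.
# Assumption: using the base Mistral tokenizer; Mistral-7B-v0.1 ships without
# a pad token, so the EOS token is reused for padding here.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"  # avoids the right-padding error with the FA build of Mistral

batch = tokenizer(
    ["Tell me a joke.", "Summarize the plot of Hamlet in one sentence."],
    padding=True,
    return_tensors="pt",
)
```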

Files changed (1)
  1. examples/mistral/qlora.yml +79 -0
examples/mistral/qlora.yml ADDED
@@ -0,0 +1,79 @@
+ base_model: mistralai/Mistral-7B-v0.1
+ base_model_config: mistralai/Mistral-7B-v0.1
+ model_type: MistralForCausalLM
+ tokenizer_type: LlamaTokenizer
+ is_mistral_derived_model: true
+
+ load_in_8bit: false
+ load_in_4bit: true
+ strict: false
+
+ datasets:
+   - path: mhenrichsen/alpaca_2k_test
+     type: alpaca
+ dataset_prepared_path: last_run_prepared
+ val_set_size: 0.01
+ output_dir: ./qlora-out
+
+ adapter: qlora
+ lora_model_dir:
+
+ sequence_len: 8192
+ sample_packing: True
+ pad_to_sequence_len: True
+
+ lora_r: 32
+ lora_alpha: 16
+ lora_dropout: 0.05
+ lora_target_linear: true
+ lora_fan_in_fan_out:
+ lora_target_modules:
+   - gate_proj
+   - down_proj
+   - up_proj
+   - q_proj
+   - v_proj
+   - k_proj
+   - o_proj
+
+ wandb_project:
+ wandb_entity:
+ wandb_watch:
+ wandb_run_id:
+ wandb_log_model:
+
+ gradient_accumulation_steps: 4
+ micro_batch_size: 4
+ num_epochs: 1
+ optimizer: adamw_bnb_8bit
+ lr_scheduler: cosine
+ learning_rate: 0.0002
+
+ train_on_inputs: false
+ group_by_length: false
+ bf16: true
+ fp16: false
+ tf32: false
+
+ gradient_checkpointing: true
+ early_stopping_patience:
+ resume_from_checkpoint:
+ local_rank:
+ logging_steps: 1
+ xformers_attention:
+ flash_attention: true
+
+ warmup_steps: 10
+ eval_steps: 20
+ eval_table_size: 5
+ eval_table_max_new_tokens: 128
+ save_steps:
+ debug:
+ deepspeed:
+ weight_decay: 0.0
+ fsdp:
+ fsdp_config:
+ special_tokens:
+   bos_token: "<s>"
+   eos_token: "</s>"
+   unk_token: "<unk>"
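
Outside of axolotl, the quantization and adapter settings in this config correspond roughly to standard transformers and peft objects. A rough sketch of that mapping, assuming plain transformers/peft usage (axolotl wires this up internally from the YAML, so this is illustrative only):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# load_in_4bit: true, bf16: true  ->  4-bit base weights with bf16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb_config,
)

# lora_r / lora_alpha / lora_dropout / lora_target_modules from the config above
lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["gate_proj", "down_proj", "up_proj", "q_proj", "v_proj", "k_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

Actual training with this file would normally be launched through axolotl's own entry point with the YAML as the argument, rather than the manual setup above.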