nbeerbower committed · Commit 2b0b547 (verified) · 1 Parent(s): 21efb68

Update README.md

Files changed (1): README.md (+72 -1)
README.md CHANGED
datasets:
- flammenai/FlameMix-DPO-v1
---
![image/png](https://huggingface.co/flammenai/Mahou-1.0-mistral-7B/resolve/main/mahou1.png)

# Mahou-1.0-mistral-7B

Mahou is our attempt to build a production-ready conversational/roleplay LLM.

Future versions will be released iteratively and finetuned on flammen.ai conversational data.

NOTE: this model is experimental and currently significantly flawed.

### Method

Finetuned using an A100 on Google Colab.

[Fine-tune a Mistral-7b model with Direct Preference Optimization](https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac) - [Maxime Labonne](https://huggingface.co/mlabonne)
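
The configuration below references `model_name`, `new_model`, `dataset`, and `tokenizer` without defining them. A minimal sketch of how they might be prepared is shown here; the placeholder base-model id, the `train` split, and the assumption that FlameMix-DPO-v1 already exposes `prompt`/`chosen`/`rejected` columns are illustrative, not details taken from this card.

```python
# Hypothetical setup for the names used in the configuration below;
# nothing here is taken from the card itself.
from datasets import load_dataset
from transformers import AutoTokenizer

model_name = "mistralai/Mistral-7B-v0.1"   # placeholder; the actual base model is listed in the card metadata
new_model = "Mahou-1.0-mistral-7B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Mistral ships without a pad token

# DPOTrainer expects prompt / chosen / rejected columns; verify before training.
dataset = load_dataset("flammenai/FlameMix-DPO-v1", split="train")
print(dataset.column_names)
```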

### Configuration

LoRA, model, and training settings:
```python
import torch
from peft import LoraConfig
from transformers import AutoModelForCausalLM, TrainingArguments
from trl import DPOTrainer

# model_name, new_model, dataset, and tokenizer are defined elsewhere in the notebook.

# LoRA configuration
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
)

# Model to fine-tune (4-bit quantized base weights)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)
model.config.use_cache = False

# Reference model for DPO
ref_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)

# Training arguments
training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=2000,
    save_strategy="no",
    logging_steps=1,
    output_dir=new_model,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    bf16=True,
    report_to="wandb",
)

# Create DPO trainer
dpo_trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=2048,
    max_length=8192,
    force_use_ref_model=True
)
```
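
The card ends at the trainer setup. As a rough sketch of the steps that usually follow in this recipe, one might run the trainer, merge the LoRA adapter back into unquantized base weights, and smoke-test generation. Checkpoint paths, the prompt format, and sampling settings below are illustrative assumptions, not documented details of Mahou-1.0.

```python
# Illustrative follow-up, not documented by this card: run DPO, merge the
# LoRA adapter into full-precision weights, and generate a test reply.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

dpo_trainer.train()
dpo_trainer.model.save_pretrained("final_checkpoint")   # adapter weights only
tokenizer.save_pretrained("final_checkpoint")

# Reload the base model without 4-bit quantization and merge the adapter in.
base = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "final_checkpoint").merge_and_unload()
model.save_pretrained(new_model)
tokenizer.save_pretrained(new_model)

# Quick smoke test; prompt format and sampling settings are arbitrary.
prompt = "User: Introduce yourself.\nMahou:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.8)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```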