---
library_name: transformers
license: apache-2.0
base_model:
- nbeerbower/Mahou-1.2a-mistral-7B
datasets:
- flammenai/MahouMix-v1
- flammenai/FlameMix-DPO-v1
---
![image/png](https://huggingface.co/flammenai/Mahou-1.0-mistral-7B/resolve/main/mahou1.png)
# Mahou-1.2b-mistral-7B
Mahou is designed to provide short messages in a conversational context. It is capable of casual conversation and character roleplay.
### Chat Format
This model has been trained to use the ChatML format.
```
<|im_start|>system
{{system}}<|im_end|>
<|im_start|>{{char}}
{{message}}<|im_end|>
<|im_start|>{{user}}
{{message}}<|im_end|>
```
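Outside of frontends that apply this template automatically, a prompt can be assembled by hand. The snippet below is only a minimal sketch of the format above; `build_prompt`, `system_prompt`, `char_name`, `user_name`, and `history` are placeholder names, not part of this repository:
```python
# Minimal sketch of assembling a ChatML prompt in the format above.
# build_prompt and its arguments are hypothetical placeholders.
def build_prompt(system_prompt: str, char_name: str, user_name: str,
                 history: list[tuple[str, str]]) -> str:
    parts = [f"<|im_start|>system\n{system_prompt}<|im_end|>"]
    for speaker, message in history:  # speaker is char_name or user_name
        parts.append(f"<|im_start|>{speaker}\n{message}<|im_end|>")
    # Leave an open turn for the character so the model writes the next reply
    parts.append(f"<|im_start|>{char_name}\n")
    return "\n".join(parts)

prompt = build_prompt(
    "You are Mahou, a cheerful wizard.",
    "Mahou",
    "Anon",
    [("Anon", "hey, how was magician academy today?")],
)
```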
### Roleplay Format
- Speech without quotes.
- Actions in `*asterisks*`
```
*leans against wall coolly* so like, i just cast a super strong spell at magician academy today, not gonna lie, felt badass.
```
### SillyTavern Settings
1. Use ChatML for the Context Template.
2. Enable Instruct Mode.
3. Use the [Mahou preset](https://huggingface.co/datasets/flammenai/Mahou-ST-ChatML-Instruct/raw/main/Mahou.json).
4. *Recommended* Additional stopping strings: `["\n", "<|", "</"]` (see the generation sketch below for applying these outside SillyTavern).
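Outside of SillyTavern, the same stopping strings can be applied through the `transformers` generation API. The snippet below is a hedged sketch, not an official usage example: it assumes this repository's Hub id and a `transformers` release recent enough to accept the `stop_strings` argument to `generate()` (which requires passing the tokenizer alongside it):
```python
# Hedged sketch: generating a reply with the recommended stopping strings.
# Assumes a transformers version that supports stop_strings in generate().
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "flammenai/Mahou-1.2b-mistral-7B"  # assumed Hub id for this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# ChatML prompt with placeholder character/user names
prompt = (
    "<|im_start|>system\nYou are Mahou, a cheerful wizard.<|im_end|>\n"
    "<|im_start|>Anon\nhey, how was magician academy today?<|im_end|>\n"
    "<|im_start|>Mahou\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.8,
    stop_strings=["\n", "<|", "</"],  # recommended stopping strings from above
    tokenizer=tokenizer,              # required by generate() when stop_strings is set
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```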
### Method
DPO finetuned using an A100 on Google Colab.
[Fine-tune a Mistral-7b model with Direct Preference Optimization](https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac) - [Maxime Labonne](https://huggingface.co/mlabonne)
### Configuration
LoRA, model, and training settings:
```python
# Note: model_name, new_model, dataset, and tokenizer are defined earlier in the notebook.
import torch
from peft import LoraConfig
from transformers import AutoModelForCausalLM, TrainingArguments
from trl import DPOTrainer

# LoRA configuration
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
)

# Model to fine-tune (4-bit weights, bfloat16 compute)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)
model.config.use_cache = False

# Reference model for the DPO loss
ref_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)

# Training arguments
training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=200,
    save_strategy="no",
    logging_steps=1,
    output_dir=new_model,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    bf16=True,
    report_to="wandb",
)

# Create DPO trainer
dpo_trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    force_use_ref_model=True
)

# Fine-tune model with DPO
dpo_trainer.train()
```
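The card stops at training; in the linked guide, the trained LoRA adapter is then saved and merged back into the base model before upload. The snippet below is only a hedged sketch of that standard `peft` merge step, with hypothetical paths, and is not part of the configuration above:
```python
# Hedged sketch (not from the card): saving the DPO-trained adapter and merging it
# into the base model. "final_checkpoint" is a hypothetical path.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

dpo_trainer.model.save_pretrained("final_checkpoint")    # save the LoRA adapter

base = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
merged = PeftModel.from_pretrained(base, "final_checkpoint")
merged = merged.merge_and_unload()                        # fold adapter weights into the base

merged.save_pretrained(new_model)
tokenizer.save_pretrained(new_model)
```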