---
library_name: transformers
license: apache-2.0
base_model:
- nbeerbower/Mahou-1.2a-mistral-7B
datasets:
- flammenai/MahouMix-v1
- flammenai/FlameMix-DPO-v1
---
![image/png](https://huggingface.co/flammenai/Mahou-1.0-mistral-7B/resolve/main/mahou1.png)
# Mahou-1.2b-mistral-7B
Mahou is designed to provide short messages in a conversational context. It is capable of casual conversation and character roleplay.
### Chat Format
This model has been trained to use the ChatML format.
```
<|im_start|>system
{{system}}<|im_end|>
<|im_start|>{{char}}
{{message}}<|im_end|>
<|im_start|>{{user}}
{{message}}<|im_end|>
```
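Outside of frontends that apply this template automatically, a prompt can be assembled by hand. The snippet below is only a minimal sketch of the format above; `build_prompt`, `system_prompt`, `char_name`, `user_name`, and `history` are placeholder names, not part of this repository:
```python
# Minimal sketch of assembling a ChatML prompt in the format above.
# build_prompt and its arguments are hypothetical placeholders.
def build_prompt(system_prompt: str, char_name: str, user_name: str,
                 history: list[tuple[str, str]]) -> str:
    parts = [f"<|im_start|>system\n{system_prompt}<|im_end|>"]
    for speaker, message in history:  # speaker is char_name or user_name
        parts.append(f"<|im_start|>{speaker}\n{message}<|im_end|>")
    # Leave an open turn for the character so the model writes the next reply
    parts.append(f"<|im_start|>{char_name}\n")
    return "\n".join(parts)

prompt = build_prompt(
    "You are Mahou, a cheerful wizard.",
    "Mahou",
    "Anon",
    [("Anon", "hey, how was magician academy today?")],
)
```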
### Roleplay Format
- Speech without quotes.
- Actions in `*asterisks*`
```
*leans against wall coolly* so like, i just cast a super strong spell at magician academy today, not gonna lie, felt badass.
```
### SillyTavern Settings
1. Use ChatML for the Context Template.
2. Enable Instruct Mode.
3. Use the [Mahou preset](https://huggingface.co/datasets/flammenai/Mahou-ST-ChatML-Instruct/raw/main/Mahou.json).
4. *Recommended* Additional stopping strings: `["\n", "<|", "</"]` (see the generation sketch below for applying these outside SillyTavern).
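Outside of SillyTavern, the same stopping strings can be applied through the `transformers` generation API. The snippet below is a hedged sketch, not an official usage example: it assumes this repository's Hub id and a `transformers` release recent enough to accept the `stop_strings` argument to `generate()` (which requires passing the tokenizer alongside it):
```python
# Hedged sketch: generating a reply with the recommended stopping strings.
# Assumes a transformers version that supports stop_strings in generate().
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "flammenai/Mahou-1.2b-mistral-7B"  # assumed Hub id for this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# ChatML prompt with placeholder character/user names
prompt = (
    "<|im_start|>system\nYou are Mahou, a cheerful wizard.<|im_end|>\n"
    "<|im_start|>Anon\nhey, how was magician academy today?<|im_end|>\n"
    "<|im_start|>Mahou\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.8,
    stop_strings=["\n", "<|", "</"],  # recommended stopping strings from above
    tokenizer=tokenizer,              # required by generate() when stop_strings is set
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```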
### Method
DPO finetuned using an A100 on Google Colab.
[Fine-tune a Mistral-7b model with Direct Preference Optimization](https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac) - [Maxime Labonne](https://huggingface.co/mlabonne)
### Configuration
LoRA, model, and training settings:
```python
# Note: model_name, new_model, dataset, and tokenizer are defined earlier in the notebook.
import torch
from peft import LoraConfig
from transformers import AutoModelForCausalLM, TrainingArguments
from trl import DPOTrainer

# LoRA configuration
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
)

# Model to fine-tune (4-bit weights, bfloat16 compute)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)
model.config.use_cache = False

# Reference model for the DPO loss
ref_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)

# Training arguments
training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=200,
    save_strategy="no",
    logging_steps=1,
    output_dir=new_model,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    bf16=True,
    report_to="wandb",
)

# Create DPO trainer
dpo_trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    force_use_ref_model=True
)

# Fine-tune model with DPO
dpo_trainer.train()
```
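The card stops at training; in the linked guide, the trained LoRA adapter is then saved and merged back into the base model before upload. The snippet below is only a hedged sketch of that standard `peft` merge step, with hypothetical paths, and is not part of the configuration above:
```python
# Hedged sketch (not from the card): saving the DPO-trained adapter and merging it
# into the base model. "final_checkpoint" is a hypothetical path.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

dpo_trainer.model.save_pretrained("final_checkpoint")    # save the LoRA adapter

base = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
merged = PeftModel.from_pretrained(base, "final_checkpoint")
merged = merged.merge_and_unload()                        # fold adapter weights into the base

merged.save_pretrained(new_model)
tokenizer.save_pretrained(new_model)
```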