|
--- |
|
license: apache-2.0 |
|
base_model: |
|
- mistralai/Mistral-Nemo-Base-2407 |
|
language: |
|
- en |
|
- ko |
|
- ja |
|
- zh |
|
datasets: |
|
- 4DR1455/finance_questions |
|
- Aratako/Synthetic-JP-Conversations-Magpie-Nemotron-4-10k |
|
- Aratako/Synthetic-JP-EN-Coding-Dataset-Magpie-69k |
|
- Aratako/Synthetic-Japanese-Roleplay-NSFW-Claude-3.5s-10.5k-formatted |
|
- BCCard/BCCard-Finance-Kor-QnA |
|
- CarrotAI/ko-code-alpaca-QA |
|
- ChuGyouk/AI_healthcare_QA_samples_Sonnet3.5 |
|
- DavidLanz/medical_instruction |
|
- Dusker/lawyer-llama |
|
- Gryphe/Sonnet3.5-Charcard-Roleplay |
|
- HAERAE-HUB/qarv-instruct-ko |
|
- HachiML/alpaca_jp_math |
|
- Magpie-Align/Magpie-Llama-3.1-Pro-MT-300K-v0.1 |
|
- Magpie-Align/Magpie-Qwen2-Pro-200K-Chinese |
|
- beomi/KoAlpaca-v1.1a |
|
- codefuse-ai/Evol-instruction-66k |
|
- frankminors123/belle-math-zh |
|
- gbharti/wealth-alpaca_lora |
|
- iam-ajaymeena/Self-Instruct-Japanese-Elzya-13B |
|
- jihye-moon/LawQA-Ko |
|
- jondurbin/gutenberg-dpo-v0.1 |
|
- junyeong-nero/kin_med_100K_edited |
|
- kyujinpy/KOR-OpenOrca-Platypus-v3 |
|
- lavita/medical-qa-datasets |
|
- microsoft/orca-math-word-problems-200k |
|
- neural-bridge/rag-dataset-12000 |
|
- p1atdev/ichikara-instruction |
|
- qiaojin/PubMedQA |
|
- shibing624/roleplay-zh-sharegpt-gpt4-data |
|
- team-hatakeyama-phase2/AutoMultiTurnByCalm3-22B-Corrected-reformatted |
|
- ymoslem/Law-StackExchange |
|
- zzunyang/LawQA_LawSee |
|
--- |
|
# Mistral-Nemo-NT-Ko-12B-sft |
|
|
|
## Description |
|
|
|
**Mistral-Nemo-NT-Ko-12B-sft** is an instruction-tuned version of [*mistralai/Mistral-Nemo-Base-2407*](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407), fine-tuned across four languages: English, Korean, Chinese, and Japanese. |
|
|
|
The primary goals of this model are **language alignment**, **cross-lingual knowledge transfer**, and **ChatML formatting**. This is an intermediate version, as preference optimization has not yet been applied. |
|
|
|
|
|
## Features |
|
|
|
- The base model supports a context length of 128K, while I fine-tuned this model with an 8K context size. |
|
|
|
- The model responds in the input language unless the user explicitly specifies an output language (if the language is set via the system role, it may be ignored). |
|
|
|
- Answer length tends to vary by language: English responses are generally longer than average, while Korean responses tend to be shorter. The behavior for Japanese and Chinese is still under observation. |
|
|
|
- Recommended temperature settings: 0.3 to 0.7. |
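
A minimal inference sketch tying these settings together, assuming the standard `transformers` text-generation API. The repo id `werty1248/Mistral-Nemo-NT-Ko-12B-sft` and all generation parameters other than the temperature range are illustrative assumptions, not values stated in this card:

```python
# Illustrative sketch only: repo id and settings other than temperature are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "werty1248/Mistral-Nemo-NT-Ko-12B-sft"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# ChatML-style conversation; the system turn is optional (see the chat template below).
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "한국어로 간단히 자기소개를 해 주세요."},  # reply should follow the input language
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Recommended sampling range from this card: temperature 0.3 to 0.7.
output = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.5, top_p=0.9)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```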
|
|
|
|
|
# Evaluation |
|
|
|
## LogicKor |
|
|
|
| Model | Method | Reasoning | Math | Writing | Coding | Understanding | Grammar | Single-turn | Multi-turn | Total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Mistral-Nemo-NT-Ko-12B-sft | cot-1-shot | 7.36 | 6.57 | 8.71 | 8.57 | 9.57 | 6.43 | 7.81 | 7.93 | **7.87** |
| Mistral-Nemo-NT-Ko-12B-sft | 1-shot | 9.00 | 5.71 | 7.93 | 8.29 | 7.93 | 5.21 | 7.29 | 7.40 | 7.35 |
| Mistral Nemo | 1-shot | 5.00 | 6.50 | 6.86 | 8.07 | 7.64 | 8.43 | 7.60 | 6.57 | 7.08 |
| Mistral Nemo | cot-1-shot | 5.43 | 6.86 | 6.07 | 7.57 | 5.86 | 7.57 | 7.50 | 5.62 | 6.56 |
| Mistral-Nemo-NT-Ko-12B-sft | default | 6.00 | 4.93 | 5.43 | 7.14 | 9.71 | 4.00 | 6.45 | 5.95 | 6.20 |
| Mistral Nemo | default | 0.43 | 7.64 | 6.21 | 7.14 | 6.79 | 7.21 | 6.26 | 5.55 | 5.90 |
|
|
|
## MT-Bench |
|
|
|
| Model | First Turn | Second Turn | Average |
| --- | --- | --- | --- |
| Mistral-Nemo-NT-Ko-12B-sft | 8.39 | 7.99 | 8.19 |

\* `judge model: GPT-4`
|
|
|
## Language Confusion (Korean only) |
|
|
|
| Model | Monolingual-LPR | Monolingual-WPR | Crosslingual-LPR | Crosslingual-WPR |
| --- | --- | --- | --- | --- |
| Mistral-Nemo-NT-Ko-12B-sft | 100.00% | 99.00% | 87.51% | 96.96% |
| Mistral-Nemo-Instruct-2407 | 90.72% | 93.18% | 46.75% | 92.84% |
| Meta-Llama-3.1-8B-Instruct | 99.00% | 96.97% | 91.45% | 93.01% |
| gemma-2-9b-it | 100.00% | 98.00% | 87.93% | 95.58% |
|
|
|
|
|
# Chat Template

Example (ChatML format): |
|
|
|
``` |
|
<|im_start|>system |
|
You are a helpful AI assistant.<|im_end|> |
|
<|im_start|>user |
|
{prompt}<|im_end|> |
|
<|im_start|>assistant |
|
``` |
|
|
|
*I trained Mistral-Nemo-NT-Ko-12B with a variety of system prompts from dozens of datasets, so you can chat with or without a system prompt of your own.* |
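
As a small sketch of how this template is applied in practice, assuming the tokenizer's bundled chat template matches the ChatML format shown above (the repo id is again an assumption), the rendered prompt with and without a system turn can be inspected like this:

```python
# Sketch: render the ChatML prompt string with and without a system turn.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("werty1248/Mistral-Nemo-NT-Ko-12B-sft")  # assumed repo id

with_system = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "안녕하세요!"},
]
without_system = [{"role": "user", "content": "안녕하세요!"}]

# add_generation_prompt=True appends the trailing "<|im_start|>assistant" line shown above.
print(tokenizer.apply_chat_template(with_system, tokenize=False, add_generation_prompt=True))
print(tokenizer.apply_chat_template(without_system, tokenize=False, add_generation_prompt=True))
```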
|
|
|
|
|
# Dataset |
|
|
|
[werty1248/multilingual-instruct-balanced](https://huggingface.co/datasets/werty1248/multilingual-instruct-balanced) |
|
|
|
# Training Details |
|
|
|
- GPU: 8xA40 |
|
- epoch: 3 |
|
- total batch size: 8 |
|
- learning rate: 7e-6 |
|
- weight decay: 0.01 |
|
|
|
[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl) |
|
<details><summary>See axolotl config</summary> |
|
|
|
axolotl version: `0.4.1` |
|
```yaml |
|
base_model: mistralai/Mistral-Nemo-Base-2407 |
|
model_type: MistralForCausalLM |
|
tokenizer_config: nothingiisreal/MN-12B-Celeste-V1.9 ## axolotl-ai-co/Mistral-Nemo-Base-2407-chatml caused an error (reason unknown) |
|
tokenizer_type: AutoTokenizer |
|
|
|
load_in_8bit: false |
|
load_in_4bit: false |
|
strict: false |
|
|
|
chat_template: chatml |
|
datasets: |
|
- path: werty1248/multilingual-instruct-balanced |
|
type: sharegpt |
|
chat_template: chatml |
|
|
|
dataset_prepared_path: ./data_preparation |
|
output_dir: /workspace/data |
|
|
|
hf_use_auth_token: true |
|
|
|
sequence_len: 8192 |
|
sample_packing: true |
|
pad_to_sequence_len: true |
|
|
|
wandb_project: |
|
#wandb_entity: |
|
#wandb_watch: |
|
wandb_name: |
|
#wandb_log_model: |
|
|
|
gradient_accumulation_steps: 1 ## total_batch = 8 |
|
micro_batch_size: 1 |
|
num_epochs: 3 |
|
optimizer: paged_adamw_32bit |
|
lr_scheduler: cosine |
|
learning_rate: 0.000007 |
|
|
|
train_on_inputs: false |
|
group_by_length: false |
|
bf16: auto |
|
fp16: |
|
tf32: false |
|
|
|
gradient_checkpointing: true |
|
early_stopping_patience: |
|
resume_from_checkpoint: |
|
local_rank: |
|
logging_steps: 1 |
|
xformers_attention: |
|
flash_attention: true |
|
|
|
warmup_steps: 1000 |
|
evals_per_epoch: 1 |
|
eval_table_size: |
|
save_steps: 1000 |
|
debug: |
|
deepspeed: deepspeed_configs/zero3_bf16.json |
|
weight_decay: 0.01 |
|
special_tokens: |
|
pad_token: <pad> |
|
``` |
|
|
|
</details><br> |
|
|
|
|
|
- Training loss |
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6629154d55d7c289634b8c5d/Xcat10ejYX1nU4cH94vJF.png) |