---
license: apache-2.0
base_model:
- mistralai/Mistral-Nemo-Base-2407
language:
- en
- ko
- ja
- zh
datasets:
- 4DR1455/finance_questions
- Aratako/Synthetic-JP-Conversations-Magpie-Nemotron-4-10k
- Aratako/Synthetic-JP-EN-Coding-Dataset-Magpie-69k
- Aratako/Synthetic-Japanese-Roleplay-NSFW-Claude-3.5s-10.5k-formatted
- BCCard/BCCard-Finance-Kor-QnA
- CarrotAI/ko-code-alpaca-QA
- ChuGyouk/AI_healthcare_QA_samples_Sonnet3.5
- DavidLanz/medical_instruction
- Dusker/lawyer-llama
- Gryphe/Sonnet3.5-Charcard-Roleplay
- HAERAE-HUB/qarv-instruct-ko
- HachiML/alpaca_jp_math
- Magpie-Align/Magpie-Llama-3.1-Pro-MT-300K-v0.1
- Magpie-Align/Magpie-Qwen2-Pro-200K-Chinese
- beomi/KoAlpaca-v1.1a
- codefuse-ai/Evol-instruction-66k
- frankminors123/belle-math-zh
- gbharti/wealth-alpaca_lora
- iam-ajaymeena/Self-Instruct-Japanese-Elzya-13B
- jihye-moon/LawQA-Ko
- jondurbin/gutenberg-dpo-v0.1
- junyeong-nero/kin_med_100K_edited
- kyujinpy/KOR-OpenOrca-Platypus-v3
- lavita/medical-qa-datasets
- microsoft/orca-math-word-problems-200k
- neural-bridge/rag-dataset-12000
- p1atdev/ichikara-instruction
- qiaojin/PubMedQA
- shibing624/roleplay-zh-sharegpt-gpt4-data
- team-hatakeyama-phase2/AutoMultiTurnByCalm3-22B-Corrected-reformatted
- ymoslem/Law-StackExchange
- zzunyang/LawQA_LawSee
---
# Mistral-Nemo-NT-Ko-12B-sft
## Description
**Mistral-Nemo-NT-Ko-12B-sft** is an instruction-tuned version of [*mistralai/Mistral-Nemo-Base-2407*](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407), fine-tuned across four languages: English, Korean, Chinese, and Japanese.
The primary goals of this model are **language alignment**, **cross-lingual knowledge transfer** and **ChatML formatting**. This is an intermediate version since preference optimization has not yet been applied.
## Features
- The base model supports a context length of 128K, while I fine-tuned this model with an 8K context size.
- The model replies in the language of the input unless the user explicitly requests a different output language. (A language set in the system prompt may be ignored.)
- Answer length tends to vary by language: English responses are generally longer than average, while Korean responses tend to be shorter. The behavior for Japanese and Chinese is still under observation.
- Recommended temperature settings: 0.3 to 0.7.
# Evaluation
## LogicKor
| Model | Method | Reasoning | Math | Writing | Coding | Understanding | Grammar | Single-turn | Multi-turn | Overall |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
|Mistral-Nemo-NT-Ko-12B-sft| cot-1-shot |7.36 | 6.57 | 8.71 | 8.57 | 9.57 | 6.43 | 7.81 | 7.93 | **7.87** |
|Mistral-Nemo-NT-Ko-12B-sft| 1-shot | 9.00 | 5.71 | 7.93 | 8.29 | 7.93 | 5.21 | 7.29 | 7.40 | 7.35 |
| Mistral Nemo | 1-shot | 5.00 | 6.50 | 6.86 | 8.07 | 7.64 | 8.43 | 7.60 | 6.57 | 7.08 |
| Mistral Nemo | cot-1-shot | 5.43 | 6.86 | 6.07 | 7.57 | 5.86 | 7.57 | 7.50 | 5.62 | 6.56 |
|Mistral-Nemo-NT-Ko-12B-sft| default | 6.00 | 4.93 | 5.43 | 7.14 | 9.71 | 4.00 | 6.45 | 5.95 | 6.20 |
| Mistral Nemo | default | 0.43 | 7.64 | 6.21 | 7.14 | 6.79 | 7.21 | 6.26 | 5.55 | 5.90 |
## MT-Bench
| Model | First | Second | Average |
| --- | --- | --- | --- |
|Mistral-Nemo-NT-Ko-12B-sft| 8.39 | 7.99 | 8.19 |
\* ```judge-model: GPT-4```
## Language-Confusion (Korean Only)
| Model | Monolingual-LPR | Monolingual-WPR | Crosslingual-LPR | Crosslingual-WPR |
| --- | --- | --- | --- | --- |
|Mistral-Nemo-NT-Ko-12B-sft| 100.00% | 99.00% | 87.51% | 96.96% |
|Mistral-Nemo-Instruct-2407 | 90.72% | 93.18% | 46.75% | 92.84% |
|Meta-Llama-3.1-8B-Instruct | 99.00% | 96.97% | 91.45% | 93.01% |
|gemma-2-9b-it | 100.00% | 98.00% | 87.93% | 95.58% |
Prompt format example (ChatML):
```
<|im_start|>system
You are a helpful AI assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```
*I trained Mistral-Nemo-NT-Ko-12B with a variety of system prompts drawn from dozens of datasets. You can chat with or without your own system prompt.*
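
The template above can be assembled programmatically. The following is a minimal sketch, not the author's code: `build_chatml_prompt` is a hypothetical helper, and the exact whitespace after the final `<|im_start|>assistant` is an assumption based on common ChatML usage. If the tokenizer ships its own chat template, that is the safer source of truth.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render {"role", "content"} dicts in the ChatML layout shown above.

    Hypothetical helper for illustration; not part of any library.
    """
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>ROLE ... <|im_end|>.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "한국의 수도는 어디인가요?"},
])
print(prompt)
```

Because the model was trained both with and without system prompts, the system message can simply be left out of the list.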
# Dataset
[werty1248/multilingual-instruct-balanced](https://huggingface.co/datasets/werty1248/multilingual-instruct-balanced)
# Training Details
- GPU: 8xA40
- epoch: 3
- total batch size: 8
- learning rate: 7e-6
- weight decay: 0.01
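
The total batch size of 8 follows from the values in the axolotl config in this card. The arithmetic below is a small sketch under the assumption that each of the 8 GPUs processes its own micro-batch (plain data parallelism); the tokens-per-step figure is an upper bound that assumes every packed sequence is completely full.

```python
# Values copied from this card and its axolotl config.
num_gpus = 8                     # "GPU: 8xA40"
micro_batch_size = 1             # micro_batch_size
gradient_accumulation_steps = 1  # gradient_accumulation_steps
sequence_len = 8192              # sequence_len

# Sequences consumed per optimizer step across all GPUs
# (assumes one micro-batch per GPU, i.e. data parallelism).
total_batch_size = num_gpus * micro_batch_size * gradient_accumulation_steps

# Upper bound on tokens per optimizer step with sample packing and padding.
max_tokens_per_step = total_batch_size * sequence_len

print(total_batch_size)     # 8
print(max_tokens_per_step)  # 65536
```
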
[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>
axolotl version: `0.4.1`
```yaml
base_model: mistralai/Mistral-Nemo-Base-2407
model_type: MistralForCausalLM
tokenizer_config: nothingiisreal/MN-12B-Celeste-V1.9 ##axolotl-ai-co/Mistral-Nemo-Base-2407-chatml makes error, why?
tokenizer_type: AutoTokenizer
load_in_8bit: false
load_in_4bit: false
strict: false
chat_template: chatml
datasets:
  - path: werty1248/multilingual-instruct-balanced
    type: sharegpt
    chat_template: chatml
dataset_prepared_path: ./data_preparation
output_dir: /workspace/data
hf_use_auth_token: true
sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true
wandb_project:
#wandb_entity:
#wandb_watch:
wandb_name:
#wandb_log_model:
gradient_accumulation_steps: 1 ## total_batch = 8
micro_batch_size: 1
num_epochs: 3
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.000007
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_steps: 1000
evals_per_epoch: 1
eval_table_size:
save_steps: 1000
debug:
deepspeed: deepspeed_configs/zero3_bf16.json
weight_decay: 0.01
special_tokens:
  pad_token: <pad>
```
</details><br>
- Training loss
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6629154d55d7c289634b8c5d/Xcat10ejYX1nU4cH94vJF.png)
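
The `lr_scheduler: cosine`, `warmup_steps: 1000`, and `learning_rate: 0.000007` settings in the axolotl config above suggest a learning-rate curve along the following lines. This is a rough sketch assuming the usual linear-warmup-then-cosine-decay shape; `lr_at` is a hypothetical helper, and `TOTAL_STEPS` is a made-up placeholder because the card does not state the number of optimizer steps.

```python
import math

PEAK_LR = 7e-6        # learning_rate in the config
WARMUP_STEPS = 1000   # warmup_steps in the config
TOTAL_STEPS = 10_000  # hypothetical; the card does not state the step count


def lr_at(step, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS, total_steps=TOTAL_STEPS):
    """Linear warmup to peak_lr, then cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


# Print the learning rate at a few points along the (assumed) schedule.
for step in (0, 500, 1000, 5500, 10_000):
    print(step, lr_at(step))
```

With a 1000-step warmup, the first 1000 optimizer steps run below the peak rate, which is worth keeping in mind when reading the early part of the loss curve.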