Update README.md
---
license: apache-2.0
base_model:
- mistralai/Mistral-Nemo-Base-2407
language:
- en
- ko
- ja
- zh
datasets:
- 4DR1455/finance_questions
- Aratako/Synthetic-JP-Conversations-Magpie-Nemotron-4-10k
- Aratako/Synthetic-JP-EN-Coding-Dataset-Magpie-69k
- Aratako/Synthetic-Japanese-Roleplay-NSFW-Claude-3.5s-10.5k-formatted
- BCCard/BCCard-Finance-Kor-QnA
- CarrotAI/ko-code-alpaca-QA
- ChuGyouk/AI_healthcare_QA_samples_Sonnet3.5
- DavidLanz/medical_instruction
- Dusker/lawyer-llama
- Gryphe/Sonnet3.5-Charcard-Roleplay
- HAERAE-HUB/qarv-instruct-ko
- HachiML/alpaca_jp_math
- Magpie-Align/Magpie-Llama-3.1-Pro-MT-300K-v0.1
- Magpie-Align/Magpie-Qwen2-Pro-200K-Chinese
- beomi/KoAlpaca-v1.1a
- codefuse-ai/Evol-instruction-66k
- frankminors123/belle-math-zh
- gbharti/wealth-alpaca_lora
- iam-ajaymeena/Self-Instruct-Japanese-Elzya-13B
- jihye-moon/LawQA-Ko
- jondurbin/gutenberg-dpo-v0.1
- junyeong-nero/kin_med_100K_edited
- kyujinpy/KOR-OpenOrca-Platypus-v3
- lavita/medical-qa-datasets
- microsoft/orca-math-word-problems-200k
- neural-bridge/rag-dataset-12000
- p1atdev/ichikara-instruction
- qiaojin/PubMedQA
- shibing624/roleplay-zh-sharegpt-gpt4-data
- team-hatakeyama-phase2/AutoMultiTurnByCalm3-22B-Corrected-reformatted
- ymoslem/Law-StackExchange
- zzunyang/LawQA_LawSee
---

# Mistral-Nemo-NT-Ko-12B-sft

## Description

**Mistral-Nemo-NT-Ko-12B-sft** is an instruction-tuned version of [*mistralai/Mistral-Nemo-Base-2407*](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407), fine-tuned across four languages: English, Korean, Chinese, and Japanese.

The primary goals of this model are **language alignment** and **ChatML formatting**. This is an intermediate version: preference optimization has not yet been applied.

## Features

- The base model supports a context length of 128K; I fine-tuned this model with an 8K context size.
- The model follows the input language unless the user explicitly specifies an output language (if the output language is set by the system role, it may be ignored).
- Answer length tends to vary by language: English responses are generally longer than average, while Korean responses tend to be shorter. The behavior for Japanese and Chinese is still under observation.
- Recommended temperature settings: 0.3 to 0.7 (see the sampling sketch below).
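
A minimal sketch of these sampling settings using the `transformers` `GenerationConfig` API; only the temperature range comes from this card, while `top_p` and `max_new_tokens` are illustrative assumptions:

```python
from transformers import GenerationConfig

# Sampling inside the recommended 0.3-0.7 temperature range.
# top_p and max_new_tokens are illustrative assumptions, not tuned values.
gen_config = GenerationConfig(
    do_sample=True,
    temperature=0.5,
    top_p=0.9,
    max_new_tokens=512,
)
```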

# Evaluation

## LogicKor

| Model | Method | Reasoning | Math | Writing | Coding | Comprehension | Grammar | Single-turn | Multi-turn | Total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Mistral-Nemo-NT-Ko-12B-sft | cot-1-shot | 6.57 | 7.36 | 8.57 | 8.71 | 9.57 | 6.43 | 7.81 | 7.93 | **7.87** |
| Mistral-Nemo-NT-Ko-12B-sft | 1-shot | 8.29 | 5.71 | 7.93 | 9.00 | 7.93 | 5.21 | 7.29 | 7.40 | 7.35 |
| Mistral Nemo | 1-shot | 5.00 | 6.50 | 6.86 | 8.07 | 7.64 | 8.43 | 7.60 | 6.57 | 7.08 |
| Mistral-Nemo-NT-Ko-12B-sft | default | 4.93 | 6.00 | 7.14 | 5.43 | 9.71 | 4.00 | 6.45 | 5.95 | 6.20 |
| Mistral Nemo | cot-1-shot | 5.43 | 6.86 | 6.07 | 7.57 | 5.86 | 7.57 | 7.50 | 5.62 | 6.56 |
| Mistral Nemo | default | 0.43 | 7.64 | 6.21 | 7.14 | 6.79 | 7.21 | 6.26 | 5.55 | 5.90 |

## MT-Bench

| Model | First | Second | Average |
| --- | --- | --- | --- |
| Mistral-Nemo-NT-Ko-12B-sft | 8.39 | 7.99 | 8.19 |

\* `judge model: GPT-4`

## Language Confusion (Korean only)

| Model | Monolingual-LPR | Monolingual-WPR | Crosslingual-LPR | Crosslingual-WPR |
| --- | --- | --- | --- | --- |
| Mistral-Nemo-NT-Ko-12B-sft | 100.00% | 99.00% | 87.51% | 96.96% |
| Mistral-Nemo-Instruct-2407 | 90.72% | 93.18% | 46.75% | 92.84% |
| Meta-Llama-3.1-8B-Instruct | 99.00% | 96.97% | 91.45% | 93.01% |
| gemma-2-9b-it | 100.00% | 98.00% | 87.93% | 95.58% |

# Prompt Format

ChatML example:

```
<|im_start|>system
You are a helpful AI assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```

*I trained Mistral-Nemo-NineTail with various system prompts from dozens of datasets. You can chat with or without your own system prompt.*
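
As a rough usage sketch, the tokenizer's chat template can build this ChatML prompt via `transformers`; the repository id below is an assumption inferred from this card's name, and the generation settings simply reuse the recommended temperature range:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "werty1248/Mistral-Nemo-NT-Ko-12B-sft"  # assumed repository id, adjust as needed

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "한국어로 간단히 자기소개를 해줘."},  # Korean prompt; the model should reply in Korean
]
# apply_chat_template renders the ChatML turns shown above and, with
# add_generation_prompt=True, appends the trailing "<|im_start|>assistant".
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.5,  # within the recommended 0.3-0.7 range
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```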

# Dataset

[werty1248/multilingual-instruct-balanced](https://huggingface.co/datasets/werty1248/multilingual-instruct-balanced)
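
The training mix is public; a minimal sketch of loading it with the `datasets` library (assuming the default train split):

```python
from datasets import load_dataset

# Loads the public training mix used for this model.
ds = load_dataset("werty1248/multilingual-instruct-balanced", split="train")
print(ds[0])
```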

# Training Details

- GPU: 8xA40
- epochs: 3
- total batch size: 8
- learning rate: 7e-6
- weight decay: 0.01
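
For reference, the total batch size of 8 follows from the configuration below: 8 GPUs × `micro_batch_size: 1` × `gradient_accumulation_steps: 1` = 8 packed sequences (up to 8,192 tokens each, since sample packing is enabled) per optimizer step.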

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
base_model: mistralai/Mistral-Nemo-Base-2407
model_type: MistralForCausalLM
tokenizer_config: nothingiisreal/MN-12B-Celeste-V1.9 ## axolotl-ai-co/Mistral-Nemo-Base-2407-chatml raises an error, why?
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

chat_template: chatml
datasets:
  - path: werty1248/multilingual-instruct-balanced
    type: sharegpt
    chat_template: chatml

dataset_prepared_path: ./data_preparation
output_dir: /workspace/data

hf_use_auth_token: true

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

wandb_project: mistral-nine-tail
#wandb_entity:
#wandb_watch:
wandb_name: 8xA40-dsz3bf16_padw
#wandb_log_model:

gradient_accumulation_steps: 1 ## total_batch = 8
micro_batch_size: 1
num_epochs: 3
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.000007

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 1000
evals_per_epoch: 1
eval_table_size:
save_steps: 1000
debug:
deepspeed: deepspeed_configs/zero3_bf16.json
weight_decay: 0.01
special_tokens:
  pad_token: <pad>
```

</details><br>

- Training loss

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6629154d55d7c289634b8c5d/Xcat10ejYX1nU4cH94vJF.png)