---
license: apache-2.0
datasets:
- kunishou/oasst1-89k-ja
- kunishou/databricks-dolly-15k-ja
language:
- ja
---

# How to use

We write our prompts in the ChatML format.

### With vLLM (recommended for much faster inference)
Install vLLM ([reference](https://vllm.readthedocs.io/en/latest/getting_started/installation.html)):

```bash
pip install vllm
```
```python
from vllm import LLM, SamplingParams

model_name = "lightblue/jod"
llm = LLM(model=model_name)

SYSTEM_MESSAGE = "You are a helpful assistant."

def process_chat_history(next_user_msg, text_chat_history=None):
    """Build a ChatML prompt from earlier (user, assistant) turns plus the next user message."""
    if text_chat_history is None:
        text_chat_history = []
    prompt_text = "<|im_start|>system\n"
    prompt_text += SYSTEM_MESSAGE
    prompt_text += "<|im_end|>\n\n"
    for user_msg, ai_msg in text_chat_history:
        prompt_text += "<|im_start|>user\n"
        prompt_text += user_msg
        prompt_text += "<|im_end|>\n\n"
        prompt_text += "<|im_start|>assistant\n"
        prompt_text += ai_msg
        prompt_text += "<|im_end|>\n\n"
    prompt_text += "<|im_start|>user\n"
    prompt_text += next_user_msg
    prompt_text += "<|im_end|>\n\n"
    # Leave the assistant turn open so the model writes the reply
    prompt_text += "<|im_start|>assistant\n"
    return prompt_text

user_prompt = "日本の一番高い山は?"  # "What is the tallest mountain in Japan?"
prompt = process_chat_history(user_prompt)

# temperature=0 gives greedy decoding
sampling_params = SamplingParams(temperature=0, max_tokens=528)
outputs = llm.generate(prompt, sampling_params)
bot_message = outputs[0].outputs[0].text.strip()
print(bot_message)
```

### With Huggingface

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    pipeline,
)

model_name = "lightblue/jod"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    # 4-bit loading requires the bitsandbytes package
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

SYSTEM_MESSAGE = "You are a helpful assistant."

def process_chat_history(next_user_msg, text_chat_history=None):
    """Build a ChatML prompt from earlier (user, assistant) turns plus the next user message."""
    if text_chat_history is None:
        text_chat_history = []
    prompt_text = "<|im_start|>system\n"
    prompt_text += SYSTEM_MESSAGE
    prompt_text += "<|im_end|>\n\n"
    for user_msg, ai_msg in text_chat_history:
        prompt_text += "<|im_start|>user\n"
        prompt_text += user_msg
        prompt_text += "<|im_end|>\n\n"
        prompt_text += "<|im_start|>assistant\n"
        prompt_text += ai_msg
        prompt_text += "<|im_end|>\n\n"
    prompt_text += "<|im_start|>user\n"
    prompt_text += next_user_msg
    prompt_text += "<|im_end|>\n\n"
    # Leave the assistant turn open so the model writes the reply
    prompt_text += "<|im_start|>assistant\n"
    return prompt_text

user_prompt = "日本の一番高い山は?"  # "What is the tallest mountain in Japan?"
prompt = process_chat_history(user_prompt)

# Greedy decoding; return only the newly generated text, not the prompt
bot_message = pipe(
    prompt,
    max_new_tokens=128,
    do_sample=False,
    return_full_text=False,
)[0]["generated_text"].strip()
print(bot_message)
```

# Training details

We trained on the following 3 datasets, using [Open-Orca/Mistral-7B-SlimOrca](https://huggingface.co/Open-Orca/Mistral-7B-SlimOrca) as our base checkpoint:

* (J) - [JASTER](https://github.com/llm-jp/llm-jp-eval)
* (O) - [kunishou/oasst1-89k-ja](https://huggingface.co/datasets/kunishou/oasst1-89k-ja/)
* (D) - [kunishou/databricks-dolly-15k-ja](https://huggingface.co/datasets/kunishou/databricks-dolly-15k-ja/)

This model was trained on prompts in the ChatML format, so ChatML prompts should also be used at inference time. We chose this format because the base model ([Open-Orca/Mistral-7B-SlimOrca](https://huggingface.co/Open-Orca/Mistral-7B-SlimOrca)) was trained with it, and because we find a chat-style format more practical than the Alpaca-style instruction format.

We trained for 1 epoch using the following Axolotl config. (Early stopping was not performed during our training.)
Axolotl config (.yaml):

```yaml
base_model: Open-Orca/Mistral-7B-SlimOrca
base_model_config: Open-Orca/Mistral-7B-SlimOrca
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
is_mistral_derived_model: true

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: ./data/jaster_plus.jsonl
    ds_type: json # see other options below
    type: sharegpt
    conversation: chatml
dataset_prepared_path: false
val_set_size: 0.002
output_dir: ./train_output/openorca-mistral-jaster-1epoch

use_wandb: true
wandb_project:
wandb_entity:
debug:

adapter: qlora
lora_model_dir:

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj

gradient_accumulation_steps: 1
micro_batch_size: 10
eval_batch_size: 4
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience: 10
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
eval_steps: 10
eval_table_size: 5
eval_table_max_new_tokens: 128
save_steps: 10
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
```
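The `type: sharegpt` and `conversation: chatml` dataset settings indicate that the JSONL training file holds ShareGPT-style conversations that are rendered into ChatML. The contents of `./data/jaster_plus.jsonl` are not shown here, so the following is only a minimal sketch, assuming the common ShareGPT schema (a `conversations` list of turns with `from`/`value` keys and `human`/`gpt` speakers). The helper `sharegpt_to_chatml` and the example record are hypothetical, and Axolotl's own prompt builder may differ in details such as whitespace and the default system message (the inference examples above, for instance, put a blank line after each `<|im_end|>`).

```python
import json

def sharegpt_to_chatml(record, system_message="You are a helpful assistant."):
    """Render one ShareGPT-style record as a ChatML string (hypothetical illustration)."""
    # Assumed mapping from ShareGPT speaker tags to ChatML roles
    role_map = {"system": "system", "human": "user", "gpt": "assistant"}
    turns = record["conversations"]
    parts = []
    # Prepend a default system turn only if the record does not start with one
    if not turns or turns[0]["from"] != "system":
        parts.append(f"<|im_start|>system\n{system_message}<|im_end|>\n")
    for turn in turns:
        role = role_map[turn["from"]]
        parts.append(f"<|im_start|>{role}\n{turn['value']}<|im_end|>\n")
    return "".join(parts)

# Hypothetical JSONL line: "What is the tallest mountain in Japan?" / "It is Mount Fuji."
line = '{"conversations": [{"from": "human", "value": "日本の一番高い山は?"}, {"from": "gpt", "value": "富士山です。"}]}'
print(sharegpt_to_chatml(json.loads(line)))
```

With `train_on_inputs: false`, only the assistant turns in such a rendered conversation contribute to the training loss.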
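A few quantities implied by the config can be worked out directly. The sketch below assumes a single GPU, since the number of devices used for training is not stated; under data parallelism the effective batch size scales with the GPU count. It also assumes the standard LoRA scaling of `lora_alpha / lora_r` applied by PEFT (without rank-stabilized LoRA). The helper names are hypothetical.

```python
def effective_batch_size(micro_batch_size, gradient_accumulation_steps, num_gpus):
    # Rows consumed per optimizer step; with sample_packing each row is a packed
    # sequence of up to sequence_len tokens
    return micro_batch_size * gradient_accumulation_steps * num_gpus

def lora_scaling(lora_alpha, lora_r):
    # Standard LoRA multiplies the low-rank update B @ A by alpha / r
    return lora_alpha / lora_r

NUM_GPUS = 1  # assumption: the GPU count is not given in this card
bs = effective_batch_size(micro_batch_size=10, gradient_accumulation_steps=1, num_gpus=NUM_GPUS)
print("effective batch size:", bs)  # → 10
# pad_to_sequence_len pads every row to 4096, so this is an upper bound on real tokens
print("max tokens per step:", bs * 4096)  # → 40960
print("LoRA scaling alpha/r:", lora_scaling(16, 32))  # → 0.5
```

A scaling factor of 0.5 means the adapter's update is applied at half strength relative to an `alpha == r` setup, which interacts with the chosen `learning_rate: 0.0002`.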