---
license: llama2
language:
- ja
tags:
- moe
---

# youri-2x7b_v0.2

This model is a Mixture of Experts (MoE) merge of the following two models:

- [rinna/youri-7b-instruction](https://huggingface.co/rinna/youri-7b-instruction)
- [rinna/youri-7b-chat](https://huggingface.co/rinna/youri-7b-chat)

## 🏆 Evaluation

All benchmark scores were evaluated with the [Stability-AI/lm-evaluation-harness](https://github.com/Stability-AI/lm-evaluation-harness/tree/jp-stable). The raw results are stored in [benchmark_scores](https://huggingface.co/HachiML/youri-2x7b_v0.2/tree/main/benchmark_scores); refer to that link for the detailed scores and the conditions under which they were obtained.

| Model | JCommonsenseQA (3-shot, acc.) | JNLI (3-shot, balanced acc.) | MARC-ja (0-shot, balanced acc.) | JSQuAD (2-shot, F1) | 4-AVERAGE |
|----------------------------------------------------------------|------:|------:|---------:|-------:|------:|
| [**youri-2x7b_v0.2**](https://huggingface.co/HachiML/youri-2x7b_v0.2) | **90.97** | **71.18** | **95.95** | **91.31** | **87.35** |
| [**youri-2x7b_dev**](https://huggingface.co/HachiML/youri-2x7b_dev) | **91.15** | **71.03** | **95.90** | **91.30** | **87.34** |
| [youri-7b-instruction](https://huggingface.co/rinna/youri-7b-instruction) *1 | 88.83 | 63.56 | 93.78 | 92.19 | 84.59 |
| [youri-7b-chat](https://huggingface.co/rinna/youri-7b-chat) *1 | 91.78 | 70.35 | 96.69 | 79.62 | 84.61 |

| Model | jaqket-v2 (1-shot, F1) | xlsum (1-shot, ROUGE-2) *2 | 6-AVERAGE |
|----------------------------------------------------------------|------:|------:|------:|
| [**youri-2x7b_v0.2**](https://huggingface.co/HachiML/youri-2x7b_v0.2) | **84.86** | **25.44** | **76.61** |
| [**youri-2x7b_dev**](https://huggingface.co/HachiML/youri-2x7b_dev) | **84.59** | **25.62** | **76.59** |
| [youri-7b-instruction](https://huggingface.co/rinna/youri-7b-instruction) *1 | 83.92 | 24.67 | 75.13 |
| [youri-7b-chat](https://huggingface.co/rinna/youri-7b-chat) *1 | 83.71 | 24.21 | 75.33 |

| Model | xwinograd (0-shot, acc.) *2 | mgsm (5-shot, acc.) *2 | JCoLA (2-shot, balanced acc.) *2 | 9-AVERAGE |
|----------------------------------------------------------------|------:|------:|---------:|------:|
| [**youri-2x7b_v0.2**](https://huggingface.co/HachiML/youri-2x7b_v0.2) | **81.43** | **23.20** | **58.41** | **69.19** |
| [**youri-2x7b_dev**](https://huggingface.co/HachiML/youri-2x7b_dev) | **81.43** | **24.80** | **59.09** | **69.43** |
| [youri-7b-instruction](https://huggingface.co/rinna/youri-7b-instruction) *1 | 78.94 | 17.20 | 54.04 | 66.35 |
| [youri-7b-chat](https://huggingface.co/rinna/youri-7b-chat) *1 | 80.92 | 25.20 | 53.78 | 67.36 |

*1 Scores taken from [rinna's LM Benchmark](https://rinnakk.github.io/research/benchmarks/lm/index.html).

*2 Because rinna's LM Benchmark does not state the prompt-template versions for these tasks, the scores were computed without specifying a template.
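For readers scanning the tables, the `4-AVERAGE`, `6-AVERAGE`, and `9-AVERAGE` columns are unweighted means over the first 4, 6, and 9 benchmarks in the order listed. A minimal sketch (not part of the evaluation pipeline; it only re-derives the averages from the published per-task scores of youri-2x7b_v0.2):

```python
# Published per-task scores for youri-2x7b_v0.2, in table order.
scores = [
    90.97,  # JCommonsenseQA
    71.18,  # JNLI
    95.95,  # MARC-ja
    91.31,  # JSQuAD
    84.86,  # jaqket-v2
    25.44,  # xlsum
    81.43,  # xwinograd
    23.20,  # mgsm
    58.41,  # JCoLA
]
for n in (4, 6, 9):
    # N-AVERAGE = unweighted mean of the first N task scores
    print(f"{n}-AVERAGE: {sum(scores[:n]) / n:.2f}")
# Prints 87.35, 76.62, 69.19 -- matching the tables to within
# rounding of the published two-decimal scores.
```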
## 🧩 Configuration

The model was built with a custom version of the [mergekit](https://github.com/cg123/mergekit) library (mixtral branch) and the following configuration:

```yaml
base_model: rinna/youri-7b-chat
gate_mode: hidden # one of "hidden", "cheap_embed", or "random"
dtype: bfloat16 # output dtype (float32, float16, or bfloat16)
experts:
  - source_model: rinna/youri-7b-chat
    positive_prompts:
      # JCommonsenseQA: "Take a question and answer choices as input, and select the answer from the choices."
      - "質問と回答の選択肢を入力として受け取り、選択肢から回答を選択してください。"
      # JNLI: "Answer the relation between the premise and the hypothesis as entailment, contradiction, or neutral."
      - "前提と仮説の関係を含意、矛盾、中立の中から回答してください。"
      # MARC-ja: "Classify the following text into either the positive or the negative sentiment class."
      - "以下のテキストを、ポジティブまたはネガティブの感情クラスのいずれかに分類してください。"
      # MGSM: "For the given problem, derive the answer step by step."
      - "与えられた問題に対して、ステップごとに答えを導き出してください。"
  - source_model: rinna/youri-7b-instruction
    positive_prompts:
      # JSQuAD: "Extract the answer to the question from the title and passage in a single phrase. Answer with a noun."
      - "質問に対する回答を題名と文章から一言で抽出してください。回答は名詞で答えてください。"
      # XLSum: "Summarize the given news article."
      - "与えられたニュース記事を要約してください。"
      # JCoLA: "Answer whether the given sentence is grammatical."
      - "与えられた文が文法的であるかを回答してください。"
```

The `positive_prompts` in the above configuration are taken verbatim from the instructions of the benchmarks on which each source model excels; the task names and English glosses in the comments were added for readability and are inferred from the matching benchmark instructions. For the per-model benchmark results, see [rinna's LM Benchmark](https://rinnakk.github.io/research/benchmarks/lm/index.html). Those benchmarks give a detailed overview of the areas where each individual model performs particularly well, which guides the effective use of the merged model across natural language processing tasks.

## 💻 Usage

```python
!pip install -q --upgrade transformers einops accelerate bitsandbytes

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "HachiML/youri-2x7b_v0.2"
torch.set_default_device("cuda")

# Load the model (4-bit quantized) and the tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    load_in_4bit=True,
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True
)

# Build the input in the rinna/youri instruction format.
# instruction: "Translate the following Japanese into English."
instruction = "次の日本語を英語に翻訳してください。"
# input: a Japanese passage describing large language models
input = "大規模言語モデル（だいきぼげんごモデル、英: large language model、LLM）は、多数のパラメータ（数千万から数十億）を持つ人工ニューラルネットワークで構成されるコンピュータ言語モデルで、膨大なラベルなしテキストを使用して自己教師あり学習または半教師あり学習によって訓練が行われる。"
# The fixed preamble reads: "Below is a combination of an instruction describing
# a task and input providing context. Write a response that appropriately
# satisfies the request."
prompt = f"""
以下は、タスクを説明する指示と、文脈のある入力の組み合わせです。要求を適切に満たす応答を書きなさい。

### 指示:
{instruction}

### 入力:
{input}

### 応答:
"""

# Tokenize the prompt
token_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")

# Generate text with the model
with torch.no_grad():
    output_ids = model.generate(
        token_ids.to(model.device),
        max_new_tokens=200,
        do_sample=True,
        temperature=0.5,
        pad_token_id=tokenizer.pad_token_id,
        bos_token_id=tokenizer.bos_token_id,
        eos_token_id=tokenizer.eos_token_id
    )

# Decode and print the output (note: includes the prompt)
output = tokenizer.decode(output_ids.tolist()[0])
print(output)
```
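Since the decoded `output` contains the prompt as well as the completion, a small follow-up snippet (an addition to this card, continuing from the block above and assuming the `### 応答:` marker from the prompt template) can isolate just the model's response:

```python
# Keep only the text after the final "### 応答:" (response) marker and
# strip the trailing end-of-sequence token, if present.
response = output.split("### 応答:")[-1].replace("</s>", "").strip()
print(response)
```

This relies only on the prompt format shown above; if you change the template, adjust the marker accordingly.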