---
language:
  - en
  - zh
  - ja
  - ko
metrics:
  - accuracy
pipeline_tag: text-generation
tags:
  - code
  - model
  - llm
---

Orion-MOE8x7B

🌐English | 🇨🇳中文

Table of Contents

• Model Introduction
• Model Download
• Model Benchmarks
• Model Inference
• Declarations, License
• Company Introduction


1. Model Introduction

  • Orion-MOE8x7B-Base Large Language Model (LLM) is a pretrained generative Sparse Mixture-of-Experts model trained from scratch by OrionStarAI. The base model is trained on a multilingual corpus, including Chinese, English, Japanese, and Korean, and it exhibits superior performance in these languages.

  • The Orion-MOE8x7B series models exhibit the following features:

    • The model demonstrates excellent performance in comprehensive evaluations compared to other base models of the same parameter scale.
    • It has strong multilingual capabilities, leading by a significant margin on Japanese and Korean test sets, and also performing consistently better on Arabic, German, French, and Spanish test sets.


2. Model Download

Model release and download links are provided in the table below:

| Model Name | HuggingFace Download Links | ModelScope Download Links |
|---|---|---|
| ⚾Orion-MOE8x7B-Base | Orion-MOE8x7B-Base | Orion-MOE8x7B-Base |


3. Model Benchmarks

3.1. Base Model Orion-MOE8x7B-Base Benchmarks

3.1.1. LLM evaluation results on examination and professional knowledge

| Model | ceval | cmmlu | mmlu | mmlu_pro | ARC_c | hellaswag |
|---|---|---|---|---|---|---|
| Mixtral 8x7B | 54.0861 | 53.21 | 70.4000 | 38.5000 | 85.0847 | 81.9458 |
| Qwen1.5-32b | 83.5000 | 82.3000 | 73.4000 | 45.2500 | 90.1695 | 81.9757 |
| Qwen2.5-32b | 87.7414 | 89.0088 | 82.9000 | 58.0100 | 94.2373 | 82.5134 |
| Orion 14B | 72.8000 | 70.5700 | 69.9400 | 33.9500 | 79.6600 | 78.5300 |
| Orion 8x7B | 89.7400 | 89.1555 | 85.9000 | 58.3100 | 91.8644 | 89.19 |

| Model | lambada | bbh | musr | piqa | commonsense_qa | IFEval |
|---|---|---|---|---|---|---|
| Mixtral 8x7B | 76.7902 | 50.87 | 43.21 | 83.41 | 69.62 | 24.15 |
| Qwen1.5-32b | 73.7434 | 57.2800 | 42.6500 | 82.1500 | 74.6900 | 32.9700 |
| Qwen2.5-32b | 75.3736 | 67.6900 | 49.7800 | 80.0500 | 72.9700 | 41.5900 |
| Orion 14B | 78.8300 | 50.3500 | 43.6100 | 79.5400 | 66.9100 | 29.0800 |
| Orion 8x7B | 79.7399 | 55.82 | 49.93 | 87.32 | 73.05 | 30.06 |

| Model | GPQA | human-eval | MBPP | math_lv5 | gsm8k | math |
|---|---|---|---|---|---|---|
| Mixtral 8x7B | 30.9000 | 33.5366 | 60.7000 | 9.0000 | 47.5000 | 28.4000 |
| Qwen1.5-32b | 33.4900 | 35.9756 | 49.4000 | 25.0000 | 77.4000 | 36.1000 |
| Qwen2.5-32b | 49.5000 | 46.9512 | 71.0000 | 31.7200 | 80.3630 | 48.8800 |
| Orion 14B | 28.5300 | 20.1200 | 30.0000 | 2.5400 | 52.0100 | 7.8400 |
| Orion 8x7B | 52.1700 | 44.5122 | 43.4 | 5.07 | 59.8200 | 23.6800 |
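
The scores above come from standard public benchmarks (MMLU, C-Eval, CMMLU, ARC, GSM8K, and so on). As an illustration only, and not necessarily the evaluation pipeline used to produce these numbers, a comparable run with EleutherAI's lm-evaluation-harness might look like the sketch below; the task subset, batch size, and dtype are assumptions, and the exact API can differ between harness versions.

```python
# Hypothetical reproduction sketch with lm-evaluation-harness (`pip install lm-eval`);
# not the official evaluation setup behind the table above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=OrionStarAI/Orion-MOE8x7B-Base,trust_remote_code=True,dtype=bfloat16",
    tasks=["mmlu", "arc_challenge", "hellaswag"],  # example subset (assumption)
    batch_size=8,                                  # assumption
)
for task, metrics in results["results"].items():
    print(task, metrics)
```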

3.1.2. Comparison of LLM performance on Japanese test sets

| Model | jsquad | jcommonsenseqa | jnli | marc_ja | jaqket_v2 | paws_ja | avg |
|---|---|---|---|---|---|---|---|
| Mixtral-8x7B | 0.8900 | 0.7873 | 0.3213 | 0.9544 | 0.7886 | 44.5000 | 8.0403 |
| Qwen1.5-32B | 0.8986 | 0.8454 | 0.5099 | 0.9708 | 0.8214 | 0.4380 | 0.7474 |
| Qwen2.5-32B | 0.8909 | 0.9383 | 0.7214 | 0.9786 | 0.8927 | 0.4215 | 0.8073 |
| Orion-14B-Base | 0.7422 | 0.8820 | 0.7285 | 0.9406 | 0.6620 | 0.4990 | 0.7424 |
| Orion 8x7B | 0.9177 | 0.9043 | 0.9046 | 0.9640 | 0.8119 | 0.4735 | 0.8293 |

3.1.3. Comparison of LLM performance on Korean test sets

| Model | haerae | kobest boolq | kobest copa | kobest hellaswag | kobest sentineg | kobest wic | paws_ko | avg |
|---|---|---|---|---|---|---|---|---|
| Mixtral-8x7B | 53.16 | 78.56 | 66.2 | 56.6 | 77.08 | 49.37 | 44.05 | 60.71714286 |
| Qwen1.5-32B | 46.38 | 76.28 | 60.4 | 53 | 78.34 | 52.14 | 43.4 | 58.56285714 |
| Qwen2.5-32B | 70.67 | 80.27 | 76.7 | 61.2 | 96.47 | 77.22 | 37.05 | 71.36857143 |
| Orion-14B-Base | 69.66 | 80.63 | 77.1 | 58.2 | 92.44 | 51.19 | 44.55 | 67.68142857 |
| Orion 8x7B | 65.17 | 85.4 | 80.4 | 56 | 96.98 | 73.57 | 46.35 | 71.98142857 |

3.1.4. Comparison of LLM performance on Arabic, German, French, and Spanish test sets

| Model | ar hellaswag | ar arc | de hellaswag | de arc | fr hellaswag | fr arc | es hellaswag | es arc |
|---|---|---|---|---|---|---|---|---|
| Mixtral-8x7B | 47.93 | 36.27 | 69.17 | 52.35 | 73.9 | 55.86 | 74.25 | 54.79 |
| Qwen1.5-32B | 50.07 | 39.95 | 63.77 | 50.81 | 68.86 | 55.95 | 70.5 | 55.13 |
| Qwen2.5-32B | 59.76 | 52.87 | 69.82 | 61.76 | 74.15 | 62.7 | 75.04 | 65.3 |
| Orion-14B-Base | 42.26 | 33.88 | 54.65 | 38.92 | 60.21 | 42.34 | 62 | 44.62 |
| Orion 8x7B | 69.39 | 54.32 | 80.6 | 63.47 | 85.56 | 68.78 | 87.41 | 70.09 |

3.1.5. Leakage Detection Benchmark

The proportion of leaked data (from various evaluation benchmarks) found in the pre-training corpus; the higher the proportion, the more leakage it indicates.

| Threshold 0.2 | qwen2.5 32b | qwen1.5 32b | orion 8x7b | orion 14b | mixtral 8x7b |
|---|---|---|---|---|---|
| mmlu | 0.3 | 0.27 | 0.22 | 0.28 | 0.25 |
| ceval | 0.39 | 0.38 | 0.27 | 0.26 | 0.26 |
| cmmlu | 0.38 | 0.39 | 0.23 | 0.27 | 0.22 |
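
The detection method itself is not described in this document. As a rough illustration of the general idea only (an assumption, not OrionStarAI's actual procedure), the sketch below estimates, per benchmark, the fraction of questions whose n-gram overlap with a pre-training corpus sample exceeds a threshold such as the 0.2 used above.

```python
# Hypothetical n-gram overlap check for benchmark leakage (illustrative only;
# not the detection pipeline behind the numbers above).
def ngrams(text: str, n: int = 13):
    toks = text.split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def leakage_ratio(benchmark_questions, corpus_documents, n: int = 13, threshold: float = 0.2):
    """Fraction of benchmark questions whose n-gram overlap with the corpus
    exceeds `threshold` (e.g. 0.2, as in the table above)."""
    corpus_ngrams = set()
    for doc in corpus_documents:
        corpus_ngrams |= ngrams(doc, n)
    leaked = 0
    for q in benchmark_questions:
        q_ngrams = ngrams(q, n)
        if q_ngrams and len(q_ngrams & corpus_ngrams) / len(q_ngrams) > threshold:
            leaked += 1
    return leaked / max(len(benchmark_questions), 1)

# Toy usage with a short n-gram size so the overlap is visible:
print(leakage_ratio(
    ["what is the capital of france paris is the capital"],
    ["paris is the capital of france"],
    n=3,
))  # prints 1.0: the single question exceeds the overlap threshold
```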

3.1.6. Inference speed

Measured on 8x NVIDIA RTX 3090 GPUs, in tokens per second; column headers encode the number of parallel requests ("para") and the output length in tokens ("out").

| OrionLLM_V2.4.6.1 | 1para_out62 | 1para_out85 | 1para_out125 | 1para_out210 |
|---|---|---|---|---|
| OrionMOE | 33.03544296 | 33.43113606 | 33.53014102 | 33.58693529 |
| Qwen32B | 26.46267188 | 26.72846906 | 26.80413838 | 27.03123611 |
| Orion14B | 41.69121312 | 41.77423491 | 41.76050902 | 42.26096669 |

| OrionLLM_V2.4.6.1 | 4para_out62 | 4para_out90 | 4para_out125 | 4para_out220 |
|---|---|---|---|---|
| OrionMOE | 29.45015743 | 30.4472947 | 31.03748516 | 31.45783599 |
| Qwen32B | 23.60912215 | 24.30431956 | 24.86132023 | 25.16827535 |
| Orion14B | 38.08240373 | 38.8572788 | 39.50040645 | 40.44875947 |

| OrionLLM_V2.4.6.1 | 8para_out62 | 8para_out85 | 8para_out125 | 8para_out220 |
|---|---|---|---|---|
| OrionMOE | 25.71006327 | 27.13446743 | 28.89463226 | 29.70440167 |
| Qwen32B | 21.15920951 | 21.92001035 | 23.13867947 | 23.5649106 |
| Orion14B | 34.4151923 | 36.05635893 | 37.0874908 | 37.91705944 |
(Figure: inference speed comparison)
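
For reference, a minimal sketch of how a tokens-per-second figure can be measured with the standard Hugging Face generate() API is shown below. It is a single-request probe under assumed settings (prompt, output length, greedy decoding) and is not the serving stack (OrionLLM_V2.4.6.1) behind the table above.

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Minimal single-request throughput probe (illustrative; not the benchmark harness
# used for the published numbers).
model_name = "OrionStarAI/Orion-MOE8x7B-Base"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True
)

prompt = "Write a short story about a robot barista."  # example prompt (assumption)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

start = time.time()
outputs = model.generate(inputs.input_ids, max_new_tokens=125, do_sample=False)
elapsed = time.time() - start

new_tokens = outputs.shape[1] - inputs.input_ids.shape[1]
print(f"{new_tokens / elapsed:.2f} tokens/s")
```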


4. Model Inference

The model weights, source code, and configuration needed for inference are published on Hugging Face; the download link is available in the table at the beginning of this document. We demonstrate several inference methods here, and the required resources are downloaded automatically from Hugging Face.
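
If you prefer to fetch the weights ahead of time rather than relying on the automatic download at load time, the repository can be mirrored locally with huggingface_hub. This is a minimal sketch; the target directory is an arbitrary example.

```python
# Optional pre-download of the model repository (requires `pip install huggingface_hub`).
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="OrionStarAI/Orion-MOE8x7B-Base",
    local_dir="./Orion-MOE8x7B-Base",  # example target directory (assumption)
)
print(f"Model files available at: {local_dir}")
```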

4.1. Python Code

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig

# Load the tokenizer and model; trust_remote_code is required because the model
# ships custom modeling code on Hugging Face.
tokenizer = AutoTokenizer.from_pretrained("OrionStarAI/Orion-MOE8x7B-Base",
                                          use_fast=False,
                                          trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("OrionStarAI/Orion-MOE8x7B-Base",
                                             device_map="auto",
                                             torch_dtype=torch.bfloat16,
                                             trust_remote_code=True)

# Use the generation settings published with the model, then run a chat turn.
model.generation_config = GenerationConfig.from_pretrained("OrionStarAI/Orion-MOE8x7B-Base")
messages = [{"role": "user", "content": "Hello, what is your name? "}]
response = model.chat(tokenizer, messages, streaming=False)
print(response)
```

In the above Python code, the model is loaded with device_map="auto" to utilize all available GPUs. To restrict inference to specific devices, set CUDA_VISIBLE_DEVICES before launching the process, for example export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 to use GPUs 0 through 7.
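
The same restriction can also be applied from inside Python, optionally combined with a per-GPU memory budget. The sketch below is illustrative only; the GPU subset and the 22GiB budget are assumptions, not recommended settings.

```python
import os

# Equivalent of `export CUDA_VISIBLE_DEVICES=0,1,2,3`, but done from Python;
# it must be set before torch initializes CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"  # example GPU subset (assumption)

import torch
from transformers import AutoModelForCausalLM

# device_map="auto" now shards the experts across only the four visible GPUs;
# max_memory optionally caps the usage per device.
model = AutoModelForCausalLM.from_pretrained(
    "OrionStarAI/Orion-MOE8x7B-Base",
    device_map="auto",
    max_memory={i: "22GiB" for i in range(4)},  # example per-GPU budget (assumption)
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
```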

4.2. Direct Script Inference


```shell
# base model
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python demo/text_generation_base.py \
    --model OrionStarAI/Orion-MOE8x7B-Base \
    --tokenizer OrionStarAI/Orion-MOE8x7B-Base \
    --prompt hello
```


5. Declarations, License

5.1. Declarations

We strongly urge all users not to use the Orion-MOE8x7B model for any activities that may harm national or social security or violate the law. Additionally, we request that users not use the Orion-MOE8x7B model for internet services without proper security review and filing. We hope all users abide by this principle to ensure that technological development takes place in a regulated and legal environment.

We have done our best to ensure the compliance of the data used in the model training process. However, despite our significant efforts, unforeseen issues may still arise due to the complexity of the model and data. Therefore, if any problems arise from the use of the Orion-MOE8x7B open-source model, including but not limited to data security issues, public opinion risks, or any risks and issues arising from the model being misled, abused, disseminated, or improperly utilized, we will not assume any responsibility.

5.2. License

Community use of the Orion-MOE8x7B series models is governed by the Orion-MOE8x7B series models community license agreement.


6. Company Introduction

OrionStar is a leading global service robot solutions company, founded in September 2016. OrionStar is dedicated to using artificial intelligence technology to create the next generation of revolutionary robots, allowing people to break free from repetitive physical labor and making human work and life more intelligent and enjoyable. Through technology, OrionStar aims to make society and the world a better place.

OrionStar possesses fully self-developed end-to-end artificial intelligence technologies, such as voice interaction and visual navigation. It integrates product development capabilities and technological application capabilities. Based on the Orion robotic arm platform, it has launched products such as OrionStar AI Robot Greeting, AI Robot Greeting Mini, Lucki, Coffee Master, and established the open platform OrionOS for Orion robots. Following the philosophy of "Born for Truly Useful Robots", OrionStar empowers more people through AI technology.

The core strengths of OrionStar lie in its end-to-end AI application capabilities, including big data preprocessing, large model pretraining, fine-tuning, prompt engineering, agent development, etc. With comprehensive end-to-end model training capabilities, including systematic data processing workflows and parallel model training across hundreds of GPUs, these capabilities have been successfully applied in various industry scenarios such as government affairs, cloud services, international e-commerce, and fast-moving consumer goods.

Companies with demands for deploying large-scale model applications are welcome to contact us.
Enquiry Hotline: 400-898-7779
E-mail: [email protected]
Discord Link: https://discord.gg/zumjDWgdAs

(WeChat QR code)