---
language:
  - en
  - zh
  - ja
  - ko
metrics:
  - accuracy
pipeline_tag: text-generation
tags:
  - code
  - model
  - llm
---

Orion-MOE8x7B

🌐English | 🇨🇳中文

Table of Contents

• Model Introduction
• Model Download
• Model Benchmarks
• Model Inference
• Declarations, License
• Company Introduction


1. Model Introduction

  • Orion-MOE8x7B-Base Large Language Model (LLM) is a pretrained generative Sparse Mixture-of-Experts model trained from scratch by OrionStarAI. The base model is trained on a multilingual corpus, including Chinese, English, Japanese, and Korean, and it exhibits superior performance in these languages.

  • The Orion-MOE8x7B series models exhibit the following features:

    • The model demonstrates excellent performance in comprehensive evaluations compared to other base models of the same parameter scale.
    • It has strong multilingual capabilities, leading by a significant margin on Japanese and Korean test sets, and also performing consistently better on Arabic, German, French, and Spanish test sets.


2. Model Download

Model release and download links are provided in the table below:

| Model Name | HuggingFace Download Links | ModelScope Download Links |
|---|---|---|
| ⚾Orion-MOE8x7B-Base | Orion-MOE8x7B-Base | Orion-MOE8x7B-Base |


3. Model Benchmarks

3.1. Base Model Orion-MOE8x7B-Base Benchmarks

3.1.1. LLM evaluation results on examination and professional knowledge

| Model | ceval | cmmlu | mmlu | mmlu_pro | ARC_c | hellaswag |
|---|---|---|---|---|---|---|
| Mixtral 8x7B | 54.0861 | 53.21 | 70.4000 | 38.5000 | 85.0847 | 81.9458 |
| Qwen1.5-32b | 83.5000 | 82.3000 | 73.4000 | 45.2500 | 90.1695 | 81.9757 |
| Qwen2.5-32b | 87.7414 | 89.0088 | 82.9000 | 58.0100 | 94.2373 | 82.5134 |
| Orion 14B | 72.8000 | 70.5700 | 69.9400 | 33.9500 | 79.6600 | 78.5300 |
| Orion 8x7B | 89.7400 | 89.1555 | 85.9000 | 58.3100 | 91.8644 | 89.19 |

| Model | lambada | bbh | musr | piqa | commonsense_qa | IFEval |
|---|---|---|---|---|---|---|
| Mixtral 8x7B | 76.7902 | 50.87 | 43.21 | 83.41 | 69.62 | 24.15 |
| Qwen1.5-32b | 73.7434 | 57.2800 | 42.6500 | 82.1500 | 74.6900 | 32.9700 |
| Qwen2.5-32b | 75.3736 | 67.6900 | 49.7800 | 80.0500 | 72.9700 | 41.5900 |
| Orion 14B | 78.8300 | 50.3500 | 43.6100 | 79.5400 | 66.9100 | 29.0800 |
| Orion 8x7B | 79.7399 | 55.82 | 49.93 | 87.32 | 73.05 | 30.06 |

| Model | GPQA | human-eval | MBPP | math_lv5 | gsm8k | math |
|---|---|---|---|---|---|---|
| Mixtral 8x7B | 30.9000 | 33.5366 | 60.7000 | 9.0000 | 47.5000 | 28.4000 |
| Qwen1.5-32b | 33.4900 | 35.9756 | 49.4000 | 25.0000 | 77.4000 | 36.1000 |
| Qwen2.5-32b | 49.5000 | 46.9512 | 71.0000 | 31.7200 | 80.3630 | 48.8800 |
| Orion 14B | 28.5300 | 20.1200 | 30.0000 | 2.5400 | 52.0100 | 7.8400 |
| Orion 8x7B | 52.1700 | 44.5122 | 43.4 | 5.07 | 59.8200 | 23.6800 |
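
The scores above come from standard public benchmarks (MMLU, C-Eval, CMMLU, ARC, GSM8K, and so on). As an illustration only, and not necessarily the evaluation pipeline used to produce these numbers, a comparable run with EleutherAI's lm-evaluation-harness might look like the sketch below; the task subset, batch size, and dtype are assumptions, and the exact API can differ between harness versions.

```python
# Hypothetical reproduction sketch with lm-evaluation-harness (`pip install lm-eval`);
# not the official evaluation setup behind the table above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=OrionStarAI/Orion-MOE8x7B-Base,trust_remote_code=True,dtype=bfloat16",
    tasks=["mmlu", "arc_challenge", "hellaswag"],  # example subset (assumption)
    batch_size=8,                                  # assumption
)
for task, metrics in results["results"].items():
    print(task, metrics)
```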

3.1.2. Comparison of LLM performance on Japanese test sets

| Model | jsquad | jcommonsenseqa | jnli | marc_ja | jaqket_v2 | paws_ja | avg |
|---|---|---|---|---|---|---|---|
| Mixtral-8x7B | 0.8900 | 0.7873 | 0.3213 | 0.9544 | 0.7886 | 44.5000 | 8.0403 |
| Qwen1.5-32B | 0.8986 | 0.8454 | 0.5099 | 0.9708 | 0.8214 | 0.4380 | 0.7474 |
| Qwen2.5-32B | 0.8909 | 0.9383 | 0.7214 | 0.9786 | 0.8927 | 0.4215 | 0.8073 |
| Orion-14B-Base | 0.7422 | 0.8820 | 0.7285 | 0.9406 | 0.6620 | 0.4990 | 0.7424 |
| Orion 8x7B | 0.9177 | 0.9043 | 0.9046 | 0.9640 | 0.8119 | 0.4735 | 0.8293 |

3.1.3. Comparison of LLM performance on Korean test sets

| Model | haerae | kobest boolq | kobest copa | kobest hellaswag | kobest sentineg | kobest wic | paws_ko | avg |
|---|---|---|---|---|---|---|---|---|
| Mixtral-8x7B | 53.16 | 78.56 | 66.2 | 56.6 | 77.08 | 49.37 | 44.05 | 60.71714286 |
| Qwen1.5-32B | 46.38 | 76.28 | 60.4 | 53 | 78.34 | 52.14 | 43.4 | 58.56285714 |
| Qwen2.5-32B | 70.67 | 80.27 | 76.7 | 61.2 | 96.47 | 77.22 | 37.05 | 71.36857143 |
| Orion-14B-Base | 69.66 | 80.63 | 77.1 | 58.2 | 92.44 | 51.19 | 44.55 | 67.68142857 |
| Orion 8x7B | 65.17 | 85.4 | 80.4 | 56 | 96.98 | 73.57 | 46.35 | 71.98142857 |

3.1.4. Comparison of LLM performance on Arabic, German, French, and Spanish test sets

| Model | ar hellaswag | ar arc | de hellaswag | de arc | fr hellaswag | fr arc | es hellaswag | es arc |
|---|---|---|---|---|---|---|---|---|
| Mixtral-8x7B | 47.93 | 36.27 | 69.17 | 52.35 | 73.9 | 55.86 | 74.25 | 54.79 |
| Qwen1.5-32B | 50.07 | 39.95 | 63.77 | 50.81 | 68.86 | 55.95 | 70.5 | 55.13 |
| Qwen2.5-32B | 59.76 | 52.87 | 69.82 | 61.76 | 74.15 | 62.7 | 75.04 | 65.3 |
| Orion-14B-Base | 42.26 | 33.88 | 54.65 | 38.92 | 60.21 | 42.34 | 62 | 44.62 |
| Orion 8x7B | 69.39 | 54.32 | 80.6 | 63.47 | 85.56 | 68.78 | 87.41 | 70.09 |

3.1.5. Leakage Detection Benchmark

The proportion of leaked data (from various evaluation benchmarks) found in the pre-training corpus; the higher the proportion, the more leakage it indicates.

| Threshold 0.2 | qwen2.5 32b | qwen1.5 32b | orion 8x7b | orion 14b | mixtral 8x7b |
|---|---|---|---|---|---|
| mmlu | 0.3 | 0.27 | 0.22 | 0.28 | 0.25 |
| ceval | 0.39 | 0.38 | 0.27 | 0.26 | 0.26 |
| cmmlu | 0.38 | 0.39 | 0.23 | 0.27 | 0.22 |
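
The detection method itself is not described in this document. As a rough illustration of the general idea only (an assumption, not OrionStarAI's actual procedure), the sketch below estimates, per benchmark, the fraction of questions whose n-gram overlap with a pre-training corpus sample exceeds a threshold such as the 0.2 used above.

```python
# Hypothetical n-gram overlap check for benchmark leakage (illustrative only;
# not the detection pipeline behind the numbers above).
def ngrams(text: str, n: int = 13):
    toks = text.split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def leakage_ratio(benchmark_questions, corpus_documents, n: int = 13, threshold: float = 0.2):
    """Fraction of benchmark questions whose n-gram overlap with the corpus
    exceeds `threshold` (e.g. 0.2, as in the table above)."""
    corpus_ngrams = set()
    for doc in corpus_documents:
        corpus_ngrams |= ngrams(doc, n)
    leaked = 0
    for q in benchmark_questions:
        q_ngrams = ngrams(q, n)
        if q_ngrams and len(q_ngrams & corpus_ngrams) / len(q_ngrams) > threshold:
            leaked += 1
    return leaked / max(len(benchmark_questions), 1)

# Toy usage with a short n-gram size so the overlap is visible:
print(leakage_ratio(
    ["what is the capital of france paris is the capital"],
    ["paris is the capital of france"],
    n=3,
))  # prints 1.0: the single question exceeds the overlap threshold
```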

3.1.6. Inference speed

Measured on 8x NVIDIA RTX 3090 GPUs, in tokens per second; column headers encode the number of parallel requests ("para") and the output length in tokens ("out").

| OrionLLM_V2.4.6.1 | 1para_out62 | 1para_out85 | 1para_out125 | 1para_out210 |
|---|---|---|---|---|
| OrionMOE | 33.03544296 | 33.43113606 | 33.53014102 | 33.58693529 |
| Qwen32B | 26.46267188 | 26.72846906 | 26.80413838 | 27.03123611 |
| Orion14B | 41.69121312 | 41.77423491 | 41.76050902 | 42.26096669 |

| OrionLLM_V2.4.6.1 | 4para_out62 | 4para_out90 | 4para_out125 | 4para_out220 |
|---|---|---|---|---|
| OrionMOE | 29.45015743 | 30.4472947 | 31.03748516 | 31.45783599 |
| Qwen32B | 23.60912215 | 24.30431956 | 24.86132023 | 25.16827535 |
| Orion14B | 38.08240373 | 38.8572788 | 39.50040645 | 40.44875947 |

| OrionLLM_V2.4.6.1 | 8para_out62 | 8para_out85 | 8para_out125 | 8para_out220 |
|---|---|---|---|---|
| OrionMOE | 25.71006327 | 27.13446743 | 28.89463226 | 29.70440167 |
| Qwen32B | 21.15920951 | 21.92001035 | 23.13867947 | 23.5649106 |
| Orion14B | 34.4151923 | 36.05635893 | 37.0874908 | 37.91705944 |
(Figure: inference speed comparison)
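
For reference, a minimal sketch of how a tokens-per-second figure can be measured with the standard Hugging Face generate() API is shown below. It is a single-request probe under assumed settings (prompt, output length, greedy decoding) and is not the serving stack (OrionLLM_V2.4.6.1) behind the table above.

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Minimal single-request throughput probe (illustrative; not the benchmark harness
# used for the published numbers).
model_name = "OrionStarAI/Orion-MOE8x7B-Base"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True
)

prompt = "Write a short story about a robot barista."  # example prompt (assumption)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

start = time.time()
outputs = model.generate(inputs.input_ids, max_new_tokens=125, do_sample=False)
elapsed = time.time() - start

new_tokens = outputs.shape[1] - inputs.input_ids.shape[1]
print(f"{new_tokens / elapsed:.2f} tokens/s")
```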


4. Model Inference

The model weights, source code, and configuration needed for inference are published on Hugging Face; the download link is available in the table at the beginning of this document. We demonstrate several inference methods here, and the required resources are downloaded automatically from Hugging Face.
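
If you prefer to fetch the weights ahead of time rather than relying on the automatic download at load time, the repository can be mirrored locally with huggingface_hub. This is a minimal sketch; the target directory is an arbitrary example.

```python
# Optional pre-download of the model repository (requires `pip install huggingface_hub`).
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="OrionStarAI/Orion-MOE8x7B-Base",
    local_dir="./Orion-MOE8x7B-Base",  # example target directory (assumption)
)
print(f"Model files available at: {local_dir}")
```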

4.1. Python Code

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig

# Load the tokenizer and model; trust_remote_code is required because the model
# ships custom modeling code on Hugging Face.
tokenizer = AutoTokenizer.from_pretrained("OrionStarAI/Orion-MOE8x7B-Base",
                                          use_fast=False,
                                          trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("OrionStarAI/Orion-MOE8x7B-Base",
                                             device_map="auto",
                                             torch_dtype=torch.bfloat16,
                                             trust_remote_code=True)

# Use the generation settings published with the model, then run a chat turn.
model.generation_config = GenerationConfig.from_pretrained("OrionStarAI/Orion-MOE8x7B-Base")
messages = [{"role": "user", "content": "Hello, what is your name? "}]
response = model.chat(tokenizer, messages, streaming=False)
print(response)
```

In the above Python code, the model is loaded with device_map="auto" to utilize all available GPUs. To restrict inference to specific devices, set CUDA_VISIBLE_DEVICES before launching the process, for example export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 to use GPUs 0 through 7.
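
The same restriction can also be applied from inside Python, optionally combined with a per-GPU memory budget. The sketch below is illustrative only; the GPU subset and the 22GiB budget are assumptions, not recommended settings.

```python
import os

# Equivalent of `export CUDA_VISIBLE_DEVICES=0,1,2,3`, but done from Python;
# it must be set before torch initializes CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"  # example GPU subset (assumption)

import torch
from transformers import AutoModelForCausalLM

# device_map="auto" now shards the experts across only the four visible GPUs;
# max_memory optionally caps the usage per device.
model = AutoModelForCausalLM.from_pretrained(
    "OrionStarAI/Orion-MOE8x7B-Base",
    device_map="auto",
    max_memory={i: "22GiB" for i in range(4)},  # example per-GPU budget (assumption)
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
```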

4.2. Direct Script Inference


```shell
# base model
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python demo/text_generation_base.py \
    --model OrionStarAI/Orion-MOE8x7B-Base \
    --tokenizer OrionStarAI/Orion-MOE8x7B-Base \
    --prompt hello
```


5. Declarations, License

5.1. Declarations

We strongly urge all users not to use the Orion-MOE8x7B model for any activities that may harm national or social security or violate the law. Additionally, we request that users not use the Orion-MOE8x7B model for internet services without proper security review and filing. We hope all users abide by this principle to ensure that technological development takes place in a regulated and legal environment.

We have done our best to ensure the compliance of the data used in the model training process. However, despite our significant efforts, unforeseen issues may still arise due to the complexity of the model and data. Therefore, if any problems arise from the use of the Orion-MOE8x7B open-source model, including but not limited to data security issues, public opinion risks, or any risks and issues arising from the model being misled, abused, disseminated, or improperly utilized, we will not assume any responsibility.

5.2. License

Community use of the Orion-MOE8x7B series models is governed by the Orion-MOE8x7B series models community license agreement.


6. Company Introduction

OrionStar is a leading global service robot solutions company, founded in September 2016. OrionStar is dedicated to using artificial intelligence technology to create the next generation of revolutionary robots, allowing people to break free from repetitive physical labor and making human work and life more intelligent and enjoyable. Through technology, OrionStar aims to make society and the world a better place.

OrionStar possesses fully self-developed end-to-end artificial intelligence technologies, such as voice interaction and visual navigation. It integrates product development capabilities and technological application capabilities. Based on the Orion robotic arm platform, it has launched products such as OrionStar AI Robot Greeting, AI Robot Greeting Mini, Lucki, Coffee Master, and established the open platform OrionOS for Orion robots. Following the philosophy of "Born for Truly Useful Robots", OrionStar empowers more people through AI technology.

The core strengths of OrionStar lie in its end-to-end AI application capabilities, including big data preprocessing, large model pretraining, fine-tuning, prompt engineering, agent development, etc. With comprehensive end-to-end model training capabilities, including systematic data processing workflows and parallel model training across hundreds of GPUs, these capabilities have been successfully applied in various industry scenarios such as government affairs, cloud services, international e-commerce, and fast-moving consumer goods.

Companies with demands for deploying large-scale model applications are welcome to contact us.
Enquiry Hotline: 400-898-7779
E-mail: [email protected]
Discord Link: https://discord.gg/zumjDWgdAs

(WeChat QR code)