renillhuang committed · verified · Commit 45ccc2b · Parent(s): acea0e0

Update README.md

Files changed (1): README.md (+72 −74)

<div align="center">
<h1>
Orion-MoE8x7B
</h1>
</div>

<div align="center">

<div align="center">
<b>🌐English</b> | <a href="https://huggingface.co/OrionStarAI/Orion-MoE8x7B/blob/main/README_zh.md" target="_blank">🇨🇳中文</a>
</div>

<a name="model-introduction"></a><br>
# 1. Model Introduction

- Orion-MoE8x7B is a pretrained foundation large language model with a sparse Mixture of Experts (MoE) architecture. The model is trained from scratch on a multilingual corpus of approximately 5 trillion tokens, covering languages such as Chinese, English, Japanese, and Korean, among others. (An illustrative sketch of sparse expert routing follows the feature list below.)

- Key Features of Orion-MoE8x7B
  - The model demonstrates exceptional performance in comprehensive evaluations compared to other models of the same parameter scale.
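
In a sparse MoE layer, each token activates only a few expert feed-forward networks, so inference cost stays close to that of a much smaller dense model. The following is a minimal, generic sketch of top-k gated routing in PyTorch; the layer sizes, expert count, and top-k value are placeholders, not Orion-MoE8x7B's actual configuration or implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Top-k gated mixture-of-experts feed-forward layer (illustrative only)."""

    def __init__(self, hidden_size=1024, ffn_size=4096, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(hidden_size, ffn_size), nn.SiLU(),
                          nn.Linear(ffn_size, hidden_size))
            for _ in range(num_experts)
        ])
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x):
        # x: (num_tokens, hidden_size)
        scores = self.gate(x)                                  # (num_tokens, num_experts)
        weights, expert_idx = scores.topk(self.top_k, dim=-1)  # pick k experts per token
        weights = F.softmax(weights, dim=-1)                   # normalize the k gate weights
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                sel = expert_idx[:, k] == e                    # tokens routed to expert e in slot k
                if sel.any():
                    out[sel] += weights[sel, k:k + 1] * expert(x[sel])
        return out
```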
 
Model release and download links are provided in the table below:

| Model Name | HuggingFace Download Links | ModelScope Download Links |
|------------|----------------------------|---------------------------|
| ⚾Orion-MoE8x7B | [Orion-MoE8x7B](https://huggingface.co/OrionStarAI/Orion-MoE8x7B) | [Orion-MoE8x7B](https://modelscope.cn/models/OrionStarAI/Orion-MoE8x7B-Base/summary) |
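
To fetch the weights ahead of time, something like the following should work with `huggingface_hub` (the local directory is an arbitrary choice, and the ModelScope mirror can be used analogously):

```python
from huggingface_hub import snapshot_download

# Download the HuggingFace repo listed in the table above.
snapshot_download(repo_id="OrionStarAI/Orion-MoE8x7B",
                  local_dir="models/Orion-MoE8x7B")
```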

<a name="model-benchmark"></a><br>
# 3. Model Benchmarks

### 3.1. LLM evaluation results on examination and professional knowledge
 
| TestSet | Mixtral 8x7B | Qwen1.5-32B | Qwen2.5-32B | Orion-14B | Qwen2-57B-A14B | Orion-MoE8x7B |
|---------------|------|------|------|------|------|------|
| MMLU | 70.4 | 73.4 | 82.9 | 69.9 | 76.5 | **85.9** |
| MMLU Pro | 38.5 | 45.3 | 58.0 | 34.0 | 48.6 | **58.3** |
| CEval | 54.1 | 83.5 | 87.7 | 72.8 | 87.7 | **89.7** |
| CMMLU | 53.2 | 82.3 | 89.0 | 70.6 | 88.5 | **89.2** |
| ARC_c | 85.1 | 90.2 | **94.2** | 79.7 | 91.5 | 91.9 |
| HellaSwag | 81.9 | 82.0 | 82.5 | 78.5 | 85.2 | **89.2** |
| LAMBADA | 76.8 | 73.7 | 75.4 | 78.8 | 72.6 | **79.7** |
| BBH | 50.9 | 57.3 | **67.7** | 50.4 | 55.1 | 55.8 |
| MuSR | 43.2 | 42.7 | 49.8 | 43.6 | 39.0 | **49.9** |
| PIQA | 83.4 | 82.2 | 80.1 | 79.5 | 81.9 | **87.3** |
| CommonSenseQA | 69.6 | **74.7** | 73.0 | 66.9 | 69.9 | 73.1 |
| IFEval | 24.2 | 33.0 | **41.6** | 29.1 | 31.2 | 30.1 |
| GPQA | 30.9 | 33.5 | 49.5 | 28.5 | 32.6 | **52.2** |
| HumanEval | 33.5 | 36.0 | 47.0 | 20.1 | **53.0** | 44.5 |

### 3.2. Comparison of LLM performances on Japanese testsets

| Model | Average | JSQuAD | JCommonSenseQA | JNLI | MARC-ja | JAQKET v2 | PAWS-ja |
|---------------|-------|-------|-------|-------|-------|-------|-------|
| Mixtral-8x7B | 69.8 | 89.0 | 78.7 | 32.1 | 95.4 | 78.9 | 44.5 |
| Qwen1.5-32B | 74.7 | 89.9 | 84.5 | 51.0 | 97.1 | 82.1 | 43.8 |
| Qwen2.5-32B | 80.7 | 89.1 | 93.8 | 72.1 | **97.9** | **89.3** | 42.2 |
| Orion-14B | 74.2 | 74.2 | 88.2 | 72.8 | 94.1 | 66.2 | 49.9 |
| Orion-MoE8x7B | **82.9** | **91.8** | 90.4 | **90.5** | 96.4 | 81.2 | **47.4** |

### 3.3. Comparison of LLM performances on Korean testsets

| Model | Average | HAE-RAE | KoBEST BoolQ | KoBEST COPA | KoBEST HellaSwag | KoBEST SentiNeg | KoBEST WiC | PAWS-ko |
|---------------|-------|-------|-------|-------|-------|-------|-------|-------|
| Mixtral-8x7B | 60.7 | 53.2 | 78.6 | 66.2 | 56.6 | 77.1 | 49.4 | 44.1 |
| Qwen1.5-32B | 58.6 | 46.4 | 76.3 | 60.4 | 53.0 | 78.3 | 52.1 | 43.4 |
| Qwen2.5-32B | 71.4 | **70.7** | 80.3 | 76.7 | **61.2** | 96.5 | **77.2** | 37.1 |
| Orion-14B | 67.7 | 69.7 | 80.6 | 77.1 | 58.2 | 92.4 | 51.2 | 44.6 |
| Orion-MoE8x7B | **72.0** | 65.2 | **85.4** | **80.4** | 56.0 | **97.0** | 73.6 | **46.4** |

### 3.4. Comparison of LLM performances on Arabic, German, French, and Spanish testsets

| Language | Spanish | | French | | German | | Arabic | |
|----|----|----|----|----|----|----|----|----|
| **Model** | **HellaSwag** | **ARC** | **HellaSwag** | **ARC** | **HellaSwag** | **ARC** | **HellaSwag** | **ARC** |
| Mixtral-8x7B | 74.3 | 54.8 | 73.9 | 55.9 | 69.2 | 52.4 | 47.9 | 36.3 |
| Qwen1.5-32B | 70.5 | 55.1 | 68.9 | 56.0 | 63.8 | 50.8 | 50.1 | 40.0 |
| Qwen2.5-32B | 75.0 | 65.3 | 74.2 | 62.7 | 69.8 | 61.8 | 59.8 | 52.9 |
| Orion-14B | 62.0 | 44.6 | 60.2 | 42.3 | 54.7 | 38.9 | 42.3 | 33.9 |
| Orion-MoE8x7B | **87.4** | **70.1** | **85.6** | **68.8** | **80.6** | **63.5** | **69.4** | **54.3** |

### 3.5. Leakage Detection Benchmark

When the pre-training data of a large language model contains content from a specific dataset, the model's performance on that dataset may be artificially inflated, leading to inaccurate evaluations. To address this issue, researchers from the Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, and other institutions have proposed a simple and effective method for detecting data leakage. The method exploits the interchangeable nature of multiple-choice options: the options of each item in the original dataset are shuffled to generate derived data, and the model's log-probability distribution over the derived dataset is then compared against the original to detect whether the dataset appeared in the training data.

We conducted data leakage detection experiments on three benchmark datasets: MMLU, CMMLU, and C-Eval.<br>
More details can be found in the paper: https://web3.arxiv.org/pdf/2409.01790.<br>
Test code: https://github.com/nishiwen1214/Benchmark-leakage-detection.
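
To make the shuffling step concrete, the sketch below derives shuffled option orderings for one item and scores them with a causal LM's log-probabilities; it is a simplified illustration under generic `transformers` assumptions, not the authors' exact pipeline (see the linked test code for that).

```python
import random
import torch

def shuffled_variants(options, n_variants=5, seed=0):
    """Derive new option orderings for one multiple-choice item."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n_variants):
        order = options[:]
        rng.shuffle(order)
        variants.append(order)
    return variants

@torch.no_grad()
def sequence_logprob(model, tokenizer, text):
    """Total log-probability the model assigns to `text`."""
    ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
    logits = model(ids).logits[:, :-1]               # position t predicts token t+1
    logps = torch.log_softmax(logits.float(), dim=-1)
    return logps.gather(-1, ids[:, 1:].unsqueeze(-1)).sum().item()

def leakage_signal(model, tokenizer, question, options):
    """If the original ordering scores as an outlier among shuffles, suspect leakage."""
    def render(opts):
        body = "\n".join(f"{l}. {o}" for l, o in zip("ABCD", opts))
        return f"{question}\n{body}"
    original = sequence_logprob(model, tokenizer, render(options))
    shuffled = [sequence_logprob(model, tokenizer, render(v))
                for v in shuffled_variants(options)]
    return original - max(shuffled)   # a large positive gap is suspicious
```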

| Threshold 0.2 | Qwen2.5-32B | Qwen1.5-32B | Orion-MoE8x7B | Orion-14B | Mixtral-8x7B |
|------|------|------|------|------|------|
| MMLU | 0.30 | 0.27 | 0.22 | 0.28 | 0.25 |
| CEval | 0.39 | 0.38 | 0.27 | 0.26 | 0.26 |
| CMMLU | 0.38 | 0.39 | 0.23 | 0.27 | 0.22 |

### 3.6. Inference speed

We set up an inference server on 8x NVIDIA RTX 3090 GPUs (and separately on 4x A100 GPUs) and measured client-side throughput in tokens per second.

| Model | 8x3090, 1 concurrent | 8x3090, 4 concurrent | 4xA100, 1 concurrent | 4xA100, 4 concurrent |
|---------|--------|-------|--------|-------|
| Orion-MoE8x7B | **102.77** | **54.61** | **107.76** | **61.83** |
| Qwen 32B | 52.93 | 46.06 | 62.43 | 56.81 |
 
<br>
We also compared inference speed on 4x A100 GPUs across different input lengths (tokens), again measuring client-side throughput in tokens per second.

| Input | 4k | 8k | 12k | 16k | 32k | 64k |
|---------|-------|-------|-------|-------|-------|-------|
| Orion-MoE8x7B | **90.86** | **54.40** | **31.08** | **29.04** | **22.69** | **14.51** |
| Qwen 32B | 53.99 | 47.59 | 25.98 | 24.35 | 18.64 | 11.86 |
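
Client-side throughput of this kind can be probed with a short script; the sketch below assumes the OpenAI-compatible vLLM service from section 4.3 (port and model name taken from that example) and is not the script used to produce the numbers above.

```python
import time
import requests

# Hypothetical throughput probe against an OpenAI-compatible endpoint.
URL = "http://0.0.0.0:9999/v1/chat/completions"

def tokens_per_second(prompt, model="orion-moe", max_tokens=256):
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    start = time.perf_counter()
    resp = requests.post(URL, json=payload, timeout=600).json()
    elapsed = time.perf_counter() - start
    # OpenAI-compatible servers report generated token counts in `usage`.
    return resp["usage"]["completion_tokens"] / elapsed

if __name__ == "__main__":
    print(f"{tokens_per_second('Tell me about mixture-of-experts models.'):.2f} tokens/s")
```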
 
 
 
<a name="model-inference"></a><br>
# 4. Model Inference
 
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig

# Load the tokenizer and model; device_map="auto" shards the model across
# all visible GPUs, and bfloat16 keeps memory usage manageable.
tokenizer = AutoTokenizer.from_pretrained("OrionStarAI/Orion-MoE8x7B",
                                          use_fast=False,
                                          trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("OrionStarAI/Orion-MoE8x7B",
                                             device_map="auto",
                                             torch_dtype=torch.bfloat16,
                                             trust_remote_code=True)

model.generation_config = GenerationConfig.from_pretrained("OrionStarAI/Orion-MoE8x7B")
messages = [{"role": "user", "content": "Hello, what is your name? "}]
response = model.chat(tokenizer, messages, streaming=False)
print(response)
```
 
To run inference on specific GPU devices, you can use something like `export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`:

```shell
# foundation model
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python demo/text_generation_base.py --model OrionStarAI/Orion-MoE8x7B --tokenizer OrionStarAI/Orion-MoE8x7B --prompt hello
```

## 4.3. vLLM Inference Service
 
Build docker image
```shell
docker build -t vllm_server:0.0.0.0 -f Dockerfile .
```
Start docker service
```shell
docker run --gpus all -it -p 9999:9999 -v $(pwd)/logs:/workspace/logs:rw -v $HOME/Downloads:/workspace/models -e CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 -e MODEL_DIR=Orion-MoE8x7B -e MODEL_NAME=orion-moe vllm_server:0.0.0.0
```
Run inference
```shell
# The request body below is an assumed example in the standard
# OpenAI-compatible chat schema (model name taken from MODEL_NAME above);
# the original command is truncated in this excerpt.
curl http://0.0.0.0:9999/v1/chat/completions -H "Content-Type: application/json" \
  -d '{
    "model": "orion-moe",
    "messages": [{"role": "user", "content": "Hello, what is your name?"}]
  }'
```
 
## 5.1. Declarations

We strongly urge all users not to use the Orion-MoE8x7B model for any activities that may harm national or social security or violate the law.
Additionally, we request users not to use the Orion-MoE8x7B model for internet services without proper security review and filing.
We hope all users abide by this principle to ensure that technological development takes place in a regulated and legal environment.
We have done our best to ensure the compliance of the data used in the model training process. However, despite our significant efforts, unforeseen issues may still arise due to the complexity of the model and data. Therefore, if any problems arise due to the use of the Orion-MoE8x7B open-source model, including but not limited to data security issues, public opinion risks, or any risks and issues arising from the model being misled, abused, disseminated, or improperly utilized, we will not assume any responsibility.

## 5.2. License

Community use of the Orion-MoE8x7B series models:
- For code, please comply with [Apache License Version 2.0](./LICENSE)<br>
- For model, please comply with [【Orion Series】 Models Community License Agreement](./ModelsCommunityLicenseAgreement)