renillhuang committed · verified · Commit 45ccc2b · Parent(s): acea0e0

Update README.md

Files changed (1): README.md (+72 −74)

<div align="center">
<h1>
Orion-MoE8x7B
</h1>
</div>

<div align="center">

<div align="center">
<b>🌐English</b> | <a href="https://huggingface.co/OrionStarAI/Orion-MoE8x7B/blob/main/README_zh.md" target="_blank">🇨🇳中文</a>
</div>

<a name="model-introduction"></a><br>
# 1. Model Introduction

- Orion-MoE8x7B is a pretrained foundation large language model with a sparse Mixture of Experts (MoE) architecture. The model is trained from scratch on a multilingual corpus of approximately 5 trillion tokens, covering languages such as Chinese, English, Japanese, and Korean, among others. (An illustrative sketch of sparse expert routing follows the feature list below.)

- Key Features of Orion-MoE8x7B
  - The model demonstrates exceptional performance in comprehensive evaluations compared to other models of the same parameter scale.
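
In a sparse MoE layer, each token activates only a few expert feed-forward networks, so inference cost stays close to that of a much smaller dense model. The following is a minimal, generic sketch of top-k gated routing in PyTorch; the layer sizes, expert count, and top-k value are placeholders, not Orion-MoE8x7B's actual configuration or implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Top-k gated mixture-of-experts feed-forward layer (illustrative only)."""

    def __init__(self, hidden_size=1024, ffn_size=4096, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(hidden_size, ffn_size), nn.SiLU(),
                          nn.Linear(ffn_size, hidden_size))
            for _ in range(num_experts)
        ])
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x):
        # x: (num_tokens, hidden_size)
        scores = self.gate(x)                                  # (num_tokens, num_experts)
        weights, expert_idx = scores.topk(self.top_k, dim=-1)  # pick k experts per token
        weights = F.softmax(weights, dim=-1)                   # normalize the k gate weights
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                sel = expert_idx[:, k] == e                    # tokens routed to expert e in slot k
                if sel.any():
                    out[sel] += weights[sel, k:k + 1] * expert(x[sel])
        return out
```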
 
Model release and download links are provided in the table below:

| Model Name | HuggingFace Download Links | ModelScope Download Links |
|------------|----------------------------|---------------------------|
| ⚾Orion-MoE8x7B | [Orion-MoE8x7B](https://huggingface.co/OrionStarAI/Orion-MoE8x7B) | [Orion-MoE8x7B](https://modelscope.cn/models/OrionStarAI/Orion-MoE8x7B-Base/summary) |
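
To fetch the weights ahead of time, something like the following should work with `huggingface_hub` (the local directory is an arbitrary choice, and the ModelScope mirror can be used analogously):

```python
from huggingface_hub import snapshot_download

# Download the HuggingFace repo listed in the table above.
snapshot_download(repo_id="OrionStarAI/Orion-MoE8x7B",
                  local_dir="models/Orion-MoE8x7B")
```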

<a name="model-benchmark"></a><br>
# 3. Model Benchmarks

### 3.1. LLM evaluation results on examination and professional knowledge
 
| TestSet | Mixtral 8x7B | Qwen1.5-32B | Qwen2.5-32B | Orion-14B | Qwen2-57B-A14B | Orion-MoE8x7B |
|---------------|------|------|------|------|------|------|
| MMLU | 70.4 | 73.4 | 82.9 | 69.9 | 76.5 | **85.9** |
| MMLU Pro | 38.5 | 45.3 | 58.0 | 34.0 | 48.6 | **58.3** |
| CEval | 54.1 | 83.5 | 87.7 | 72.8 | 87.7 | **89.7** |
| CMMLU | 53.2 | 82.3 | 89.0 | 70.6 | 88.5 | **89.2** |
| ARC_c | 85.1 | 90.2 | **94.2** | 79.7 | 91.5 | 91.9 |
| HellaSwag | 81.9 | 82.0 | 82.5 | 78.5 | 85.2 | **89.2** |
| LAMBADA | 76.8 | 73.7 | 75.4 | 78.8 | 72.6 | **79.7** |
| BBH | 50.9 | 57.3 | **67.7** | 50.4 | 55.1 | 55.8 |
| MuSR | 43.2 | 42.7 | 49.8 | 43.6 | 39.0 | **49.9** |
| PIQA | 83.4 | 82.2 | 80.1 | 79.5 | 81.9 | **87.3** |
| CommonSenseQA | 69.6 | **74.7** | 73.0 | 66.9 | 69.9 | 73.1 |
| IFEval | 24.2 | 33.0 | **41.6** | 29.1 | 31.2 | 30.1 |
| GPQA | 30.9 | 33.5 | 49.5 | 28.5 | 32.6 | **52.2** |
| HumanEval | 33.5 | 36.0 | 47.0 | 20.1 | **53.0** | 44.5 |

### 3.2. Comparison of LLM performances on Japanese testsets

| Model | Average | JSQuAD | JCommonSenseQA | JNLI | MARC-ja | JAQKET v2 | PAWS-ja |
|---------------|-------|-------|-------|-------|-------|-------|-------|
| Mixtral-8x7B | 69.8 | 89.0 | 78.7 | 32.1 | 95.4 | 78.9 | 44.5 |
| Qwen1.5-32B | 74.7 | 89.9 | 84.5 | 51.0 | 97.1 | 82.1 | 43.8 |
| Qwen2.5-32B | 80.7 | 89.1 | 93.8 | 72.1 | **97.9** | **89.3** | 42.2 |
| Orion-14B | 74.2 | 74.2 | 88.2 | 72.8 | 94.1 | 66.2 | 49.9 |
| Orion-MoE8x7B | **82.9** | **91.8** | 90.4 | **90.5** | 96.4 | 81.2 | **47.4** |

### 3.3. Comparison of LLM performances on Korean testsets

| Model | Average | HAE-RAE | KoBEST BoolQ | KoBEST COPA | KoBEST HellaSwag | KoBEST SentiNeg | KoBEST WiC | PAWS-ko |
|---------------|-------|-------|-------|-------|-------|-------|-------|-------|
| Mixtral-8x7B | 60.7 | 53.2 | 78.6 | 66.2 | 56.6 | 77.1 | 49.4 | 44.1 |
| Qwen1.5-32B | 58.6 | 46.4 | 76.3 | 60.4 | 53.0 | 78.3 | 52.1 | 43.4 |
| Qwen2.5-32B | 71.4 | **70.7** | 80.3 | 76.7 | **61.2** | 96.5 | **77.2** | 37.1 |
| Orion-14B | 67.7 | 69.7 | 80.6 | 77.1 | 58.2 | 92.4 | 51.2 | 44.6 |
| Orion-MoE8x7B | **72.0** | 65.2 | **85.4** | **80.4** | 56.0 | **97.0** | 73.6 | **46.4** |

### 3.4. Comparison of LLM performances on Arabic, German, French, and Spanish testsets

| Language | Spanish | | French | | German | | Arabic | |
|----|----|----|----|----|----|----|----|----|
| **Model** | **HellaSwag** | **ARC** | **HellaSwag** | **ARC** | **HellaSwag** | **ARC** | **HellaSwag** | **ARC** |
| Mixtral-8x7B | 74.3 | 54.8 | 73.9 | 55.9 | 69.2 | 52.4 | 47.9 | 36.3 |
| Qwen1.5-32B | 70.5 | 55.1 | 68.9 | 56.0 | 63.8 | 50.8 | 50.1 | 40.0 |
| Qwen2.5-32B | 75.0 | 65.3 | 74.2 | 62.7 | 69.8 | 61.8 | 59.8 | 52.9 |
| Orion-14B | 62.0 | 44.6 | 60.2 | 42.3 | 54.7 | 38.9 | 42.3 | 33.9 |
| Orion-MoE8x7B | **87.4** | **70.1** | **85.6** | **68.8** | **80.6** | **63.5** | **69.4** | **54.3** |

### 3.5. Leakage Detection Benchmark

When the pre-training data of a large language model contains content from a specific dataset, the model's performance on that dataset may be artificially inflated, leading to inaccurate evaluations. To address this issue, researchers from the Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, and other institutions have proposed a simple and effective method for detecting data leakage. The method exploits the interchangeable nature of multiple-choice options: the options of each item in the original dataset are shuffled to generate derived data, and the model's log-probability distribution over the derived dataset is then compared against the original to detect whether the dataset appeared in the training data.

We conducted data leakage detection experiments on three benchmark datasets: MMLU, CMMLU, and C-Eval.<br>
More details can be found in the paper: https://web3.arxiv.org/pdf/2409.01790.<br>
Test code: https://github.com/nishiwen1214/Benchmark-leakage-detection.
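
To make the shuffling step concrete, the sketch below derives shuffled option orderings for one item and scores them with a causal LM's log-probabilities; it is a simplified illustration under generic `transformers` assumptions, not the authors' exact pipeline (see the linked test code for that).

```python
import random
import torch

def shuffled_variants(options, n_variants=5, seed=0):
    """Derive new option orderings for one multiple-choice item."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n_variants):
        order = options[:]
        rng.shuffle(order)
        variants.append(order)
    return variants

@torch.no_grad()
def sequence_logprob(model, tokenizer, text):
    """Total log-probability the model assigns to `text`."""
    ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
    logits = model(ids).logits[:, :-1]               # position t predicts token t+1
    logps = torch.log_softmax(logits.float(), dim=-1)
    return logps.gather(-1, ids[:, 1:].unsqueeze(-1)).sum().item()

def leakage_signal(model, tokenizer, question, options):
    """If the original ordering scores as an outlier among shuffles, suspect leakage."""
    def render(opts):
        body = "\n".join(f"{l}. {o}" for l, o in zip("ABCD", opts))
        return f"{question}\n{body}"
    original = sequence_logprob(model, tokenizer, render(options))
    shuffled = [sequence_logprob(model, tokenizer, render(v))
                for v in shuffled_variants(options)]
    return original - max(shuffled)   # a large positive gap is suspicious
```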

| Threshold 0.2 | Qwen2.5-32B | Qwen1.5-32B | Orion-MoE8x7B | Orion-14B | Mixtral-8x7B |
|------|------|------|------|------|------|
| MMLU | 0.30 | 0.27 | 0.22 | 0.28 | 0.25 |
| CEval | 0.39 | 0.38 | 0.27 | 0.26 | 0.26 |
| CMMLU | 0.38 | 0.39 | 0.23 | 0.27 | 0.22 |

### 3.6. Inference speed

We set up an inference server on 8x NVIDIA RTX 3090 GPUs (and separately on 4x A100 GPUs) and measured client-side throughput in tokens per second.

| Model | 8x3090, 1 concurrent | 8x3090, 4 concurrent | 4xA100, 1 concurrent | 4xA100, 4 concurrent |
|---------|--------|-------|--------|-------|
| Orion-MoE8x7B | **102.77** | **54.61** | **107.76** | **61.83** |
| Qwen 32B | 52.93 | 46.06 | 62.43 | 56.81 |
 
<br>
We also compared inference speed on 4x A100 GPUs across different input lengths (tokens), again measuring client-side throughput in tokens per second.

| Input | 4k | 8k | 12k | 16k | 32k | 64k |
|---------|-------|-------|-------|-------|-------|-------|
| Orion-MoE8x7B | **90.86** | **54.40** | **31.08** | **29.04** | **22.69** | **14.51** |
| Qwen 32B | 53.99 | 47.59 | 25.98 | 24.35 | 18.64 | 11.86 |
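
Client-side throughput of this kind can be probed with a short script; the sketch below assumes the OpenAI-compatible vLLM service from section 4.3 (port and model name taken from that example) and is not the script used to produce the numbers above.

```python
import time
import requests

# Hypothetical throughput probe against an OpenAI-compatible endpoint.
URL = "http://0.0.0.0:9999/v1/chat/completions"

def tokens_per_second(prompt, model="orion-moe", max_tokens=256):
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    start = time.perf_counter()
    resp = requests.post(URL, json=payload, timeout=600).json()
    elapsed = time.perf_counter() - start
    # OpenAI-compatible servers report generated token counts in `usage`.
    return resp["usage"]["completion_tokens"] / elapsed

if __name__ == "__main__":
    print(f"{tokens_per_second('Tell me about mixture-of-experts models.'):.2f} tokens/s")
```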
 
 
 
<a name="model-inference"></a><br>
# 4. Model Inference
 
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig

# Load the tokenizer and model; device_map="auto" shards the model across
# all visible GPUs, and bfloat16 keeps memory usage manageable.
tokenizer = AutoTokenizer.from_pretrained("OrionStarAI/Orion-MoE8x7B",
                                          use_fast=False,
                                          trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("OrionStarAI/Orion-MoE8x7B",
                                             device_map="auto",
                                             torch_dtype=torch.bfloat16,
                                             trust_remote_code=True)

model.generation_config = GenerationConfig.from_pretrained("OrionStarAI/Orion-MoE8x7B")
messages = [{"role": "user", "content": "Hello, what is your name? "}]
response = model.chat(tokenizer, messages, streaming=False)
print(response)
```
 
To run inference on specific GPU devices, you can use something like `export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`:

```shell
# foundation model
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python demo/text_generation_base.py --model OrionStarAI/Orion-MoE8x7B --tokenizer OrionStarAI/Orion-MoE8x7B --prompt hello
```

## 4.3. vLLM Inference Service
 
Build docker image
```shell
docker build -t vllm_server:0.0.0.0 -f Dockerfile .
```
Start docker service
```shell
docker run --gpus all -it -p 9999:9999 -v $(pwd)/logs:/workspace/logs:rw -v $HOME/Downloads:/workspace/models -e CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 -e MODEL_DIR=Orion-MoE8x7B -e MODEL_NAME=orion-moe vllm_server:0.0.0.0
```
Run inference
```shell
# The request body below is an assumed example in the standard
# OpenAI-compatible chat schema (model name taken from MODEL_NAME above);
# the original command is truncated in this excerpt.
curl http://0.0.0.0:9999/v1/chat/completions -H "Content-Type: application/json" \
  -d '{
    "model": "orion-moe",
    "messages": [{"role": "user", "content": "Hello, what is your name?"}]
  }'
```
 
## 5.1. Declarations

We strongly urge all users not to use the Orion-MoE8x7B model for any activities that may harm national or social security or violate the law.
Additionally, we request users not to use the Orion-MoE8x7B model for internet services without proper security review and filing.
We hope all users abide by this principle to ensure that technological development takes place in a regulated and legal environment.
We have done our best to ensure the compliance of the data used in the model training process. However, despite our significant efforts, unforeseen issues may still arise due to the complexity of the model and data. Therefore, if any problems arise due to the use of the Orion-MoE8x7B open-source model, including but not limited to data security issues, public opinion risks, or any risks and issues arising from the model being misled, abused, disseminated, or improperly utilized, we will not assume any responsibility.

## 5.2. License

Community use of the Orion-MoE8x7B series models:
- For code, please comply with [Apache License Version 2.0](./LICENSE)<br>
- For model, please comply with [【Orion Series】 Models Community License Agreement](./ModelsCommunityLicenseAgreement)