renillhuang committed: Update README.md

README.md
<div align="center">
<h1>
Orion-MoE8x7B
</h1>
</div>

<div align="center">

<div align="center">
<b>🌐English</b> | <a href="https://huggingface.co/OrionStarAI/Orion-MoE8x7B/blob/main/README_zh.md" target="_blank">🇨🇳中文</a>
</div>
<a name="model-introduction"></a><br>
# 1. Model Introduction

- Orion-MoE8x7B is a pretrained foundation large language model with a sparse Mixture of Experts (MoE) architecture. The model is trained from scratch on a multilingual corpus of approximately 5 trillion tokens, including languages such as Chinese, English, Japanese, Korean, and more.

- Key Features of Orion-MoE8x7B
  - The model demonstrates exceptional performance in comprehensive evaluations compared to other models of the same parameter scale.
<a name="model-download"></a><br>
# 2. Model Download

Model release and download links are provided in the table below:

| Model Name | HuggingFace Download Links | ModelScope Download Links |
|------------|----------------------------|---------------------------|
| ⚾Orion-MoE8x7B | [Orion-MoE8x7B](https://huggingface.co/OrionStarAI/Orion-MoE8x7B) | [Orion-MoE8x7B](https://modelscope.cn/models/OrionStarAI/Orion-MoE8x7B-Base/summary) |
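For command-line downloads, here is a minimal sketch using the `huggingface-cli` tool from `huggingface_hub`; the local directory name is an arbitrary choice:

```shell
# Fetch the full model repository from HuggingFace Hub
# (requires: pip install -U "huggingface_hub[cli]")
huggingface-cli download OrionStarAI/Orion-MoE8x7B --local-dir ./Orion-MoE8x7B
```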
<a name="model-benchmark"></a><br>
# 3. Model Benchmarks

### 3.1. LLM evaluation results on examination and professional knowledge
| TestSet | Mixtral 8x7B | Qwen1.5-32B | Qwen2.5-32B | Orion-14B | Qwen2-57B-A14B | Orion-MoE8x7B |
|---------------|------|------|------|------|------|------|
| MMLU          | 70.4 | 73.4 | 82.9     | 69.9 | 76.5 | **85.9** |
| MMLU Pro      | 38.5 | 45.3 | 58.0     | 34.0 | 48.6 | **58.3** |
| CEval         | 54.1 | 83.5 | 87.7     | 72.8 | 87.7 | **89.7** |
| CMMLU         | 53.2 | 82.3 | 89.0     | 70.6 | 88.5 | **89.2** |
| ARC_c         | 85.1 | 90.2 | **94.2** | 79.7 | 91.5 | 91.9 |
| HellaSwag     | 81.9 | 82.0 | 82.5     | 78.5 | 85.2 | **89.2** |
| LAMBADA       | 76.8 | 73.7 | 75.4     | 78.8 | 72.6 | **79.7** |
| BBH           | 50.9 | 57.3 | **67.7** | 50.4 | 55.1 | 55.8 |
| MuSR          | 43.2 | 42.7 | 49.8     | 43.6 | 39.0 | **49.9** |
| PIQA          | 83.4 | 82.2 | 80.1     | 79.5 | 81.9 | **87.3** |
| CommonSenseQA | 69.6 | **74.7** | 73.0 | 66.9 | 69.9 | 73.1 |
| IFEval        | 24.2 | 33.0 | **41.6** | 29.1 | 31.2 | 30.1 |
| GPQA          | 30.9 | 33.5 | 49.5     | 28.5 | 32.6 | **52.2** |
| HumanEval     | 33.5 | 36.0 | **47.0** | 20.1 | 53.0 | 44.5 |
### 3.2. Comparison of LLM performances on Japanese testsets

| Model | Average | JSQuAD | JCommonSenseQA | JNLI | MARC-ja | JAQKET v2 | PAWS-ja |
|-------|---------|--------|----------------|------|---------|-----------|---------|
| Mixtral-8x7B  | 69.8     | 89.0     | 78.7 | 32.1     | 95.4     | 78.9     | 44.5 |
| Qwen1.5-32B   | 74.7     | 89.9     | 84.5 | 51.0     | 97.1     | 82.1     | 43.8 |
| Qwen2.5-32B   | 80.7     | 89.1     | 93.8 | 72.1     | **97.9** | **89.3** | 42.2 |
| Orion-14B     | 74.2     | 74.2     | 88.2 | 72.8     | 94.1     | 66.2     | 49.9 |
| Orion-MoE8x7B | **82.9** | **91.8** | 90.4 | **90.5** | 96.4     | 81.2     | **47.4** |
### 3.3. Comparison of LLM performances on Korean testsets

| Model | Average | HAE-RAE | KoBEST BoolQ | KoBEST COPA | KoBEST HellaSwag | KoBEST SentiNeg | KoBEST WiC | PAWS-ko |
|-------|---------|---------|--------------|-------------|------------------|-----------------|------------|---------|
| Mixtral-8x7B  | 60.7     | 53.2     | 78.6     | 66.2     | 56.6     | 77.1     | 49.4     | 44.1 |
| Qwen1.5-32B   | 58.6     | 46.4     | 76.3     | 60.4     | 53.0     | 78.3     | 52.1     | 43.4 |
| Qwen2.5-32B   | 71.4     | **70.7** | 80.3     | 76.7     | **61.2** | 96.5     | **77.2** | 37.1 |
| Orion-14B     | 67.7     | 69.7     | 80.6     | 77.1     | 58.2     | 92.4     | 51.2     | 44.6 |
| Orion-MoE8x7B | **72.0** | 65.2     | **85.4** | **80.4** | 56.0     | **97.0** | 73.6     | **46.4** |
### 3.4. Comparison of LLM performances on Arabic, German, French, and Spanish testsets

| Language | Spanish | | French | | German | | Arabic | |
|----|----|----|----|----|----|----|----|----|
|**Model**|**HellaSwag**|**ARC**|**HellaSwag**|**ARC**|**HellaSwag**|**ARC**|**HellaSwag**|**ARC**|
|Mixtral-8x7B  |74.3 |54.8 |73.9 |55.9 |69.2 |52.4 |47.9 |36.3 |
|Qwen1.5-32B   |70.5 |55.1 |68.9 |56.0 |63.8 |50.8 |50.1 |40.0 |
|Qwen2.5-32B   |75.0 |65.3 |74.2 |62.7 |69.8 |61.8 |59.8 |52.9 |
|Orion-14B     |62.0 |44.6 |60.2 |42.3 |54.7 |38.9 |42.3 |33.9 |
|Orion-MoE8x7B |**87.4** |**70.1** |**85.6** |**68.8** |**80.6** |**63.5** |**69.4** |**54.3** |
### 3.5. Leakage Detection Benchmark

When the pre-training data of a large language model contains content from a specific dataset, the model’s performance on that dataset may be artificially enhanced, leading to inaccurate performance evaluations. To address this issue, researchers from the Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, and other institutions have proposed a simple and effective method for detecting data leakage. This method leverages the interchangeable nature of multiple-choice options by shuffling the options in the original dataset to generate derived data. The log-probability distribution of the derived dataset is then computed using the model to detect whether the original dataset has been leaked.

We conducted data leakage detection experiments on three benchmark datasets: MMLU, CMMLU, and C-Eval.<br>
More details can be found in the paper: https://web3.arxiv.org/pdf/2409.01790.<br>
Test code: https://github.com/nishiwen1214/Benchmark-leakage-detection.
| Threshold 0.2 | Qwen2.5 32B | Qwen1.5 32B | Orion MoE8x7B | Orion 14B | Mixtral 8x7B |
|------|------|------|------|------|------|
| MMLU  | 0.30 | 0.27 | 0.22 | 0.28 | 0.25 |
| CEval | 0.39 | 0.38 | 0.27 | 0.26 | 0.26 |
| CMMLU | 0.38 | 0.39 | 0.23 | 0.27 | 0.22 |
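To make the shuffling idea concrete, the sketch below scores a single multiple-choice question by comparing the model's log-probability of the original option order against every shuffled order. The prompt format, the mean-token-log-probability score, and the "original beats every shuffle" rule are simplifying assumptions for illustration, not the exact procedure of the paper or the linked test code.

```python
import itertools
import torch

def mean_logprob(model, tokenizer, text):
    """Mean per-token log-probability the model assigns to `text`."""
    ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        out = model(ids, labels=ids)
    return -out.loss.item()  # loss is the mean negative log-likelihood

def looks_memorized(model, tokenizer, question, options):
    """Heuristically flag a question whose original option order
    out-scores every permutation of the same options."""
    def render(opts):
        body = "\n".join(f"{letter}. {opt}" for letter, opt in zip("ABCD", opts))
        return f"{question}\n{body}"

    original = mean_logprob(model, tokenizer, render(options))
    shuffled = [
        mean_logprob(model, tokenizer, render(perm))
        for perm in itertools.permutations(options)
        if list(perm) != list(options)
    ]
    return all(original > score for score in shuffled)

# Usage: pass a model/tokenizer loaded as in Section 4 below, plus one question
# and its four options; aggregate the flags over a dataset to estimate leakage.
```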
### 3.6. Inference speed

We set up inference servers on 8x Nvidia RTX 3090 and on 4x A100, and measured client-side throughput in tokens per second.

| Models | 8x3090, 1 concurrent | 8x3090, 4 concurrent | 4xA100, 1 concurrent | 4xA100, 4 concurrent |
|---------|--------|-------|--------|-------|
| Qwen32   | 52.93 | 46.06 | 62.43 | 56.81 |
| OrionMOE | **102.77** | **54.61** | **107.76** | **61.83** |
<br>
We also compared inference speed on 4x A100 across different input lengths (tokens), again measuring client-side throughput in tokens per second.

| Input | 4k | 8k | 12k | 16k | 32k | 64k |
|---------|-------|-------|-------|-------|-------|-------|
| Qwen32   | 53.99 | 47.59 | 25.98 | 24.35 | 18.64 | 11.86 |
| OrionMOE | **90.86** | **54.40** | **31.08** | **29.04** | **22.69** | **14.51** |
<a name="model-inference"></a><br>
# 4. Model Inference

## 4.1. Python Code

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig

# trust_remote_code is required: the MoE architecture and the chat() helper
# are defined in the model repository rather than in the transformers library.
tokenizer = AutoTokenizer.from_pretrained("OrionStarAI/Orion-MoE8x7B",
                                          use_fast=False,
                                          trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("OrionStarAI/Orion-MoE8x7B",
                                             device_map="auto",
                                             torch_dtype=torch.bfloat16,
                                             trust_remote_code=True)

model.generation_config = GenerationConfig.from_pretrained("OrionStarAI/Orion-MoE8x7B")
messages = [{"role": "user", "content": "Hello, what is your name? "}]
response = model.chat(tokenizer, messages, streaming=False)
print(response)
```
## 4.2. Direct Script Inference

To select the GPUs that serve the model on a multi-GPU device, you can use something like `export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`.

```shell
# foundation model
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python demo/text_generation_base.py --model OrionStarAI/Orion-MoE8x7B --tokenizer OrionStarAI/Orion-MoE8x7B --prompt hello
```
## 4.3. vLLM Inference Service

Build the docker image
```shell
docker build -t vllm_server:0.0.0.0 -f Dockerfile .
```
Start docker service
```shell
docker run --gpus all -it -p 9999:9999 -v $(pwd)/logs:/workspace/logs:rw -v $HOME/Downloads:/workspace/models -e CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 -e MODEL_DIR=Orion-MoE8x7B -e MODEL_NAME=orion-moe vllm_server:0.0.0.0
```
Run inference
```shell
curl http://0.0.0.0:9999/v1/chat/completions -H "Content-Type: application/json"
# (the request payload is elided in this diff view; a representative example follows below)
```
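A representative request, assuming the OpenAI-compatible chat schema that vLLM exposes; the model name `orion-moe` matches the `MODEL_NAME` value set in the docker command above, and the message and sampling parameters are illustrative:

```shell
# Hypothetical request body (assumed OpenAI-compatible schema, not from the diff)
curl http://0.0.0.0:9999/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "orion-moe",
        "messages": [{"role": "user", "content": "Hello, what is your name?"}],
        "temperature": 0.7
      }'
```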
<a name="declarations-license"></a><br>
# 5. Declarations, License

## 5.1. Declarations

We strongly urge all users not to use the Orion-MoE8x7B model for any activities that may harm national or social security or violate the law.
Additionally, we request users not to use the Orion-MoE8x7B model for internet services without proper security review and filing.
We hope all users abide by this principle to ensure that technological development takes place in a regulated and legal environment.
We have done our best to ensure the compliance of the data used in the model training process. However, despite our significant efforts, unforeseen issues may still arise due to the complexity of the model and data. Therefore, if any problems arise due to the use of the Orion-MoE8x7B open-source model, including but not limited to data security issues, public opinion risks, or any risks and issues arising from the model being misled, abused, disseminated, or improperly utilized, we will not assume any responsibility.

## 5.2. License

Community use of the Orion-MoE8x7B series models:
- For code, please comply with [Apache License Version 2.0](./LICENSE)<br>
- For model, please comply with [【Orion Series】 Models Community License Agreement](./ModelsCommunityLicenseAgreement)