renillhuang committed · Commit 3432261 · verified · 1 Parent(s): 8ff7099

Update README.md

Files changed (1): README.md (+3 −3)
README.md CHANGED

```diff
@@ -57,7 +57,7 @@ tags:
 
 - Model Architecture
 
-|Configuration |OrionMOE 8x7B|
+|Configuration |Orion-MoE 8x7B|
 |-------------------|-------------|
 |Hidden Size | 4096 |
 |# Layers | 32 |
@@ -180,14 +180,14 @@ Test code: https://github.com/nishiwen1214/Benchmark-leakage-detection.
 Setup inference server on 8x Nvidia RTX3090, and get results from client in unit of 'tokens per second'.
 |Models | 8x3090 1 concurrent | 8x3090 4 concurrent | 4xA100 1 concurrent | 4xA100 4 concurrent|
 |---------|--------|-------|--------|-------|
-|Qwen32 | 52.93 | 46.06 | 62.43 | 56.81 <tr><td>OrionMOE</td> <td class="orion">**102.77**</td> <td class="orion">**54.61**</td> <td class="orion">**107.76**</td> <td class="orion">**61.83**</td> </tr>
+|Qwen32 | 52.93 | 46.06 | 62.43 | 56.81 <tr><td>Orion-MoE</td> <td class="orion">**102.77**</td> <td class="orion">**54.61**</td> <td class="orion">**107.76**</td> <td class="orion">**61.83**</td> </tr>
 
 <br>
 We also tested on a 4x A100, comparing inference speeds based on different input lengths (tokens), get results from client in unit of 'tokens per second'.
 
 | Input | 4k | 8k | 12k | 16k | 32k | 64k |
 |---------|-------|-------|-------|-------|-------|-------|
-|Qwen32 | 53.99 | 47.59 | 25.98 | 24.35 | 18.64 | 11.86 <tr><td>OrionMOE</td> <td class="orion">**90.86**</td> <td class="orion">**54.40**</td> <td class="orion">**31.08**</td> <td class="orion">**29.04**</td> <td class="orion">**22.69**</td> <td class="orion">**14.51**</td> </tr>
+|Qwen32 | 53.99 | 47.59 | 25.98 | 24.35 | 18.64 | 11.86 <tr><td>Orion-MoE</td> <td class="orion">**90.86**</td> <td class="orion">**54.40**</td> <td class="orion">**31.08**</td> <td class="orion">**29.04**</td> <td class="orion">**22.69**</td> <td class="orion">**14.51**</td> </tr>
 
 <a name="model-inference"></a><br>
 # 4. Model Inference
```
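The benchmark tables in the diff above report client-side throughput in tokens per second. As a minimal sketch of how such a figure is derived (the function name and timing approach here are illustrative, not taken from the repository's test code):

```python
import time

def tokens_per_second(num_tokens: int, elapsed_seconds: float) -> float:
    """Throughput as reported in the tables: generated tokens / wall-clock seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return num_tokens / elapsed_seconds

# Hypothetical example: timing a generation call on the client.
start = time.perf_counter()
generated_tokens = 512          # stand-in for len(tokenizer(output).input_ids)
time.sleep(0.01)                # stand-in for the actual server round trip
elapsed = time.perf_counter() - start
print(f"{tokens_per_second(generated_tokens, elapsed):.2f} tokens/s")
```

Under this metric, a multi-concurrency figure (e.g. the "4 concurrent" columns) would aggregate tokens across all simultaneous client requests over the same wall-clock window.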