renillhuang committed · Commit 3432261 · verified · 1 Parent(s): 8ff7099

Update README.md

Files changed (1): README.md (+3 −3)
README.md CHANGED

```diff
@@ -57,7 +57,7 @@ tags:
 
 - Model Architecture
 
-|Configuration |OrionMOE 8x7B|
+|Configuration |Orion-MoE 8x7B|
 |-------------------|-------------|
 |Hidden Size | 4096 |
 |# Layers | 32 |
@@ -180,14 +180,14 @@ Test code: https://github.com/nishiwen1214/Benchmark-leakage-detection.
 Setup inference server on 8x Nvidia RTX3090, and get results from client in unit of 'tokens per second'.
 |Models | 8x3090 1 concurrent | 8x3090 4 concurrent | 4xA100 1 concurrent | 4xA100 4 concurrent|
 |---------|--------|-------|--------|-------|
-|Qwen32 | 52.93 | 46.06 | 62.43 | 56.81 <tr><td>OrionMOE</td> <td class="orion">**102.77**</td> <td class="orion">**54.61**</td> <td class="orion">**107.76**</td> <td class="orion">**61.83**</td> </tr>
+|Qwen32 | 52.93 | 46.06 | 62.43 | 56.81 <tr><td>Orion-MoE</td> <td class="orion">**102.77**</td> <td class="orion">**54.61**</td> <td class="orion">**107.76**</td> <td class="orion">**61.83**</td> </tr>
 
 <br>
 We also tested on a 4x A100, comparing inference speeds based on different input lengths (tokens), get results from client in unit of 'tokens per second'.
 
 | Input | 4k | 8k | 12k | 16k | 32k | 64k |
 |---------|-------|-------|-------|-------|-------|-------|
-|Qwen32 | 53.99 | 47.59 | 25.98 | 24.35 | 18.64 | 11.86 <tr><td>OrionMOE</td> <td class="orion">**90.86**</td> <td class="orion">**54.40**</td> <td class="orion">**31.08**</td> <td class="orion">**29.04**</td> <td class="orion">**22.69**</td> <td class="orion">**14.51**</td> </tr>
+|Qwen32 | 53.99 | 47.59 | 25.98 | 24.35 | 18.64 | 11.86 <tr><td>Orion-MoE</td> <td class="orion">**90.86**</td> <td class="orion">**54.40**</td> <td class="orion">**31.08**</td> <td class="orion">**29.04**</td> <td class="orion">**22.69**</td> <td class="orion">**14.51**</td> </tr>
 
 <a name="model-inference"></a><br>
 # 4. Model Inference
```
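The benchmark tables in the diff above report client-side throughput in tokens per second. As a minimal sketch of how such a figure is derived (the function name and timing approach here are illustrative, not taken from the repository's test code):

```python
import time

def tokens_per_second(num_tokens: int, elapsed_seconds: float) -> float:
    """Throughput as reported in the tables: generated tokens / wall-clock seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return num_tokens / elapsed_seconds

# Hypothetical example: timing a generation call on the client.
start = time.perf_counter()
generated_tokens = 512          # stand-in for len(tokenizer(output).input_ids)
time.sleep(0.01)                # stand-in for the actual server round trip
elapsed = time.perf_counter() - start
print(f"{tokens_per_second(generated_tokens, elapsed):.2f} tokens/s")
```

Under this metric, a multi-concurrency figure (e.g. the "4 concurrent" columns) would aggregate tokens across all simultaneous client requests over the same wall-clock window.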