renillhuang
committed on
Update README.md
README.md
CHANGED
@@ -57,7 +57,7 @@ tags:
 
 - Model Architecture
 
-|Configuration |
+|Configuration |Orion-MoE 8x7B|
 |-------------------|-------------|
 |Hidden Size | 4096 |
 |# Layers | 32 |
@@ -180,14 +180,14 @@ Test code: https://github.com/nishiwen1214/Benchmark-leakage-detection.
 Setup inference server on 8x Nvidia RTX3090, and get results from client in unit of 'tokens per second'.
 |Models | 8x3090 1 concurrent | 8x3090 4 concurrent | 4xA100 1 concurrent | 4xA100 4 concurrent|
 |---------|--------|-------|--------|-------|
-|Qwen32 | 52.93 | 46.06 | 62.43 | 56.81 <tr><td>
+|Qwen32 | 52.93 | 46.06 | 62.43 | 56.81 <tr><td>Orion-MoE</td> <td class="orion">**102.77**</td> <td class="orion">**54.61**</td> <td class="orion">**107.76**</td> <td class="orion">**61.83**</td> </tr>
 
 <br>
 We also tested on a 4x A100, comparing inference speeds based on different input lengths (tokens), get results from client in unit of 'tokens per second'.
 
 | Input | 4k | 8k | 12k | 16k | 32k | 64k |
 |---------|-------|-------|-------|-------|-------|-------|
-|Qwen32 | 53.99 | 47.59 | 25.98 | 24.35 | 18.64 | 11.86 <tr><td>
+|Qwen32 | 53.99 | 47.59 | 25.98 | 24.35 | 18.64 | 11.86 <tr><td>Orion-MoE</td> <td class="orion">**90.86**</td> <td class="orion">**54.40**</td> <td class="orion">**31.08**</td> <td class="orion">**29.04**</td> <td class="orion">**22.69**</td> <td class="orion">**14.51**</td> </tr>
 
 <a name="model-inference"></a><br>
 # 4. Model Inference
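Reviewer note: the two benchmark tables touched by this commit are easier to compare as ratios. The sketch below is not part of the README; it only copies the tokens-per-second figures from the added rows and divides the Orion-MoE value by the Qwen32 value for each column. The helper name `speedups` and the variable names are hypothetical, chosen for illustration.

```python
# Tokens-per-second figures copied from the added rows of the diff.
# All names below are illustrative; none come from the README itself.
SETUPS = [
    "8x3090 1 concurrent",
    "8x3090 4 concurrent",
    "4xA100 1 concurrent",
    "4xA100 4 concurrent",
]
QWEN32_BY_SETUP = [52.93, 46.06, 62.43, 56.81]
ORION_BY_SETUP = [102.77, 54.61, 107.76, 61.83]

INPUT_LENGTHS = ["4k", "8k", "12k", "16k", "32k", "64k"]
QWEN32_BY_LENGTH = [53.99, 47.59, 25.98, 24.35, 18.64, 11.86]
ORION_BY_LENGTH = [90.86, 54.40, 31.08, 29.04, 22.69, 14.51]


def speedups(labels, baseline, candidate):
    """Return {label: candidate / baseline}, rounded to two decimals."""
    return {
        label: round(cand / base, 2)
        for label, base, cand in zip(labels, baseline, candidate)
    }


if __name__ == "__main__":
    # Ratio above 1.0 means Orion-MoE produced more tokens per second.
    for label, ratio in speedups(SETUPS, QWEN32_BY_SETUP, ORION_BY_SETUP).items():
        print(f"{label}: {ratio:.2f}x")
    for label, ratio in speedups(INPUT_LENGTHS, QWEN32_BY_LENGTH, ORION_BY_LENGTH).items():
        print(f"input {label}: {ratio:.2f}x")
```

Every ratio is above 1.0, so the reported Orion-MoE throughput exceeds Qwen32 in each column. The gap is largest at single concurrency and at the shortest (4k) input.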