Unexpected Intel Xeon performance on the leaderboard
#35, opened by Aaaaadore
Hi, we observed that in the Intel Xeon results on the leaderboard, the BF16 prefill latency is generally higher than the FP32 prefill latency, which is unexpected (for example, for qwen1.5-7b).
We ran the benchmark on a c7i-8xlarge AWS instance with the optimum-benchmark tool, and our results show that BF16 has lower prefill latency and higher decode throughput than FP32:
| | Prefill (s) | Decode (tokens/s) |
|---|---|---|
| fp32-eager | 1.612 | 5.330 |
| bf16-eager | 0.377 | 7.300 |
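To make the gap concrete, the speedups implied by the table above can be worked out with a few lines of Python. This is only an illustrative sketch: the numbers are copied from the table, and the dictionary keys and the `speedups` helper are hypothetical names, not part of optimum-benchmark.

```python
# Values copied from the table above (c7i-8xlarge run).
results = {
    "fp32-eager": {"prefill_s": 1.612, "decode_tok_s": 5.330},
    "bf16-eager": {"prefill_s": 0.377, "decode_tok_s": 7.300},
}


def speedups(results):
    """Return (prefill speedup, decode speedup) of BF16 relative to FP32."""
    fp32, bf16 = results["fp32-eager"], results["bf16-eager"]
    # Prefill is a latency (lower is better), so speedup = fp32 / bf16.
    prefill = fp32["prefill_s"] / bf16["prefill_s"]
    # Decode is a throughput (higher is better), so speedup = bf16 / fp32.
    decode = bf16["decode_tok_s"] / fp32["decode_tok_s"]
    return prefill, decode


prefill_x, decode_x = speedups(results)
print(f"BF16 prefill speedup over FP32: {prefill_x:.2f}x")
print(f"BF16 decode speedup over FP32: {decode_x:.2f}x")
```

On these numbers BF16 is roughly 4.3x faster for prefill and roughly 1.4x faster for decode, which is the opposite direction from what the leaderboard shows.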
Could there be a misalignment in how the performance numbers are collected? We ran the benchmark with the optimum-benchmark CLI, using the config below:
defaults:
  - benchmark
  - scenario: inference
  - launcher: process
  - backend: pytorch
  - _base_
  - _self_
name: cpu_pytorch_qwen
launcher:
  numactl: true
  numactl_kwargs:
    cpunodebind: 0
    membind: 0
backend:
  device: cpu
  # export: true
  no_weights: false # on multi-node machines, initializing weights in the benchmark could harm performance
  torch_dtype: bfloat16 # use bfloat16 on compatible Intel CPUs
  model: Qwen/Qwen1.5-7B
scenario:
  memory: true
  latency: true
  input_shapes:
    batch_size: 1
    sequence_length: 256
  generate_kwargs:
    max_new_tokens: 64
    min_new_tokens: 64
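When comparing BF16 numbers across Xeon machines, it may also help to confirm that each machine exposes native BF16 instructions. The sketch below is an assumption-laden check, not part of optimum-benchmark: it assumes a Linux host, reads `/proc/cpuinfo`, and looks for the `avx512_bf16` and `amx_bf16` flags. The helper name `bf16_flags` is hypothetical, and whether missing flags explain the leaderboard numbers is unverified.

```python
from pathlib import Path

# CPU feature flags that indicate native BF16 support on x86 Linux.
BF16_FLAGS = ("avx512_bf16", "amx_bf16")


def bf16_flags(cpuinfo_text):
    """Return the sorted BF16-related flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        # The first "flags : ..." line lists the features of one logical CPU.
        if line.startswith("flags") and ":" in line:
            flags = set(line.split(":", 1)[1].split())
            return sorted(flags.intersection(BF16_FLAGS))
    return []


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # Linux only; absent on other systems
    if cpuinfo.exists():
        found = bf16_flags(cpuinfo.read_text())
        print("BF16 flags found:", found or "none")
```

An empty result would suggest that BF16 matmuls fall back to a slower path on that machine, which would be one candidate explanation worth ruling out.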