
Introduction

This evaluation script reproduces the math benchmark scores of the AceMath-1.5B/7B/72B-Instruct models from their saved outputs. The benchmarks can be downloaded from Qwen2.5-Math.

Calculate Scores

python calculate_scores.py
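
For orientation, the sketch below shows the general kind of computation a scoring script performs: compare each model prediction with the reference answer and report accuracy as a percentage. It is not the implementation in calculate_scores.py. The JSON Lines layout, the field names `prediction` and `answer`, and the helpers `normalize`, `accuracy`, and `load_jsonl` are all hypothetical, and plain string matching stands in for the mathematical equivalence checking that real math benchmarks usually require.

```python
import json


def normalize(answer: str) -> str:
    """Trim whitespace and surrounding '$' signs, then lowercase (illustrative only)."""
    return answer.strip().strip("$").strip().lower()


def accuracy(records) -> float:
    """Return the percentage of records whose prediction matches the reference.

    Each record is assumed to be a dict with the hypothetical keys
    'prediction' and 'answer'.
    """
    if not records:
        return 0.0
    correct = sum(
        normalize(r["prediction"]) == normalize(r["answer"]) for r in records
    )
    return 100.0 * correct / len(records)


def load_jsonl(path: str):
    """Read one JSON object per line (an assumed output format, not the real one)."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Small in-memory example instead of a real output file.
records = [
    {"prediction": " 42 ", "answer": "42"},     # match after trimming
    {"prediction": "$x+1$", "answer": "x+1"},   # match after removing '$'
    {"prediction": "7", "answer": "8"},         # wrong
    {"prediction": "3/4", "answer": "0.75"},    # equal in value, but not as strings
]
score = accuracy(records)
print(f"accuracy: {score:.1f}")  # prints "accuracy: 50.0"
```

The last record illustrates why string matching is only a rough stand-in: `3/4` and `0.75` are equal in value but are counted as a mismatch here.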