Introduction
This evaluation script reproduces the math benchmark scores of the AceMath-1.5B/7B/72B-Instruct models from their saved outputs. The benchmarks can be downloaded from Qwen2.5-Math.
Calculate Scores
python calculate_scores.py
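For orientation, scoring model outputs on a math benchmark can be sketched roughly as follows. This is a minimal illustration and not the contents of calculate_scores.py: the helper names `extract_boxed` and `accuracy` are hypothetical, and it assumes each output carries its final answer in a `\boxed{...}` expression that is compared to the reference by exact string match. The actual script may normalize answers or check mathematical equivalence instead.

```python
def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1  # we are inside the opening brace of \boxed{
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace found: return what was collected.
                return "".join(chars).strip()
        chars.append(ch)
        i += 1
    return None  # unbalanced braces, no complete answer


def accuracy(outputs, references):
    """Fraction of outputs whose boxed answer exactly matches the reference.

    Assumption: exact string match after stripping whitespace.
    """
    if len(outputs) != len(references):
        raise ValueError("outputs and references must have the same length")
    if not outputs:
        return 0.0
    correct = sum(
        extract_boxed(out) == ref.strip()
        for out, ref in zip(outputs, references)
    )
    return correct / len(outputs)
```

An output with no boxed answer counts as incorrect under this sketch, since `extract_boxed` returns `None` for it.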