U-MATH and μ-MATH - University-level math evaluation
Collection
Paper: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs
4 items
Human In The Loop - data labeling, model training and hosting, human verification, and more