|---|---|---|
| Math | 5.86 | 5.14 |
| Grammar | 4.71 | 1.29 |
| Understanding | 4.00 | 4.43 |
| Reasoning | 5.14 | 6.71 |
| Coding | 7.43 | 7.57 |
| Writing | 8.43 | 8.00 |

| Category | Score |
|---|---|
| Single turn | 5.93 |
| Multi turn | 5.52 |
| Overall | 5.73 |
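
The summary scores appear to be plain averages of the per-category scores. The sketch below checks this, under two assumptions that are not stated in this section: the first numeric column of the category table holds single-turn scores and the second holds multi-turn scores, and each summary value is an unweighted mean. The names `category_scores` and `mean` are illustrative only.

```python
# Per-category scores copied from the category table.
# Assumption: tuple order is (single turn, multi turn).
category_scores = {
    "Math": (5.86, 5.14),
    "Grammar": (4.71, 1.29),
    "Understanding": (4.00, 4.43),
    "Reasoning": (5.14, 6.71),
    "Coding": (7.43, 7.57),
    "Writing": (8.43, 8.00),
}


def mean(values):
    """Unweighted arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Average each column across the six categories.
single_turn = mean(single for single, _ in category_scores.values())
multi_turn = mean(multi for _, multi in category_scores.values())

# Assumption: the overall score is the mean of the two turn-level averages.
overall = mean([single_turn, multi_turn])

print(f"Single turn: {single_turn:.2f}")
print(f"Multi turn:  {multi_turn:.2f}")
print(f"Overall:     {overall:.2f}")
```

Rounded to two decimals, the three values match the summary table, which supports the reading of the columns assumed here.
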

| Tasks  |Version|     Filter     |n-shot|        Metric         |   |Value |   |Stderr|
|--------|------:|----------------|-----:|-----------------------|---|-----:|---|------|
|gsm8k   |      3|flexible-extract|     5|exact_match            |↑  |0.7013|±  |0.0126|
|        |       |strict-match    |     5|exact_match            |↑  |0.2418|±  |0.0118|
|gsm8k-ko|      1|flexible-extract|     5|exact_match            |↑  |0.4466|±  |0.0137|
|        |       |strict-match    |     5|exact_match            |↑  |0.4420|±  |0.0137|
|ifeval  |      4|none            |     0|inst_level_loose_acc   |↑  |0.8549|±  |   N/A|
|        |       |none            |     0|inst_level_strict_acc  |↑  |0.8225|±  |   N/A|
|        |       |none            |     0|prompt_level_loose_acc |↑  |0.7874|±  |0.0176|
|        |       |none            |     0|prompt_level_strict_acc|↑  |0.7468|±  |0.0187|