CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings • Paper • 2501.01257 • Published 1 day ago • 30
ProcessBench: Identifying Process Errors in Mathematical Reasoning • Paper • 2412.06559 • Published 26 days ago • 72
InterBERT: Vision-and-Language Interaction for Multi-modal Pretraining • Paper • 2003.13198 • Published Mar 30, 2020
ExpertPrompting: Instructing Large Language Models to be Distinguished Experts • Paper • 2305.14688 • Published May 24, 2023
Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese • Paper • 2211.01335 • Published Nov 2, 2022 • 1
Transferring General Multimodal Pretrained Models to Text Recognition • Paper • 2212.09297 • Published Dec 19, 2022
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement • Paper • 2409.12122 • Published Sep 18, 2024 • 3