MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models Paper • 2501.00316 • Published 4 days ago • 16
MapQaTor: A System for Efficient Annotation of Map Query Datasets Paper • 2412.21015 • Published 4 days ago • 6
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM Paper • 2501.00599 • Published 3 days ago • 27
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings Paper • 2501.01257 • Published 1 day ago • 30
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control Paper • 2501.01427 • Published 1 day ago • 30
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published 2 days ago • 45
MLLM-as-a-Judge for Image Safety without Human Labeling Paper • 2501.00192 • Published 4 days ago • 12
ProgCo: Program Helps Self-Correction of Large Language Models Paper • 2501.01264 • Published 1 day ago • 17
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis Paper • 2412.19723 • Published 7 days ago • 63
On the Compositional Generalization of Multimodal LLMs for Medical Imaging Paper • 2412.20070 • Published 7 days ago • 39
Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization Paper • 2412.18525 • Published 10 days ago • 59
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs Paper • 2412.18925 • Published 9 days ago • 82
Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models Paper • 2412.18605 • Published 10 days ago • 17
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search Paper • 2412.18319 • Published 11 days ago • 33
InternVL2.5-MPO Collection • Enhancing the Reasoning Ability of MLLMs via Mixed Preference Optimization • 16 items • Updated 4 days ago • 23
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners Paper • 2412.17256 • Published 12 days ago • 42