VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation Paper • 2412.00927 • Published Dec 1, 2024 • 26
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision Paper • 2411.07199 • Published Nov 11, 2024 • 46
II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models Paper • 2406.05862 • Published Jun 9, 2024 • 4
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks Paper • 2410.10563 • Published Oct 14, 2024 • 38
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks Paper • 2410.10563 • Published Oct 14, 2024 • 38
MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark Paper • 2409.02813 • Published Sep 4, 2024 • 28
MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark Paper • 2409.02813 • Published Sep 4, 2024 • 28
Knowledge Mechanisms in Large Language Models: A Survey and Perspective Paper • 2407.15017 • Published Jul 22, 2024 • 34
MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation Paper • 2406.15252 • Published Jun 21, 2024 • 14
GenAI Arena: An Open Evaluation Platform for Generative Models Paper • 2406.04485 • Published Jun 6, 2024 • 20