- Vript: A Video Is Worth Thousands of Words
  Paper • 2406.06040 • Published • 26
- ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
  Paper • 2406.04325 • Published • 73
- MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
  Paper • 2406.01574 • Published • 45
- Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
  Paper • 2405.21075 • Published • 22
Collections
Collections including paper arxiv:2405.17247
- Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
  Paper • 2406.06525 • Published • 67
- Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning
  Paper • 2406.06469 • Published • 25
- Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
  Paper • 2406.04271 • Published • 29
- Block Transformer: Global-to-Local Language Modeling for Fast Inference
  Paper • 2406.02657 • Published • 38