Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 136
VisionZip: Longer is Better but Not Necessary in Vision Language Models Paper • 2412.04467 • Published Dec 5, 2024 • 105
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction Paper • 2412.04454 • Published Dec 5, 2024 • 57
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models Paper • 2409.17481 • Published Sep 26, 2024 • 47
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness Paper • 2409.18125 • Published Sep 26, 2024 • 34
WonderWorld: Interactive 3D Scene Generation from a Single Image Paper • 2406.09394 • Published Jun 13, 2024 • 3
Imagine yourself: Tuning-Free Personalized Image Generation Paper • 2409.13346 • Published Sep 20, 2024 • 68
MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications Paper • 2409.07314 • Published Sep 11, 2024 • 51
Controllable Text Generation for Large Language Models: A Survey Paper • 2408.12599 • Published Aug 22, 2024 • 64
I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm Paper • 2408.08072 • Published Aug 15, 2024 • 34
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery Paper • 2408.06292 • Published Aug 12, 2024 • 118
VITA: Towards Open-Source Interactive Omni Multimodal LLM Paper • 2408.05211 • Published Aug 9, 2024 • 47
Welcome FalconMamba: The first strong attention-free 7B model Article • Aug 12, 2024 • 108
Medical SAM 2: Segment medical images as video via Segment Anything Model 2 Paper • 2408.00874 • Published Aug 1, 2024 • 45
A Large Encoder-Decoder Family of Foundation Models For Chemical Language Paper • 2407.20267 • Published Jul 24, 2024 • 32
A Simulation Benchmark for Autonomous Racing with Large-Scale Human Data Paper • 2407.16680 • Published Jul 23, 2024 • 12