DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation Paper • 2412.18597 • Published 15 days ago • 19
3DGS-Enhancer: Enhancing Unbounded 3D Gaussian Splatting with View-consistent 2D Diffusion Priors Paper • 2410.16266 • Published Oct 21, 2024 • 4
EvoPress: Towards Optimal Dynamic Model Compression via Evolutionary Search Paper • 2410.14649 • Published Oct 18, 2024 • 8
MiniPLM: Knowledge Distillation for Pre-Training Language Models Paper • 2410.17215 • Published Oct 22, 2024 • 14
xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs Paper • 2410.16267 • Published Oct 21, 2024 • 17
Mitigating Object Hallucination via Concentric Causal Attention Paper • 2410.15926 • Published Oct 21, 2024 • 16
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction Paper • 2410.17247 • Published Oct 22, 2024 • 45
SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes Paper • 2410.17249 • Published Oct 22, 2024 • 41
LLM-based Optimization of Compound AI Systems: A Survey Paper • 2410.16392 • Published Oct 21, 2024 • 14
JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation Paper • 2410.17250 • Published Oct 22, 2024 • 14
Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes Paper • 2410.16930 • Published Oct 22, 2024 • 6
Improve Vision Language Model Chain-of-thought Reasoning Paper • 2410.16198 • Published Oct 21, 2024 • 22
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published 27 days ago • 92