Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps Paper • 2501.09732 • Published 3 days ago
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks Paper • 2501.08326 • Published 5 days ago
Diffusion Adversarial Post-Training for One-Step Video Generation Paper • 2501.08316 • Published 5 days ago
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published 5 days ago
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published 6 days ago
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs Paper • 2501.06186 • Published 9 days ago
The GAN is dead; long live the GAN! A Modern GAN Baseline Paper • 2501.05441 • Published 10 days ago
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models Paper • 2411.14432 • Published Nov 21, 2024
Search-o1: Agentic Search-Enhanced Large Reasoning Models Paper • 2501.05366 • Published 10 days ago
Agent Laboratory: Using LLM Agents as Research Assistants Paper • 2501.04227 • Published 12 days ago
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought Paper • 2501.04682 • Published 11 days ago
Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling Paper • 2411.18664 • Published Nov 27, 2024
DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes Paper • 2412.11100 • Published Dec 15, 2024
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces Paper • 2412.14171 • Published Dec 18, 2024
AutoPresent: Designing Structured Visuals from Scratch Paper • 2501.00912 • Published 18 days ago
Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation Paper • 2501.03059 • Published 13 days ago
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM Paper • 2501.00599 • Published 19 days ago
VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation Paper • 2412.21059 • Published 20 days ago