-
Animate-X: Universal Character Image Animation with Enhanced Motion Representation
Paper • 2410.10306 • Published • 54 -
ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning
Paper • 2411.05003 • Published • 70 -
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation
Paper • 2411.04709 • Published • 25 -
IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
Paper • 2410.07171 • Published • 42
Collections
Discover the best community collections!
Collections including paper arxiv:2410.13830
-
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
Paper • 2410.13848 • Published • 32 -
DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control
Paper • 2410.13830 • Published • 24 -
VisionZip: Longer is Better but Not Necessary in Vision Language Models
Paper • 2412.04467 • Published • 105
-
Tora: Trajectory-oriented Diffusion Transformer for Video Generation
Paper • 2407.21705 • Published • 27 -
TrackGo: A Flexible and Efficient Method for Controllable Video Generation
Paper • 2408.11475 • Published • 17 -
TVG: A Training-free Transition Video Generation Method with Diffusion Models
Paper • 2408.13413 • Published • 14 -
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation
Paper • 2409.18964 • Published • 26
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 25 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 12 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 40 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 20
-
WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
Paper • 2401.09985 • Published • 15 -
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
Paper • 2401.09962 • Published • 8 -
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution
Paper • 2401.10404 • Published • 10 -
ActAnywhere: Subject-Aware Video Background Generation
Paper • 2401.10822 • Published • 13