- Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations
  Paper • 2408.10920 • Published • 1
- Towards Exact Computation of Inductive Bias
  Paper • 2406.15941 • Published
- DrawingSpinUp: 3D Animation from Single Character Drawings
  Paper • 2409.08615 • Published • 18
- Learning to Move Like Professional Counter-Strike Players
  Paper • 2408.13934 • Published • 23
Collections
Collections including paper arxiv:2408.12588
- Controllable Text Generation for Large Language Models: A Survey
  Paper • 2408.12599 • Published • 64
- xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
  Paper • 2408.12590 • Published • 35
- Real-Time Video Generation with Pyramid Attention Broadcast
  Paper • 2408.12588 • Published • 16
- Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
  Paper • 2408.11039 • Published • 58

- MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model
  Paper • 2405.20222 • Published • 11
- ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation
  Paper • 2406.00908 • Published • 11
- CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation
  Paper • 2406.02509 • Published • 9
- I4VGen: Image as Stepping Stone for Text-to-Video Generation
  Paper • 2406.02230 • Published • 16

- SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions
  Paper • 2403.16627 • Published • 20
- Phased Consistency Model
  Paper • 2405.18407 • Published • 46
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
  Paper • 2405.12981 • Published • 28
- Imp: Highly Capable Large Multimodal Models for Mobile Devices
  Paper • 2405.12107 • Published • 26

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 25
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 12
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 40
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 20

- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
  Paper • 2401.09985 • Published • 15
- CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
  Paper • 2401.09962 • Published • 8
- Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution
  Paper • 2401.10404 • Published • 10
- ActAnywhere: Subject-Aware Video Background Generation
  Paper • 2401.10822 • Published • 13