A Diffusion Model for Video Inpainting
Text-to-3D and Image-to-3D Generation
Dense Grounded Understanding of Images and Videos
Text to Audio (Sound SFX) Generator
Audio Conditioned LipSync with Latent Diffusion Models
Generate multi-view images from a single image
Scalable and Versatile 3D Generation from images
StableNormal Turbo Beta
Prompt with Images in flux[dev]
3D/4D Scenes from a Single Image w/ Controllable Video Diffusion
Blind Image Restoration with Instant Generative Reference
Expressive Portrait Animation w/ Hierarchical Motion Attention