Video-Guided Foley Sound Generation with Multimodal Controls Paper • 2411.17698 • Published Nov 26, 2024 • 7
FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait Paper • 2412.01064 • Published Dec 2, 2024 • 26
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows Paper • 2412.01169 • Published Dec 2, 2024 • 12
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition Paper • 2412.09501 • Published Dec 12, 2024 • 45
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks Paper • 2501.08326 • Published 6 days ago • 31