VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping • Paper • 2412.11279 • Published 22 days ago • 12
Causal Diffusion Transformers for Generative Modeling • Paper • 2412.12095 • Published 21 days ago • 23
EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM • Paper • 2412.09618 • Published 25 days ago • 21
Co-DETR • Collection • State-of-the-art detection and segmentation models. • 5 items • Updated Nov 3, 2024 • 1
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines • Paper • 2409.12959 • Published Sep 19, 2024 • 37
MoVA: Adapting Mixture of Vision Experts to Multimodal Context • Paper • 2404.13046 • Published Apr 19, 2024 • 1
Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models • Paper • 2406.11831 • Published Jun 17, 2024 • 21
RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths • Paper • 2305.18295 • Published May 29, 2023 • 7
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? • Paper • 2403.14624 • Published Mar 21, 2024 • 51