GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models Paper • 2410.06154 • Published Oct 8, 2024 • 16
Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation Paper • 2405.14598 • Published May 23, 2024 • 12
Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing Paper • 2309.15664 • Published Sep 27, 2023 • 1