M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding Paper • 2411.04952 • Published Nov 7, 2024
VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement Paper • 2411.15115 • Published Nov 22, 2024
Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation Paper • 2304.06671 • Published Apr 13, 2023
DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback Paper • 2410.06215 • Published Oct 8, 2024
Self-Chained Image-Language Model for Video Localization and Question Answering Paper • 2305.06988 • Published May 11, 2023
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Models Paper • 2202.04053 • Published Feb 8, 2022
Visual Programming for Text-to-Image Generation and Evaluation Paper • 2305.15328 • Published May 24, 2023
VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks Paper • 2112.06825 • Published Dec 13, 2021
VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer Paper • 2107.02681 • Published Jul 6, 2021
EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents Paper • 2403.12014 • Published Mar 18, 2024
Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation Paper • 2310.18235 • Published Oct 27, 2023
SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data Paper • 2403.06952 • Published Mar 11, 2024
DOCCI: Descriptions of Connected and Contrasting Images Paper • 2404.19753 • Published Apr 30, 2024
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model Paper • 2404.09967 • Published Apr 15, 2024
DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model Paper • 2311.09217 • Published Nov 15, 2023
DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning Paper • 2310.12128 • Published Oct 18, 2023
VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning Paper • 2309.15091 • Published Sep 26, 2023