mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding • Paper 2409.03420 • Published Sep 5, 2024
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens • Paper 2406.11271 • Published Jun 17, 2024
PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models • Paper 2309.05793 • Published Sep 11, 2023