FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression Paper • 2412.04317 • Published Dec 5, 2024
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions Paper • 2409.18042 • Published Sep 26, 2024