Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning Paper • 2412.11974 • Published 21 days ago • 9
Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization Paper • 2412.18525 • Published 13 days ago • 62
Customized Generation Reimagined: Fidelity and Editability Harmonized Paper • 2412.04831 • Published about 1 month ago • 1
Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation Paper • 2412.18176 • Published 13 days ago • 15
A Silver Bullet or a Compromise for Full Attention? A Comprehensive Study of Gist Token-based Context Compression Paper • 2412.17483 • Published 14 days ago • 29