Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 19 days ago • 132
Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks Paper • 2208.10442 • Published Aug 22, 2022
RedStone: Curating General, Code, Math, and QA Data for Large Language Models Paper • 2412.03398 • Published 28 days ago • 1
Multimodal Latent Language Modeling with Next-Token Diffusion Paper • 2412.08635 • Published 21 days ago • 41