Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25, 2024 • 106
Cosmos Tokenizer Collection A suite of image and video tokenizers • 12 items • Updated about 11 hours ago • 29
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model Paper • 2402.03766 • Published Feb 6, 2024 • 13
DeepSeek-VL: Towards Real-World Vision-Language Understanding Paper • 2403.05525 • Published Mar 8, 2024 • 40
Tora: Trajectory-oriented Diffusion Transformer for Video Generation Paper • 2407.21705 • Published Jul 31, 2024 • 27
Unbounded: A Generative Infinite Game of Character Life Simulation Paper • 2410.18975 • Published Oct 24, 2024 • 35
LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance Paper • 2307.00522 • Published Jul 2, 2023 • 32