PixMo Collection A set of vision-language datasets built by Ai2 and used to train the Molmo family of models. Read more at https://molmo.allenai.org/blog • 9 items • Updated Nov 27, 2024 • 53
LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published Nov 15, 2024 • 111
Theia Collection Distilling Diverse Vision Foundation Models for Robot Learning • 6 items • Updated Sep 30, 2024 • 9
Article Metric and Relative Monocular Depth Estimation: An Overview. Fine-Tuning Depth Anything V2 👐 📚 By Isayoften • Jul 10, 2024 • 44
3D-VLA: A 3D Vision-Language-Action Generative World Model Paper • 2403.09631 • Published Mar 14, 2024 • 7
Minitron Collection A family of compressed models obtained via pruning and knowledge distillation • 12 items • Updated 22 days ago • 60
OpenResearcher: Unleashing AI for Accelerated Scientific Research Paper • 2408.06941 • Published Aug 13, 2024 • 30
Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks Paper • 2408.03615 • Published Aug 7, 2024 • 30
Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model Paper • 2312.13252 • Published Dec 20, 2023 • 27