Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding Paper • 2412.00493 • Published Nov 30, 2024 • 16
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models Paper • 2312.02949 • Published Dec 5, 2023 • 11