VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs Paper • 2406.07476 • Published Jun 11, 2024 • 32
DAMO-NLP-SG/VideoLLaMA2.1-7B-16F-Base Visual Question Answering • Updated Oct 21, 2024 • 164 • 1