Unifying Specialized Visual Encoders for Video Language Models Paper • 2501.01426 • Published 5 days ago • 20
xT: Nested Tokenization for Larger Context in Large Images Paper • 2403.01915 • Published Mar 4, 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark Paper • 2410.03051 • Published Oct 4, 2024 • 5