Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published Sep 18, 2024 • 76
Moshi v0.1 Release Collection MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items • Updated Sep 18, 2024 • 225
Article Train Custom Models on Hugging Face Spaces with AutoTrain SpaceRunner By abhishek • May 9, 2024 • 13
UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling Paper • 2408.04810 • Published Aug 9, 2024 • 23
VITA: Towards Open-Source Interactive Omni Multimodal LLM Paper • 2408.05211 • Published Aug 9, 2024 • 47