FAST: Efficient Action Tokenization for Vision-Language-Action Models Paper • 2501.09747 • Published 7 days ago • 22
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token Paper • 2501.03895 • Published 16 days ago • 48
YuLan-Mini: An Open Data-efficient Language Model Paper • 2412.17743 • Published about 1 month ago • 64
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution Paper • 2412.15213 • Published Dec 19, 2024 • 26
Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning Paper • 2412.12953 • Published Dec 17, 2024 • 11
PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published Dec 4, 2024 • 124