VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction Paper • 2501.01957 • Published 3 days ago • 5
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion Paper • 2412.04424 • Published Dec 5, 2024 • 59
sentence-transformers/all-MiniLM-L6-v2 Sentence Similarity • Updated Nov 1, 2024 • 66.2M • 2.77k
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 24 days ago • 136
PowerInfer/SmallThinker-3B-Preview Text Generation • Updated about 4 hours ago • 4.66k • 239
sentence-transformers/stsb-xlm-r-multilingual Sentence Similarity • Updated Nov 5, 2024 • 634k • 45