Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 21 days ago • 136
EXAONE-3.0 Collection EXAONE 3.0 7.8B instruction-tuned language model • 2 items • Updated 26 days ago • 3
EXAONE-3.5 Collection EXAONE 3.5 language model series including instruction-tuned models of 2.4B, 7.8B, and 32B. • 10 items • Updated 25 days ago • 84
NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training Paper • 2412.02030 • Published Dec 2, 2024 • 18
One Shot, One Talk: Whole-body Talking Avatar from a Single Image Paper • 2412.01106 • Published Dec 2, 2024 • 18
ChatRex: Taming Multimodal LLM for Joint Perception and Understanding Paper • 2411.18363 • Published Nov 27, 2024 • 9
Material Anything: Generating Materials for Any 3D Object via Diffusion Paper • 2411.15138 • Published Nov 22, 2024 • 42
Flux.1 Tools Collection FLUX.1 Tools, a suite of models designed to add control and steerability to base text-to-image models FLUX.1 • 6 items • Updated Nov 22, 2024 • 13
LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published Nov 15, 2024 • 111
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models Paper • 2411.09595 • Published Nov 14, 2024 • 71
BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions Paper • 2411.07461 • Published Nov 12, 2024 • 21
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents Paper • 2410.23218 • Published Oct 30, 2024 • 46
MobileLLM Collection Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024) https://arxiv.org/abs/2402.14905 • 9 items • Updated Nov 27, 2024 • 101
MarDini: Masked Autoregressive Diffusion for Video Generation at Scale Paper • 2410.20280 • Published Oct 26, 2024 • 23
CogVLM2 Collection This collection hosts the repos of the THUDM's CogVLM2 releases • 8 items • Updated Nov 27, 2024 • 19