Sana Collection ⚡️Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer • 19 items • Updated 14 days ago • 87
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25, 2024 • 106
Wolf: Captioning Everything with a World Summarization Framework Paper • 2407.18908 • Published Jul 26, 2024 • 32
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions Paper • 2312.08578 • Published Dec 14, 2023 • 17
VILA: On Pre-training for Visual Language Models Paper • 2312.07533 • Published Dec 12, 2023 • 21
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference Paper • 2310.04378 • Published Oct 6, 2023 • 19
PockEngine: Sparse and Efficient Fine-tuning in a Pocket Paper • 2310.17752 • Published Oct 26, 2023 • 12