Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos Paper • 2501.04001 • Published 15 days ago • 41
Ingredients: Blending Custom Photos with Video Diffusion Transformers Paper • 2501.01790 • Published 19 days ago • 8
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders Paper • 2410.22366 • Published Oct 28, 2024 • 77
Scaling Synthetic Data Creation with 1,000,000,000 Personas Paper • 2406.20094 • Published Jun 28, 2024 • 98