Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos Paper • 2501.04001 • Published 1 day ago • 21
LLAVADI: What Matters For Multimodal Large Language Models Distillation Paper • 2407.19409 • Published Jul 28, 2024
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation Paper • 2412.07589 • Published 30 days ago • 45
HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing Paper • 2412.04280 • Published Dec 5, 2024 • 13
Generalizable Entity Grounding via Assistance of Large Language Model Paper • 2402.02555 • Published Feb 4, 2024
DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries Paper • 2404.00086 • Published Mar 29, 2024
SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow Paper • 2405.20282 • Published May 30, 2024
GenView: Enhancing View Quality with Pretrained Generative Model for Self-Supervised Learning Paper • 2403.12003 • Published Mar 18, 2024 • 2
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis Paper • 2410.08261 • Published Oct 10, 2024 • 50