Feynman's Collections
MultiModal
updated
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 45
A Touch, Vision, and Language Dataset for Multimodal Alignment
Paper • 2402.13232 • Published • 14
Paper • 2402.13144 • Published • 95
FlashTex: Fast Relightable Mesh Texturing with LightControlNet
Paper • 2402.13251 • Published • 13
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
Paper • 2403.00522 • Published • 44
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Paper • 2403.04692 • Published • 39
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on
Paper • 2403.01779 • Published • 28
CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model
Paper • 2403.05034 • Published • 20
CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion
Paper • 2403.05121 • Published • 22
MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies
Paper • 2403.01422 • Published • 26
DressCode: Autoregressively Sewing and Generating Garments from Text Guidance
Paper • 2401.16465 • Published • 11
Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer
Paper • 2405.17405 • Published • 14
Looking Backward: Streaming Video-to-Video Translation with Feature Banks
Paper • 2405.15757 • Published • 14
Jina CLIP: Your CLIP Model Is Also Your Text Retriever
Paper • 2405.20204 • Published • 35
ColPali: Efficient Document Retrieval with Vision Language Models
Paper • 2407.01449 • Published • 42
Honeybee: Locality-enhanced Projector for Multimodal LLM
Paper • 2312.06742 • Published • 9