gary109's Collections
Vision Transformers
Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts
Paper • 2309.04354 • Published • 13
Vision Transformers Need Registers
Paper • 2309.16588 • Published • 78
AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models
Paper • 2309.16414 • Published • 19
MotionLM: Multi-Agent Motion Forecasting as Language Modeling
Paper • 2309.16534 • Published • 15
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Paper • 2201.12086 • Published • 3
FiT: Flexible Vision Transformer for Diffusion Model
Paper • 2402.12376 • Published • 48
Subobject-level Image Tokenization
Paper • 2402.14327 • Published • 17
Scalable Diffusion Models with Transformers
Paper • 2212.09748 • Published • 17
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
Paper • 2408.04840 • Published • 32
Seeing and Understanding: Bridging Vision with Chemical Knowledge Via ChemVLM
Paper • 2408.07246 • Published • 21