flow2023's Collections
CLIP
YOLO-World: Real-Time Open-Vocabulary Object Detection
Paper • 2401.17270 • Published • 35
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Paper • 2401.14405 • Published • 12
Improving fine-grained understanding in image-text pre-training
Paper • 2401.09865 • Published • 16
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
Paper • 2404.15653 • Published • 26
Multi-Head Mixture-of-Experts
Paper • 2404.15045 • Published • 59
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies
Paper • 2404.08197 • Published • 27
MoDE: CLIP Data Experts via Clustering
Paper • 2404.16030 • Published • 12
Jina CLIP: Your CLIP Model Is Also Your Text Retriever
Paper • 2405.20204 • Published • 34
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
Paper • 2406.09415 • Published • 50
Paper • 2410.05258 • Published • 169
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss
Paper • 2410.17243 • Published • 89