- MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels
  Paper • 2405.07526 • Published • 18
- Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach
  Paper • 2405.15613 • Published • 13
- A Touch, Vision, and Language Dataset for Multimodal Alignment
  Paper • 2402.13232 • Published • 14
- How Do Large Language Models Acquire Factual Knowledge During Pretraining?
  Paper • 2406.11813 • Published • 30
Collections
Collections including paper arxiv:2411.04709
- DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting
  Paper • 2404.06903 • Published • 18
- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
  Paper • 2404.15653 • Published • 26
- MoDE: CLIP Data Experts via Clustering
  Paper • 2404.16030 • Published • 12
- BlenderAlchemy: Editing 3D Graphics with Vision-Language Models
  Paper • 2404.17672 • Published • 18

- LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model
  Paper • 2404.01331 • Published • 25
- OmniFusion Technical Report
  Paper • 2404.06212 • Published • 75
- MoDE: CLIP Data Experts via Clustering
  Paper • 2404.16030 • Published • 12
- WildGaussians: 3D Gaussian Splatting in the Wild
  Paper • 2407.08447 • Published • 8

- TextCraftor: Your Text Encoder Can be Image Quality Controller
  Paper • 2403.18978 • Published • 13
- InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation
  Paper • 2404.02733 • Published • 20
- OmniFusion Technical Report
  Paper • 2404.06212 • Published • 75
- Transferable and Principled Efficiency for Open-Vocabulary Segmentation
  Paper • 2404.07448 • Published • 11

- Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
  Paper • 2103.14030 • Published • 4
- A Novel Transformer Based Semantic Segmentation Scheme for Fine-Resolution Remote Sensing Images
  Paper • 2104.12137 • Published • 2
- Self-Supervised Learning with Swin Transformers
  Paper • 2105.04553 • Published • 2
- Evaluating Transformer-based Semantic Segmentation Networks for Pathological Image Segmentation
  Paper • 2108.11993 • Published • 2

- U-Net: Convolutional Networks for Biomedical Image Segmentation
  Paper • 1505.04597 • Published • 9
- Image Segmentation using U-Net Architecture for Powder X-ray Diffraction Images
  Paper • 2310.16186 • Published • 2
- H-DenseUNet: Hybrid Densely Connected UNet for Liver and Tumor Segmentation from CT Volumes
  Paper • 1709.07330 • Published • 2
- Deep LOGISMOS: Deep Learning Graph-based 3D Segmentation of Pancreatic Tumors on CT scans
  Paper • 1801.08599 • Published • 2

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 25
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 12
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 40
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 20