Collections

Collections including paper arxiv:2411.04709

MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels

Paper • 2405.07526 • Published May 13, 2024 • 18
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach

Paper • 2405.15613 • Published May 24, 2024 • 13
A Touch, Vision, and Language Dataset for Multimodal Alignment

Paper • 2402.13232 • Published Feb 20, 2024 • 14
How Do Large Language Models Acquire Factual Knowledge During Pretraining?

Paper • 2406.11813 • Published Jun 17, 2024 • 30

Papers - Image - Encoders - ViT

DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting

Paper • 2404.06903 • Published Apr 10, 2024 • 18
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data

Paper • 2404.15653 • Published Apr 24, 2024 • 26
MoDE: CLIP Data Experts via Clustering

Paper • 2404.16030 • Published Apr 24, 2024 • 12
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models

Paper • 2404.17672 • Published Apr 26, 2024 • 18

Papers - Image - Encoders - DinoV2

LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model

Paper • 2404.01331 • Published Mar 29, 2024 • 25
OmniFusion Technical Report

Paper • 2404.06212 • Published Apr 9, 2024 • 75
MoDE: CLIP Data Experts via Clustering

Paper • 2404.16030 • Published Apr 24, 2024 • 12
WildGaussians: 3D Gaussian Splatting in the Wild

Paper • 2407.08447 • Published Jul 11, 2024 • 8

Papers - Image - Encoders - Clip

TextCraftor: Your Text Encoder Can be Image Quality Controller

Paper • 2403.18978 • Published Mar 27, 2024 • 13
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation

Paper • 2404.02733 • Published Apr 3, 2024 • 20
OmniFusion Technical Report

Paper • 2404.06212 • Published Apr 9, 2024 • 75
Transferable and Principled Efficiency for Open-Vocabulary Segmentation

Paper • 2404.07448 • Published Apr 11, 2024 • 11

Papers - Image - EfficientNet

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

Paper • 1905.11946 • Published May 28, 2019 • 3
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation

Paper • 2411.04709 • Published Nov 5, 2024 • 25

Papers - Image - Swin

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

Paper • 2103.14030 • Published Mar 25, 2021 • 4
A Novel Transformer Based Semantic Segmentation Scheme for Fine-Resolution Remote Sensing Images

Paper • 2104.12137 • Published Apr 25, 2021 • 2
Self-Supervised Learning with Swin Transformers

Paper • 2105.04553 • Published May 10, 2021 • 2
Evaluating Transformer-based Semantic Segmentation Networks for Pathological Image Segmentation

Paper • 2108.11993 • Published Aug 26, 2021 • 2

U-Net: Convolutional Networks for Biomedical Image Segmentation

Paper • 1505.04597 • Published May 18, 2015 • 9
Image Segmentation using U-Net Architecture for Powder X-ray Diffraction Images

Paper • 2310.16186 • Published Oct 24, 2023 • 2
H-DenseUNet: Hybrid Densely Connected UNet for Liver and Tumor Segmentation from CT Volumes

Paper • 1709.07330 • Published Sep 21, 2017 • 2
Deep LOGISMOS: Deep Learning Graph-based 3D Segmentation of Pancreatic Tumors on CT scans

Paper • 1801.08599 • Published Jan 25, 2018 • 2

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 25
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 12
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 40
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 20
