Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2309.11419

Woodpecker: Hallucination Correction for Multimodal Large Language Models

Paper • 2310.16045 • Published Oct 24, 2023 • 15
HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

Paper • 2310.14566 • Published Oct 23, 2023 • 25
SILC: Improving Vision Language Pretraining with Self-Distillation

Paper • 2310.13355 • Published Oct 20, 2023 • 8
Conditional Diffusion Distillation

Paper • 2310.01407 • Published Oct 2, 2023 • 20

monuirctc/invoice_extraction_layoutlmv2

Updated Aug 10, 2021
Kosmos-2.5: A Multimodal Literate Model

Paper • 2309.11419 • Published Sep 20, 2023 • 50

Text and Image recognition

Kosmos-2.5: A Multimodal Literate Model

Paper • 2309.11419 • Published Sep 20, 2023 • 50

Applied Machine Learning Papers

Reading List (Mainly Focused of VLM's and Diffusion Models)

Scalable Diffusion Models with Transformers

Paper • 2212.09748 • Published Dec 19, 2022 • 17
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

Paper • 2311.15127 • Published Nov 25, 2023 • 12
Learning Transferable Visual Models From Natural Language Supervision

Paper • 2103.00020 • Published Feb 26, 2021 • 11
U-Net: Convolutional Networks for Biomedical Image Segmentation

Paper • 1505.04597 • Published May 18, 2015 • 9

Kosmos-2.5: A Multimodal Literate Model

Paper • 2309.11419 • Published Sep 20, 2023 • 50

😱 Microsoft Papers with no code/data release

Collection of Microsoft Papers with no code/data release

MEGA: Multilingual Evaluation of Generative AI

Paper • 2303.12528 • Published Mar 22, 2023
MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks

Paper • 2311.07463 • Published Nov 13, 2023 • 13
Kosmos-2.5: A Multimodal Literate Model

Paper • 2309.11419 • Published Sep 20, 2023 • 50
A Unified View of Masked Image Modeling

Paper • 2210.10615 • Published Oct 19, 2022

Kosmos-2: Grounding Multimodal Large Language Models to the World

Paper • 2306.14824 • Published Jun 26, 2023 • 34
Kosmos-G: Generating Images in Context with Multimodal Large Language Models

Paper • 2310.02992 • Published Oct 4, 2023 • 4
Kosmos-2.5: A Multimodal Literate Model

Paper • 2309.11419 • Published Sep 20, 2023 • 50
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model

Paper • 2309.16058 • Published Sep 27, 2023 • 55

Kosmos-2.5: A Multimodal Literate Model

Paper • 2309.11419 • Published Sep 20, 2023 • 50

Kosmos-2.5: A Multimodal Literate Model

Paper • 2309.11419 • Published Sep 20, 2023 • 50

Kosmos-2.5: A Multimodal Literate Model

Paper • 2309.11419 • Published Sep 20, 2023 • 50
Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities

Paper • 2311.05698 • Published Nov 9, 2023 • 9
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

Paper • 2311.06242 • Published Nov 10, 2023 • 87
PolyMaX: General Dense Prediction with Mask Transformer

Paper • 2311.05770 • Published Nov 9, 2023 • 6

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs