Collections

Collections including paper arxiv:2405.17247

An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 87

Video language Model

An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 87

An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 87

What matters when building vision-language models?

Paper • 2405.02246 • Published May 3, 2024 • 101
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 87
DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark

Paper • 2405.19707 • Published May 30, 2024 • 7
Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations

Paper • 2410.08049 • Published Oct 10, 2024 • 8

MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training

Paper • 2311.17049 • Published Nov 28, 2023 • 1
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Paper • 2405.04434 • Published May 7, 2024 • 14
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision

Paper • 2303.17376 • Published Mar 30, 2023
Sigmoid Loss for Language Image Pre-Training

Paper • 2303.15343 • Published Mar 27, 2023 • 6

Paper Reading List

xLSTM: Extended Long Short-Term Memory

Paper • 2405.04517 • Published May 7, 2024 • 12
You Only Cache Once: Decoder-Decoder Architectures for Language Models

Paper • 2405.05254 • Published May 8, 2024 • 10
Understanding the performance gap between online and offline alignment algorithms

Paper • 2405.08448 • Published May 14, 2024 • 14
Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16, 2024 • 127

image llm works

Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation

Paper • 2404.19752 • Published Apr 30, 2024 • 22
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

Paper • 2404.16821 • Published Apr 25, 2024 • 55
MoAI: Mixture of All Intelligence for Large Language and Vision Models

Paper • 2403.07508 • Published Mar 12, 2024 • 74
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Paper • 2403.09611 • Published Mar 14, 2024 • 125

Natural Language (LLM, NLP etc)

Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

Paper • 2404.12253 • Published Apr 18, 2024 • 54
FlowMind: Automatic Workflow Generation with LLMs

Paper • 2404.13050 • Published Mar 17, 2024 • 34
How Far Can We Go with Practical Function-Level Program Repair?

Paper • 2404.12833 • Published Apr 19, 2024 • 6
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models

Paper • 2404.18796 • Published Apr 29, 2024 • 68
