Collections

Collections including paper arxiv:2405.17247

Video Understanding

Vript: A Video Is Worth Thousands of Words

Paper • 2406.06040 • Published Jun 10, 2024 • 26
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6, 2024 • 73
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Paper • 2406.01574 • Published Jun 3, 2024 • 45
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Paper • 2405.21075 • Published May 31, 2024 • 22

Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

Paper • 2406.06525 • Published Jun 10, 2024 • 67
Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning

Paper • 2406.06469 • Published Jun 10, 2024 • 25
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models

Paper • 2406.04271 • Published Jun 6, 2024 • 29
Block Transformer: Global-to-Local Language Modeling for Fast Inference

Paper • 2406.02657 • Published Jun 4, 2024 • 38

🚀 Video LLaMA

Space • Running on A10G • 188

📺 Stable Video Diffusion 1.1

Space • Running on Zero • 1.6k

📱🔲 QR Code AI Art Generator

Space • Running on Zero • 1.86k • Blend QR codes with AI Art

👩‍🎨 AI Comic Factory

Space • Running on CPU Upgrade • 9.16k • Create your own AI comic with a single prompt

To Believe or Not to Believe Your LLM

Paper • 2406.02543 • Published Jun 4, 2024 • 33
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 87

Vision Language Models

An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 87

An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 87
SHIC: Shape-Image Correspondences with no Keypoint Supervision

Paper • 2407.18907 • Published Jul 26, 2024 • 41
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Paper • 2409.01704 • Published Sep 3, 2024 • 83
