Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2408.02622

about 15 hours ago

iVideoGPT: Interactive VideoGPTs are Scalable World Models

Paper • 2405.15223 • Published May 24, 2024 • 12
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published May 24, 2024 • 53
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 87
Matryoshka Multimodal Models

Paper • 2405.17430 • Published May 27, 2024 • 31

Language Model Can Listen While Speaking

Paper • 2408.02622 • Published Aug 5, 2024 • 37

Language Model Can Listen While Speaking

Paper • 2408.02622 • Published Aug 5, 2024 • 37

Language Model Can Listen While Speaking

Paper • 2408.02622 • Published Aug 5, 2024 • 37

AI-MO/NuminaMath-7B-TIR

Text Generation • Updated Aug 14, 2024 • 2.53k • 322
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models

Paper • 2407.07895 • Published Jul 10, 2024 • 40
RhapsodyAI/MiniCPM-V-Embedding-preview

Feature Extraction • Updated Aug 20, 2024 • 40 • 44
Language Model Can Listen While Speaking

Paper • 2408.02622 • Published Aug 5, 2024 • 37

Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models

Paper • 2404.02575 • Published Apr 3, 2024 • 48
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

Paper • 2404.12253 • Published Apr 18, 2024 • 54
SnapKV: LLM Knows What You are Looking for Before Generation

Paper • 2404.14469 • Published Apr 22, 2024 • 23
FlowMind: Automatic Workflow Generation with LLMs

Paper • 2404.13050 • Published Mar 17, 2024 • 34

Audio/Music/Speech/etc.

Language Model Can Listen While Speaking

Paper • 2408.02622 • Published Aug 5, 2024 • 37

MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs

Paper • 2402.15627 • Published Feb 23, 2024 • 34
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts

Paper • 2402.16822 • Published Feb 26, 2024 • 15
FuseChat: Knowledge Fusion of Chat Models

Paper • 2402.16107 • Published Feb 25, 2024 • 36
Multi-LoRA Composition for Image Generation

Paper • 2402.16843 • Published Feb 26, 2024 • 28

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 25
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 12
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 40
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 20

A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions

Paper • 2312.08578 • Published Dec 14, 2023 • 16
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks

Paper • 2312.08583 • Published Dec 14, 2023 • 9
Vision-Language Models as a Source of Rewards

Paper • 2312.09187 • Published Dec 14, 2023 • 11
StemGen: A music generation model that listens

Paper • 2312.08723 • Published Dec 14, 2023 • 47

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs