Collections including paper arxiv:2312.13286

- Textbooks Are All You Need
  Paper • 2306.11644 • Published • 143
- LLaVA-φ: Efficient Multi-Modal Assistant with Small Language Model
  Paper • 2401.02330 • Published • 14
- Textbooks Are All You Need II: phi-1.5 technical report
  Paper • 2309.05463 • Published • 87
- Visual Instruction Tuning
  Paper • 2304.08485 • Published • 13
- aMUSEd: An Open MUSE Reproduction
  Paper • 2401.01808 • Published • 28
- From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
  Paper • 2401.01885 • Published • 27
- SteinDreamer: Variance Reduction for Text-to-3D Score Distillation via Stein Identity
  Paper • 2401.00604 • Published • 5
- LARP: Language-Agent Role Play for Open-World Games
  Paper • 2312.17653 • Published • 31
- Attention Is All You Need
  Paper • 1706.03762 • Published • 50
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 16
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 7
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 14
- Generative Multimodal Models are In-Context Learners
  Paper • 2312.13286 • Published • 34
- StarVector: Generating Scalable Vector Graphics Code from Images
  Paper • 2312.11556 • Published • 27
- Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
  Paper • 2311.08046 • Published • 1
- VCoder: Versatile Vision Encoders for Multimodal Large Language Models
  Paper • 2312.14233 • Published • 16
- Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
  Paper • 2312.09390 • Published • 32
- OneLLM: One Framework to Align All Modalities with Language
  Paper • 2312.03700 • Published • 20
- Generative Multimodal Models are In-Context Learners
  Paper • 2312.13286 • Published • 34
- The LLM Surgeon
  Paper • 2312.17244 • Published • 9
- Dissecting In-Context Learning of Translations in GPTs
  Paper • 2310.15987 • Published • 5
- In-Context Learning Creates Task Vectors
  Paper • 2310.15916 • Published • 42
- ZeroGen: Efficient Zero-shot Learning via Dataset Generation
  Paper • 2202.07922 • Published • 1
- Promptor: A Conversational and Autonomous Prompt Generation Agent for Intelligent Text Entry Techniques
  Paper • 2310.08101 • Published • 2
- Woodpecker: Hallucination Correction for Multimodal Large Language Models
  Paper • 2310.16045 • Published • 15
- HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
  Paper • 2310.14566 • Published • 25
- SILC: Improving Vision Language Pretraining with Self-Distillation
  Paper • 2310.13355 • Published • 8
- Conditional Diffusion Distillation
  Paper • 2310.01407 • Published • 20
- Eureka: Human-Level Reward Design via Coding Large Language Models
  Paper • 2310.12931 • Published • 26
- GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs
  Paper • 2311.04901 • Published • 7
- Hiformer: Heterogeneous Feature Interactions Learning with Transformers for Recommender Systems
  Paper • 2311.05884 • Published • 5
- PolyMaX: General Dense Prediction with Mask Transformer
  Paper • 2311.05770 • Published • 6
- Table-GPT: Table-tuned GPT for Diverse Table Tasks
  Paper • 2310.09263 • Published • 39
- A Zero-Shot Language Agent for Computer Control with Structured Reflection
  Paper • 2310.08740 • Published • 14
- The Consensus Game: Language Model Generation via Equilibrium Search
  Paper • 2310.09139 • Published • 12
- PaLI-3 Vision Language Models: Smaller, Faster, Stronger
  Paper • 2310.09199 • Published • 26