Collections including paper arxiv:2312.13286

- Textbooks Are All You Need
  Paper • 2306.11644 • Published • 143
- LLaVA-φ: Efficient Multi-Modal Assistant with Small Language Model
  Paper • 2401.02330 • Published • 14
- Textbooks Are All You Need II: phi-1.5 technical report
  Paper • 2309.05463 • Published • 87
- Visual Instruction Tuning
  Paper • 2304.08485 • Published • 13
- aMUSEd: An Open MUSE Reproduction
  Paper • 2401.01808 • Published • 28
- From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
  Paper • 2401.01885 • Published • 27
- SteinDreamer: Variance Reduction for Text-to-3D Score Distillation via Stein Identity
  Paper • 2401.00604 • Published • 5
- LARP: Language-Agent Role Play for Open-World Games
  Paper • 2312.17653 • Published • 31
- Attention Is All You Need
  Paper • 1706.03762 • Published • 50
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 16
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 7
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 14
- Generative Multimodal Models are In-Context Learners
  Paper • 2312.13286 • Published • 34
- StarVector: Generating Scalable Vector Graphics Code from Images
  Paper • 2312.11556 • Published • 27
- Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
  Paper • 2311.08046 • Published • 1
- VCoder: Versatile Vision Encoders for Multimodal Large Language Models
  Paper • 2312.14233 • Published • 16
- Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
  Paper • 2312.09390 • Published • 32
- OneLLM: One Framework to Align All Modalities with Language
  Paper • 2312.03700 • Published • 20
- Generative Multimodal Models are In-Context Learners
  Paper • 2312.13286 • Published • 34
- The LLM Surgeon
  Paper • 2312.17244 • Published • 9
- Dissecting In-Context Learning of Translations in GPTs
  Paper • 2310.15987 • Published • 5
- In-Context Learning Creates Task Vectors
  Paper • 2310.15916 • Published • 42
- ZeroGen: Efficient Zero-shot Learning via Dataset Generation
  Paper • 2202.07922 • Published • 1
- Promptor: A Conversational and Autonomous Prompt Generation Agent for Intelligent Text Entry Techniques
  Paper • 2310.08101 • Published • 2
- Woodpecker: Hallucination Correction for Multimodal Large Language Models
  Paper • 2310.16045 • Published • 15
- HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
  Paper • 2310.14566 • Published • 25
- SILC: Improving Vision Language Pretraining with Self-Distillation
  Paper • 2310.13355 • Published • 8
- Conditional Diffusion Distillation
  Paper • 2310.01407 • Published • 20
- Eureka: Human-Level Reward Design via Coding Large Language Models
  Paper • 2310.12931 • Published • 26
- GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs
  Paper • 2311.04901 • Published • 7
- Hiformer: Heterogeneous Feature Interactions Learning with Transformers for Recommender Systems
  Paper • 2311.05884 • Published • 5
- PolyMaX: General Dense Prediction with Mask Transformer
  Paper • 2311.05770 • Published • 6
- Table-GPT: Table-tuned GPT for Diverse Table Tasks
  Paper • 2310.09263 • Published • 39
- A Zero-Shot Language Agent for Computer Control with Structured Reflection
  Paper • 2310.08740 • Published • 14
- The Consensus Game: Language Model Generation via Equilibrium Search
  Paper • 2310.09139 • Published • 12
- PaLI-3 Vision Language Models: Smaller, Faster, Stronger
  Paper • 2310.09199 • Published • 26