Llava • 57
- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 48
- Ziya-VL: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning
  Paper • 2310.08166 • Published • 1
Collections
Discover the best community collections!
Collections including paper arxiv:2311.05437
- Attention Is All You Need
  Paper • 1706.03762 • Published • 50
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 16
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 7
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 14

- Extending Context Window of Large Language Models via Semantic Compression
  Paper • 2312.09571 • Published • 12
- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 48
- LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
  Paper • 2312.02949 • Published • 11
- TinyLLaVA: A Framework of Small-scale Large Multimodal Models
  Paper • 2402.14289 • Published • 19

- Visual In-Context Prompting
  Paper • 2311.13601 • Published • 16
- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework
  Paper • 2308.08155 • Published • 5
- LIDA: A Tool for Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models
  Paper • 2303.02927 • Published • 3
- The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4
  Paper • 2311.07361 • Published • 12

- LayoutPrompter: Awaken the Design Ability of Large Language Models
  Paper • 2311.06495 • Published • 10
- Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
  Paper • 2311.06783 • Published • 26
- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 48
- TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models
  Paper • 2311.04589 • Published • 18

- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 48
- On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving
  Paper • 2311.05332 • Published • 9
- SoundCam: A Dataset for Finding Humans Using Room Acoustics
  Paper • 2311.03517 • Published • 10