- Scaling Instruction-Finetuned Language Models
  Paper • 2210.11416 • Published • 7
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 139
- Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
  Paper • 2403.05530 • Published • 62
- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 62
Collections including paper arxiv:2402.16837

- Evaluating Very Long-Term Conversational Memory of LLM Agents
  Paper • 2402.17753 • Published • 19
- StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
  Paper • 2402.16671 • Published • 27
- Do Large Language Models Latently Perform Multi-Hop Reasoning?
  Paper • 2402.16837 • Published • 25
- Divide-or-Conquer? Which Part Should You Distill Your LLM?
  Paper • 2402.15000 • Published • 22

- Efficient Tool Use with Chain-of-Abstraction Reasoning
  Paper • 2401.17464 • Published • 17
- Divide and Conquer: Language Models can Plan and Self-Correct for Compositional Text-to-Image Generation
  Paper • 2401.15688 • Published • 11
- SliceGPT: Compress Large Language Models by Deleting Rows and Columns
  Paper • 2401.15024 • Published • 70
- From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
  Paper • 2401.15071 • Published • 35

- CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution
  Paper • 2401.03065 • Published • 11
- DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
  Paper • 2401.14196 • Published • 49
- WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation
  Paper • 2312.14187 • Published • 49
- On the Effectiveness of Large Language Models in Domain-Specific Code Generation
  Paper • 2312.01639 • Published • 1

- Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding
  Paper • 2401.12954 • Published • 29
- Learning Universal Predictors
  Paper • 2401.14953 • Published • 19
- TravelPlanner: A Benchmark for Real-World Planning with Language Agents
  Paper • 2402.01622 • Published • 35
- Do Large Language Models Latently Perform Multi-Hop Reasoning?
  Paper • 2402.16837 • Published • 25

- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 53
- Simple linear attention language models balance the recall-throughput tradeoff
  Paper • 2402.18668 • Published • 19
- ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
  Paper • 2402.15220 • Published • 19
- Linear Transformers are Versatile In-Context Learners
  Paper • 2402.14180 • Published • 6

- Beyond Surface: Probing LLaMA Across Scales and Layers
  Paper • 2312.04333 • Published • 19
- SymbolicAI: A framework for logic-based approaches combining generative models and solvers
  Paper • 2402.00854 • Published • 20
- TravelPlanner: A Benchmark for Real-World Planning with Language Agents
  Paper • 2402.01622 • Published • 35
- Do Large Language Models Latently Perform Multi-Hop Reasoning?
  Paper • 2402.16837 • Published • 25