-
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Paper • 2401.10774 • Published • 54 -
APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding
Paper • 2401.06761 • Published • 1 -
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
Paper • 2401.02669 • Published • 14 -
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 54
Collections
Discover the best community collections!
Collections including paper arxiv:2401.18058
-
E^2-LLM: Efficient and Extreme Length Extension of Large Language Models
Paper • 2401.06951 • Published • 25 -
Extending LLMs' Context Window with 100 Samples
Paper • 2401.07004 • Published • 15 -
Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
Paper • 2401.03462 • Published • 27 -
The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry
Paper • 2402.04347 • Published • 14
-
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper • 2312.00752 • Published • 139 -
SparQ Attention: Bandwidth-Efficient LLM Inference
Paper • 2312.04985 • Published • 39 -
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
Paper • 2401.04658 • Published • 27 -
E^2-LLM: Efficient and Extreme Length Extension of Large Language Models
Paper • 2401.06951 • Published • 25
-
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Paper • 2312.08578 • Published • 17 -
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
Paper • 2312.08583 • Published • 9 -
Vision-Language Models as a Source of Rewards
Paper • 2312.09187 • Published • 12 -
StemGen: A music generation model that listens
Paper • 2312.08723 • Published • 48
-
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
Paper • 2311.12022 • Published • 27 -
GAIA: a benchmark for General AI Assistants
Paper • 2311.12983 • Published • 188 -
gorilla-llm/APIBench
Updated • 109 • 66 -
Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models
Paper • 2312.04724 • Published • 21
-
Moral Foundations of Large Language Models
Paper • 2310.15337 • Published • 1 -
Specific versus General Principles for Constitutional AI
Paper • 2310.13798 • Published • 2 -
Contrastive Prefence Learning: Learning from Human Feedback without RL
Paper • 2310.13639 • Published • 25 -
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
Paper • 2309.00267 • Published • 48
-
TRAMS: Training-free Memory Selection for Long-range Language Modeling
Paper • 2310.15494 • Published • 1 -
A Long Way to Go: Investigating Length Correlations in RLHF
Paper • 2310.03716 • Published • 9 -
YaRN: Efficient Context Window Extension of Large Language Models
Paper • 2309.00071 • Published • 67 -
Giraffe: Adventures in Expanding Context Lengths in LLMs
Paper • 2308.10882 • Published • 1