Addition is All You Need for Energy-efficient Language Models Paper • 2410.00907 • Published Oct 1, 2024 • 145 • 17
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality Paper • 2405.21060 • Published May 31, 2024 • 64 • 3
Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting Paper • 2404.18911 • Published Apr 29, 2024 • 30 • 2
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27, 2024 • 607 • 142
GPTVQ: The Blessing of Dimensionality for LLM Quantization Paper • 2402.15319 • Published Feb 23, 2024 • 19 • 3
BitDelta: Your Fine-Tune May Only Be Worth One Bit Paper • 2402.10193 • Published Feb 15, 2024 • 20 • 5
BlackMamba: Mixture of Experts for State-Space Models Paper • 2402.01771 • Published Feb 1, 2024 • 24 • 5
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty Paper • 2401.15077 • Published Jan 26, 2024 • 19 • 6
FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design Paper • 2401.14112 • Published Jan 25, 2024 • 18 • 7
The Impact of Reasoning Step Length on Large Language Models Paper • 2401.04925 • Published Jan 10, 2024 • 16 • 2