GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM Paper • 2403.05527 • Published Mar 8, 2024 • 1
Hymba: A Hybrid-head Architecture for Small Language Models Paper • 2411.13676 • Published Nov 20, 2024 • 41
LaMDA: Large Model Fine-Tuning via Spectrally Decomposed Low-Dimensional Adaptation Paper • 2406.12832 • Published Jun 18, 2024