Head-wise Shareable Attention for Large Language Models Paper • 2402.11819 • Published Feb 19, 2024
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases Paper • 2402.14905 • Published Feb 22, 2024