L4Q: Parameter Efficient Quantization-Aware Training on Large Language Models via LoRA-wise LSQ • arXiv:2402.04902 • Published Feb 7, 2024
Mixture of Scales: Memory-Efficient Token-Adaptive Binarization for Large Language Models • arXiv:2406.12311 • Published Jun 18, 2024
SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks • arXiv:2402.09025 • Published Feb 14, 2024