Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization Paper • 2409.00492 • Published Aug 31, 2024
Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment Paper • 2405.03594 • Published May 6, 2024
Sparse Finetuning for Inference Acceleration of Large Language Models Paper • 2310.06927 • Published Oct 10, 2023
The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models Paper • 2203.07259 • Published Mar 14, 2022