DC-AE-Diffusion Collection Efficient Diffusion Models with Deep Compression Autoencoder • 8 items • Updated about 1 month ago • 6
LLM Inference Unveiled: Survey and Roofline Model Insights Paper • 2402.16363 • Published Feb 26, 2024 • 2
QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning Paper • 2402.03666 • Published Feb 6, 2024
WKVQuant: Quantizing Weight and Key/Value Cache for Large Language Models Gains More Paper • 2402.12065 • Published Feb 19, 2024
DiTFastAttn: Attention Compression for Diffusion Transformer Models Paper • 2406.08552 • Published Jun 12, 2024 • 23
PD-Quant: Post-Training Quantization based on Prediction Difference Metric Paper • 2212.07048 • Published Dec 14, 2022