Tempo14's Collections

quantization
- BiLLM: Pushing the Limit of Post-Training Quantization for LLMs (arXiv:2402.04291)
- OneBit: Towards Extremely Low-bit Large Language Models (arXiv:2402.11295)
- A Survey on Transformer Compression (arXiv:2402.05964)
- Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers (arXiv:2402.08958)
- BitDelta: Your Fine-Tune May Only Be Worth One Bit (arXiv:2402.10193)
- GPTVQ: The Blessing of Dimensionality for LLM Quantization (arXiv:2402.15319)
- EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs (arXiv:2403.02775)
- 4-bit Shampoo for Memory-Efficient Network Training (arXiv:2405.18144)
- PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs (arXiv:2410.05265)
- BitNet a4.8: 4-bit Activations for 1-bit LLMs (arXiv:2411.04965)
- "Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization (arXiv:2411.02355)
- NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks (arXiv:2410.20650)
- BitStack: Fine-Grained Size Control for Compressed Large Language Models in Variable Memory Environments (arXiv:2410.23918)