danielz01's Collections

Efficient LLM
updated

FlashDecoding++: Faster Large Language Model Inference on GPUs
Paper • 2311.01282 • Published • 35

S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Paper • 2311.03285 • Published • 28

Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization
Paper • 2311.06243 • Published • 17

FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
Paper • 2311.05908 • Published • 12

Tied-Lora: Enhancing parameter efficiency of LoRA with weight tying
Paper • 2311.09578 • Published • 14

I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization
Paper • 2311.10126 • Published • 7

SparQ Attention: Bandwidth-Efficient LLM Inference
Paper • 2312.04985 • Published • 38

A Survey of Resource-efficient LLM and Multimodal Foundation Models
Paper • 2401.08092 • Published • 3

SliceGPT: Compress Large Language Models by Deleting Rows and Columns
Paper • 2401.15024 • Published • 69

EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Paper • 2401.15077 • Published • 19

Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
Paper • 2403.15447 • Published • 16

A Controlled Study on Long Context Extension and Generalization in LLMs
Paper • 2409.12181 • Published • 44