Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving Paper • 2407.00079 • Published Jun 24, 2024
Llumnix: Dynamic Scheduling for Large Language Model Serving Paper • 2406.03243 • Published Jun 5, 2024
CacheGen: Fast Context Loading for Language Model Applications Paper • 2310.07240 • Published Oct 11, 2023
vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention Paper • 2405.04437 • Published May 7, 2024
Andes: Defining and Enhancing Quality-of-Experience in LLM-Based Text Streaming Services Paper • 2404.16283 • Published Apr 25, 2024
Efficiently Programming Large Language Models using SGLang Paper • 2312.07104 • Published Dec 12, 2023