Research projects on top of vLLM - a julien-c Collection

updated Jul 29, 2024

Papers cited in https://blog.vllm.ai/2024/07/25/lfai-perf.html

Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving

Paper • 2407.00079 • Published Jun 24, 2024

Llumnix: Dynamic Scheduling for Large Language Model Serving

Paper • 2406.03243 • Published Jun 5, 2024

CacheGen: Fast Context Loading for Language Model Applications

Paper • 2310.07240 • Published Oct 11, 2023

vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention

Paper • 2405.04437 • Published May 7, 2024

Andes: Defining and Enhancing Quality-of-Experience in LLM-Based Text Streaming Services

Paper • 2404.16283 • Published Apr 25, 2024

Efficiently Programming Large Language Models using SGLang

Paper • 2312.07104 • Published Dec 12, 2023