KVDrive: A Holistic Multi-Tier KV Cache Management System for Long-Context LLM Inference
Published in SIGMOD 2026
KVDrive is a holistic multi-tier memory management system, spanning GPU memory, host DRAM, and SSD, for long-context LLM inference. It coordinates KV-cache admission, tiering, and scheduling to improve throughput while reducing service degradation under high memory pressure.
Recommended citation: Jian Lin, Jiazhi Mi, Zicong Hong, Haodong Wang, Qianli Liu, Haoyue Zhang, Peng Li, and Song Guo. (2026). KVDrive: A Holistic Multi-Tier KV Cache Management System for Long-Context LLM Inference. SIGMOD 2026.
Download Paper
