KVDrive: A Holistic Multi-Tier KV Cache Management System for Long-Context LLM Inference

Published in SIGMOD 2026

KVDrive is a holistic multi-tier memory management system spanning GPU memory, host DRAM, and SSD for long-context LLM inference. It coordinates KV-cache admission, tiering, and scheduling to improve throughput while limiting service degradation under high memory pressure.

Recommended citation: Jian Lin, Jiazhi Mi, Zicong Hong, Haodong Wang, Qianli Liu, Haoyue Zhang, Peng Li, and Song Guo. (2026). KVDrive: A Holistic Multi-Tier KV Cache Management System for Long-Context LLM Inference. SIGMOD 2026.