# SGLang Memory Management & KV Cache (Part 1)

## SGLang Memory Management System

### TL;DR
- Key–Value (KV) cache entries computed for earlier tokens can be reused during subsequent generation steps, avoiding recomputation and improving inference efficiency.
- SGLang maintains mapping tables that translate each request's token positions into the KV-cache indices of tensors stored in the memory pool.
- The mapping mechanism supports multiple attention backends (e.g., MHA, MLA, NSA).
- SGLang provides several cache backends to meet different usage scenarios, performance goals, and implementation constraints.
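To make the mapping mechanism concrete, here is a minimal Python sketch of the two-level indexing described above: a request-to-token table that maps each request's token positions to slots in a flat token-level KV pool. This is a deliberate simplification for illustration; the class and attribute names mirror SGLang's `req_to_token` / `token_to_kv_pool` concepts, but real SGLang stores these tables as GPU tensors with different signatures.

```python
class ReqToTokenPool:
    """Maps a request slot to the KV-pool indices of its tokens (simplified sketch)."""

    def __init__(self, max_reqs: int, max_context_len: int):
        # req_to_token[req_idx][token_pos] -> index into the token-level KV pool
        self.req_to_token = [[-1] * max_context_len for _ in range(max_reqs)]
        self.free_slots = list(range(max_reqs))

    def alloc(self) -> int:
        return self.free_slots.pop()

    def free(self, req_idx: int) -> None:
        self.free_slots.append(req_idx)


class TokenToKVPool:
    """Flat pool of KV-cache slots; each slot would hold one token's K/V tensors."""

    def __init__(self, size: int):
        self.free_slots = list(range(size))
        self.kv_data = [None] * size  # placeholder for per-token K/V tensors

    def alloc(self, n: int) -> list[int]:
        out = self.free_slots[:n]
        del self.free_slots[:n]
        return out


# Usage: place a 4-token request into the pools.
req_pool = ReqToTokenPool(max_reqs=8, max_context_len=16)
kv_pool = TokenToKVPool(size=64)

req_idx = req_pool.alloc()
kv_indices = kv_pool.alloc(4)
for pos, kv_idx in enumerate(kv_indices):
    req_pool.req_to_token[req_idx][pos] = kv_idx
```

The indirection is what makes prefix reuse possible: two requests sharing a prefix can point their early token positions at the same KV-pool slots instead of recomputing them.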
| Cache Class | Module | When to Use / Condition |
|---|---|---|
| RadixCache | mem_cache/radix_cache.py | Default |
| ChunkCache | mem_cache/chunk_cache.py | disable_radix_cache=True |
| SWAChunkCache | mem_cache/chunk_cache.py | disable_radix_cache=True + sliding window |
| HiRadixCache | mem_cache/hiradix_cache.py | enable_hierarchical_cache=True |
| SWARadixCache | mem_cache/swa_radix_cache.py | Sliding-window attention models |
| MambaRadixCache | mem_cache/mamba_radix_cache.py | Mamba / SSM-hybrid models |
| LMCRadixCache | mem_cache/storage/lmcache/ | enable_lmcache=True |
| RadixCacheCpp | mem_cache/radix_cache_cpp.py | Experimental C++ radix tree |
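The conditions in the table can be read as a dispatch over server configuration. The sketch below shows one way such a selection might look; the `ServerArgs` fields and the precedence among flags are illustrative assumptions for this post, not SGLang's exact API.

```python
from dataclasses import dataclass


@dataclass
class ServerArgs:
    # Hypothetical configuration fields mirroring the table's conditions.
    disable_radix_cache: bool = False
    enable_hierarchical_cache: bool = False
    enable_lmcache: bool = False
    is_sliding_window_model: bool = False
    is_mamba_hybrid_model: bool = False


def select_cache_backend(args: ServerArgs) -> str:
    """Pick a cache class name from the configuration (assumed precedence)."""
    if args.enable_lmcache:
        return "LMCRadixCache"
    if args.enable_hierarchical_cache:
        return "HiRadixCache"
    if args.disable_radix_cache:
        return "SWAChunkCache" if args.is_sliding_window_model else "ChunkCache"
    if args.is_mamba_hybrid_model:
        return "MambaRadixCache"
    if args.is_sliding_window_model:
        return "SWARadixCache"
    return "RadixCache"  # default


print(select_cache_backend(ServerArgs()))  # -> RadixCache
```

With no flags set, the default `RadixCache` is chosen, matching the first row of the table.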