Guidelines for AI coding agents (Copilot, Cursor, Claude Code, etc.) working in this repository.
LMCache is a KV cache management engine for LLM serving that reduces Time To First Token (TTFT) and increases throughput. It stores KV caches across multiple tiers (GPU, CPU, disk, S3) and integrates with vLLM and SGLang.
The default branch is dev. Base all new branches and pull requests against dev.
We recommend using uv to manage Python environments and dependencies:
# Create and activate a virtual environment
uv venv --python 3.12
source .venv/bin/activate
# Install dependencies
uv pip install torch # pre-requisite for CUDA extensions
uv pip install -e . --no-build-isolation
# Standard install with CUDA extensions (requires torch pre-installed)
pip install -e . --no-build-isolation
# Source-only (no CUDA extensions)
NO_CUDA_EXT=1 pip install -e .
# HIP/ROCm build
BUILD_WITH_HIP=1 pip install -e .
# Run standard test suite (mirrors CI)
pytest -xvs --ignore=tests/disagg \
--ignore=tests/v1/test_nixl_storage.py \
--ignore=tests/v1/multiprocess/ \
--ignore=tests/v1/distributed/ \
--ignore=tests/skipped \
--ignore=tests/v1/storage_backend/test_eic.py
# Run a single test file
pytest -xvs tests/v1/test_cache_engine.py
# Run a single test
pytest -xvs tests/v1/test_cache_engine.py::test_function_name
Test dependencies: uv pip install -r requirements/test.txt
Pytest marker: @pytest.mark.no_shared_allocator disables the shared-allocator monkeypatch for a test.
- Write tests against the public interface and docstring contract, not the implementation. Test as if you don't know the internals — verify that behavior matches what the docstring describes.
- Avoid accessing private members in tests unless strongly needed.
- All new features and bug fixes should include corresponding tests.
- Ensure existing tests still pass before submitting changes.
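The following is a minimal sketch of a test written against a docstring contract rather than the implementation. `TinyCache` is a hypothetical class invented for this illustration; it is not part of LMCache. The tests only call `put()` and `get()` and never inspect `_items`.

```python
# SPDX-License-Identifier: Apache-2.0
# Hypothetical example: TinyCache is NOT an LMCache class.
# Standard
from typing import Optional


class TinyCache:
    """A minimal in-memory key-value cache with a fixed capacity.

    When the cache is full, inserting a new key evicts the oldest
    inserted key. Looking up an absent key returns None.
    """

    def __init__(self, capacity: int) -> None:
        """Create a cache holding at most `capacity` (> 0) entries."""
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        """Store `value` under `key`, evicting the oldest key if full."""
        if key not in self._items and len(self._items) >= self._capacity:
            oldest = next(iter(self._items))  # dicts keep insertion order
            del self._items[oldest]
        self._items[key] = value

    def get(self, key: str) -> Optional[bytes]:
        """Return the value stored under `key`, or None if absent."""
        return self._items.get(key)


def test_put_then_get_returns_value() -> None:
    cache = TinyCache(capacity=2)
    cache.put("a", b"1")
    assert cache.get("a") == b"1"


def test_oldest_key_is_evicted_when_full() -> None:
    cache = TinyCache(capacity=1)
    cache.put("a", b"1")
    cache.put("b", b"2")
    # Eviction is observed through get(), not by reading cache._items.
    assert cache.get("a") is None
    assert cache.get("b") == b"2"
```

If the internal storage later changes from a dict to another structure, these tests should keep passing as long as the documented behavior holds.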
# Run all checks (mirrors CI exactly)
pre-commit run --all-files
# Individual tools
ruff check . # Lint (E, F, B, SLF rules)
ruff format . # Format (line-length 88)
isort . # Import sorting (black profile, from_first=true)
mypy --config-file=pyproject.toml # Type checking
codespell --toml pyproject.toml # Spell checking
C++/CUDA files use clang-format (Google style, 80-col). Rust code in rust/ uses cargo fmt and cargo clippy.
All Python files require an # SPDX-License-Identifier: Apache-2.0 header as the first line.
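As a rough illustration of the header rule, the sketch below checks whether a file's first line is the SPDX header. The helper names `has_spdx_header` and `find_missing_headers` are hypothetical and are not the repository's actual check; in practice the rule is presumably covered by the pre-commit checks.

```python
# SPDX-License-Identifier: Apache-2.0
# Hypothetical helpers for illustration only.
# Standard
from pathlib import Path

SPDX_HEADER = "# SPDX-License-Identifier: Apache-2.0"


def has_spdx_header(source: str) -> bool:
    """Return True if the first line of `source` is the SPDX header."""
    lines = source.splitlines()
    first_line = lines[0] if lines else ""
    return first_line.rstrip() == SPDX_HEADER


def find_missing_headers(root: Path) -> list[Path]:
    """Return all .py files under `root` whose first line is not the header."""
    missing = []
    for path in sorted(root.rglob("*.py")):
        if not has_spdx_header(path.read_text(encoding="utf-8")):
            missing.append(path)
    return missing
```

Note that this sketch also flags empty files, such as an empty `__init__.py`, because they have no first line.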
Imports must follow this section-heading convention:
# Standard
import os
# Third Party
import torch
# First Party
from lmcache.v1.config import LMCacheEngineConfig
# Local
from .utils import helper
SLF lint rules are currently enforced by CI only in lmcache/v1/multiprocess/ and lmcache/v1/distributed/. However, all new code should follow SLF discipline regardless of location: never access private members (prefixed with _) of other classes. Treat this as a project-wide coding standard for any new or modified code.
All functions and methods must have type hints for their arguments and return values.
Every public function and method must have a clear docstring covering:
- What the function does
- Arguments (with types and descriptions)
- Return values
- Raised exceptions (if any)
- Additional notes when behavior is non-obvious
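A docstring covering each item in the list above might look like the sketch below. The function `chunk_tokens` is a hypothetical example rather than an LMCache API, and the Google-style section names (`Args`, `Returns`, `Raises`, `Note`) are an assumption about the preferred layout.

```python
# SPDX-License-Identifier: Apache-2.0
# Hypothetical example: chunk_tokens is NOT an LMCache function.


def chunk_tokens(tokens: list[int], chunk_size: int) -> list[list[int]]:
    """Split a token sequence into consecutive fixed-size chunks.

    Args:
        tokens (list[int]): Token IDs to split. May be empty.
        chunk_size (int): Maximum number of tokens per chunk. Must be
            positive.

    Returns:
        list[list[int]]: Chunks in their original order. An empty input
        yields an empty list.

    Raises:
        ValueError: If `chunk_size` is not positive.

    Note:
        The final chunk may be shorter than `chunk_size` when the input
        length is not a multiple of it.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        tokens[start : start + chunk_size]
        for start in range(0, len(tokens), chunk_size)
    ]
```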
User-facing and design documentation lives in the docs/source/ directory and is built with Sphinx. Documentation files use reStructuredText (.rst). When adding or modifying docs, place them in the appropriate subdirectory under docs/source/ (e.g., developer_guide/, getting_started/, kv_cache/) and make sure any new pages are linked from a toctree so they appear in the built site.
When writing or updating documentation, follow these principles:
- Be concrete and concise. State exactly what something does and why — avoid vague, hand-wavy descriptions. One precise sentence beats a paragraph of generalities.
- Include examples. Show concrete code snippets, command invocations, or data formats so the reader can immediately see how things work in practice.
- Explain the why, not just the what. Briefly state the design motivation or trade-off behind a decision so readers understand the reasoning.
- Use diagrams or short flows for complex interactions. When multiple components interact (e.g., the multiprocess pipeline), a short step-by-step flow or ASCII diagram is far clearer than prose alone.
- Keep scope focused. Each document should have a clear audience and purpose. Don't mix user-facing setup guides with internal architecture notes.
Always verify that the Sphinx build passes after making documentation changes:
# Install doc dependencies (one-time)
pip install -r requirements/docs.txt
# Build (from the docs/ directory)
cd docs
make clean
make html
The build must complete without errors or warnings. Review the generated HTML in docs/build/html/ to confirm formatting, links, and examples render correctly. You can preview locally with:
python -m http.server -d build/html/
Never access private members (prefixed with _) of other classes. Interact only through their public APIs.
- Module-level helper functions go at the top of the file (after imports, before classes).
- Private/helper methods within a class go at the end of the class, after all public methods.
When reviewing code (or self-checking before submitting), verify all of the following:
- The code does what it claims to do and matches the PR description.
- Edge cases are handled (empty inputs, None values, boundary conditions).
- No regressions to existing functionality — existing tests still pass.
- pre-commit run --all-files passes with no errors.
- All new/modified functions have type hints for arguments and return values.
- All new/modified public functions have complete docstrings.
- License header (# SPDX-License-Identifier: Apache-2.0) is present on all Python files.
- Import ordering follows the section-heading convention (Standard / Third Party / First Party / Local).
- No direct access to private members (_-prefixed) of other classes.
- New public APIs are minimal and well-defined; avoid exposing internals.
- Module-level helpers are placed at the top; private methods at the end of the class.
- New features and bug fixes include corresponding tests.
- Tests target the public interface and docstring contract, not implementation details.
- Tests pass locally: pytest -xvs with the standard ignore flags.
- New or updated documentation is concrete, concise, and includes examples.
- Design decisions explain the why, not just the what.
- Docs are placed in the correct subdirectory under docs/source/ and linked from a toctree.
- Sphinx build passes cleanly: cd docs && make clean && make html completes without errors or warnings.
- No security vulnerabilities (injection, unsafe deserialization, etc.).
- No unnecessary memory copies or allocations in hot paths.
- Thread safety is maintained for shared data structures.
- CUDA/GPU resources are properly managed (allocated, freed, synchronized).