[Ecosystem] Mooncake #52
Description
Contact emails
james0zan@gmail.com, me@zhyncs.com
Project summary
A KVCache-centric Disaggregated Architecture for LLM Serving
Project description
- Transfer Engine is a high-performance data transfer framework. It provides a unified interface for transferring data from DRAM, VRAM, or NVMe while hiding the hardware-specific details. It supports TCP, RDMA (InfiniBand/RoCEv2/eRDMA/NVIDIA GPUDirect), and NVMe over Fabrics (NVMe-oF) protocols.
- P2P Store is built on Transfer Engine and supports sharing temporary objects between peer nodes in a cluster. It is well suited to scenarios such as checkpoint transfer, where data must be shared rapidly and efficiently across a cluster.
- Mooncake Store is a distributed KVCache storage engine, built on Transfer Engine and specialized for LLM inference. It is the central component of the KVCache-centric disaggregated architecture. Its goal is to store reusable KV caches across various locations in an inference cluster.
- Mooncake Backend is a fault-tolerant PyTorch distributed backend. It provides collective communication primitives that continue operating when ranks fail. Mooncake EP extends these capabilities to support elastic, fault-tolerant MoE model inference with dynamic token routing.
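To make the Mooncake Store idea of reusable KV caches more concrete, here is a minimal Python sketch of prefix-based cache lookup. It is a toy under stated assumptions, not Mooncake's API: `ToyKVCacheStore`, `block_keys`, `BLOCK_SIZE`, the node names, and the chained SHA-256 keying scheme are all hypothetical and chosen only for illustration.

```python
import hashlib
from typing import Dict, List, Tuple

BLOCK_SIZE = 4  # tokens per cache block (illustrative value, an assumption)


def block_keys(tokens: List[int], block_size: int = BLOCK_SIZE) -> List[str]:
    """Return one chained hash per full block; each key covers the whole prefix."""
    keys = []
    h = hashlib.sha256()
    for start in range(0, len(tokens) - block_size + 1, block_size):
        block = tokens[start:start + block_size]
        h.update(",".join(map(str, block)).encode() + b";")
        keys.append(h.hexdigest())  # digest of all blocks seen so far
    return keys


class ToyKVCacheStore:
    """In-memory stand-in for a distributed store: key -> (node, payload)."""

    def __init__(self) -> None:
        self._blocks: Dict[str, Tuple[str, bytes]] = {}

    def put(self, tokens: List[int], node: str, payloads: List[bytes]) -> None:
        """Register the KV payload of each full block as living on `node`."""
        for key, payload in zip(block_keys(tokens), payloads):
            self._blocks.setdefault(key, (node, payload))

    def match_prefix(self, tokens: List[int]) -> Tuple[int, List[str]]:
        """Return (number of reusable tokens, node holding each matched block)."""
        nodes = []
        for key in block_keys(tokens):
            entry = self._blocks.get(key)
            if entry is None:
                break  # the first miss ends the reusable prefix
            nodes.append(entry[0])
        return len(nodes) * BLOCK_SIZE, nodes


store = ToyKVCacheStore()
# A first request caches two full blocks; the trailing token 9 is not cached.
store.put([1, 2, 3, 4, 5, 6, 7, 8, 9], "node-0", [b"kv0", b"kv1"])
# A second request shares the first eight tokens, then diverges.
reused, nodes = store.match_prefix([1, 2, 3, 4, 5, 6, 7, 8, 42, 43, 44, 45])
print(reused, nodes)
```

Because each key hashes the entire prefix up to that block, a block is reused only when every preceding token also matches, which is the property a prefix cache needs. A real deployment would fetch the matched payloads from the owning nodes over a transport such as Transfer Engine; that step is omitted here.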
Are there any other projects in the PyTorch Ecosystem similar to yours? If yes, what are they?
No.
Although SGLang, vLLM, and LMCache are also designed for LLM serving, their relationship with Mooncake is collaborative and complementary. Mooncake is deeply integrated into these projects as a foundational infrastructure layer. It provides high-performance data transfer for prefill-decode (PD) and encode-prefill-decode (EPD) disaggregated architectures, distributed and shared KV cache storage, and collective communication for expert parallelism (EP), among other core functionality.
Project repo URL
https://github.com/kvcache-ai/Mooncake
Additional repos in scope of the application
No response
Project license
Apache License
GitHub handles of the project maintainer(s)
@james0zan, @stmatengss, @UNIDY2002, @ShangmingCai, @alogfans, @chestnut-Q, @ykwd
Is there a corporate or academic entity backing this project? If so, please provide the name and URL of the entity.
No response
Website URL
https://kvcache-ai.github.io/Mooncake/
Documentation
https://kvcache-ai.github.io/Mooncake/getting_started/quick-start.html
https://kvcache-ai.github.io/Mooncake/index.html
How do you build and test the project today (continuous integration)? Please describe.
We rely on GitHub Actions as our CI system to automatically build and test the project. An example CI run can be found here: https://github.com/kvcache-ai/Mooncake/actions/runs/20810476279
Version of PyTorch
2.8.0, 2.9.0, 2.9.1
Components of PyTorch
As a communication library for LLM serving, this project primarily leverages PyTorch Distributed. See:
https://docs.pytorch.org/docs/stable/distributed.html
https://kvcache-ai.github.io/Mooncake/python-api-reference/ep-backend.html
How long do you expect to maintain the project?
This project is a critical dependency for inference engines such as vLLM and SGLang. It powers key components, including Transfer Engine for data transfer in PD disaggregation and Mooncake Store for distributed CPU memory caching. For this reason, the project will be actively and continuously maintained.
Additional information
Mooncake’s architecture is based on the FAST 2025 Best Paper. The open-source project has 130 contributors, including developers from NVIDIA, AMD, Intel, Google, Moonshot AI, Tsinghua University, Stanford, Alibaba, Approaching.ai, Ant Group, and Tencent, among others, as well as individual contributors.
Mooncake has deep collaboration and integration with PyTorch Foundation projects such as SGLang, vLLM, and LMCache, and has been widely adopted across many companies, operating at scale on thousands of GPUs.