Mooncake is a KVCache-centric disaggregated architecture for LLM serving. At its core is the Transfer Engine, which provides a unified interface for batched data transfer across various storage devices and network links. Transfer Engine supports multiple protocols, including TCP, RDMA, CXL/shared memory, and NVMe over Fabrics (NVMe-oF), and is designed for fast and reliable data transfer in AI workloads. Compared with Gloo (used by Distributed PyTorch) and traditional TCP, Transfer Engine achieves significantly lower I/O latency.
Mooncake Transfer Engine is a high-performance, zero-copy data transfer library. To achieve better performance in NIXL, we have designed a new NIXL backend based on Mooncake Transfer Engine.
- Build and install Mooncake manually. For details, refer to the Mooncake installation guide.

      git clone https://github.com/kvcache-ai/Mooncake.git
      cd Mooncake
      bash dependencies.sh
      mkdir build
      cd build
      cmake .. -DBUILD_SHARED_LIBS=ON
      make -j
      sudo make install

  > [!IMPORTANT]
  > You must build and install the shared library (`-DBUILD_SHARED_LIBS=ON`) before building NIXL with the Mooncake backend.

- Build NIXL, ensuring that the option `disable_mooncake_backend` is set to `false`.
- To test the Mooncake backend, run the unit test in `test/unit/plugins/mooncake/mooncake_backend_test`.
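The NIXL build and test steps above can be sketched as a single shell helper. This is a minimal sketch, assuming NIXL is configured with Meson (so `disable_mooncake_backend` is passed as a `-D` option) and that the unit test binary is placed under the build directory at the same relative path as its source. The function name `build_nixl_with_mooncake` and the default build directory `build` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: configure, build, and test NIXL with the Mooncake
# backend enabled. Assumes a Meson-based NIXL checkout and that Mooncake
# has already been installed as a shared library.
build_nixl_with_mooncake() {
    local src_dir="${1:-.}"        # NIXL source checkout
    local build_dir="${2:-build}"  # Meson build directory (assumed name)

    # Enable the Mooncake backend at configure time. This fails if
    # $build_dir was already configured; reconfigure it in that case.
    meson setup -Ddisable_mooncake_backend=false "$build_dir" "$src_dir" || return 1

    # Compile everything, including the plugin and its unit test.
    meson compile -C "$build_dir" || return 1

    # Assumed location: Meson mirrors the source tree under the build dir.
    "$build_dir/test/unit/plugins/mooncake/mooncake_backend_test"
}
```

For example, from the root of a NIXL checkout, `build_nixl_with_mooncake . build` configures, compiles, and runs the Mooncake backend test, returning a non-zero status at the first failing step.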
- The `Notif[ication]` and `ProgTh[read]` features are not supported.
- The current version of Mooncake Transfer Engine manages metadata exchange itself rather than through NIXL.
- For each handle allocated by `prepXfer()`, the total number of requests issued before the handle is released must be less than `kMaxRequestCount` (1024).
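As a worked illustration of the request limit, the hypothetical shell function below sums the request counts that a single `prepXfer()` handle would issue before it is released and checks that the total stays strictly below 1024. It assumes the limit applies per handle, as the wording above suggests; `fits_handle_request_budget` and `K_MAX_REQUEST_COUNT` are illustrative names that only mirror `kMaxRequestCount` and are not part of NIXL or Mooncake.

```shell
#!/usr/bin/env bash
# Hypothetical constant mirroring kMaxRequestCount in the Mooncake backend.
K_MAX_REQUEST_COUNT=1024

# Succeeds (status 0) if the request counts passed as arguments, e.g. one
# per transfer posted on the same handle before it is released, sum to
# strictly less than K_MAX_REQUEST_COUNT. Returns 2 on non-numeric input.
fits_handle_request_budget() {
    local total=0 count
    for count in "$@"; do
        # Reject anything that is not a non-negative integer.
        [[ $count =~ ^[0-9]+$ ]] || return 2
        # Force base 10 so values like "08" are not parsed as octal.
        total=$((total + 10#$count))
    done
    (( total < K_MAX_REQUEST_COUNT ))
}
```

For example, `fits_handle_request_budget 512 256 255` succeeds (1023 requests in total), while `fits_handle_request_budget 512 512` fails because 1024 is not less than the limit.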
> [!IMPORTANT]
> We are working on refactoring Mooncake Transfer Engine to make it more adaptable and useful.