Description
Changes proposed
As mentioned in previous issues #171 and #333, offloading KV cache to SSDs to support Mooncake's multi-level caching mechanism can further improve the reuse rate of KV cache and address the issue of limited DRAM space in certain scenarios.
We have implemented Version 1 of KV cache offloading in #437, with the following mechanisms:
- Client-side persistence: KV cache is offloaded to and stored on DFS (3FS) to facilitate unified file synchronization across nodes. All read/write/query operations for KV cache objects are performed entirely on the client side, with the master node remaining unaware of them. The mapping from keys to KV cache objects in the file system is maintained by a fixed indexing scheme, where each file corresponds to one KV cache object (the filename serves as the key).
- POSIX read/write: Currently, all file I/O operations are performed using POSIX interfaces. For `put`/`batchput` operations, we only submit a persistence request to the thread pool after a successful in-memory write, without further verification of write success. (If the write fails, the file is automatically deleted to prevent indexing by other instances.) For `get` operations, synchronous reads are used, while `batchget` employs asynchronous batch reads to improve throughput. A minimal sketch of this path is shown below.
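
The following is a minimal sketch of the client-side persistence path described above, under stated assumptions: helper names (`KeyToPath`, `PersistObject`, `LoadObject`) and the directory layout are illustrative, not the actual Mooncake implementation. It shows the fixed key-to-filename mapping, the write path that deletes the file on failure, and the synchronous read path used by `get`.

```cpp
// Hypothetical sketch of the client-side persistence path (names are
// illustrative, not the actual Mooncake APIs). Each KV cache object maps to
// one file whose name is the key; failed writes are deleted so that other
// instances never index a partial object.
#include <fcntl.h>
#include <unistd.h>
#include <cstdint>
#include <string>
#include <vector>

// Fixed indexing: the filename is the key (assumes keys are filesystem-safe).
std::string KeyToPath(const std::string& root, const std::string& key) {
    return root + "/" + key;
}

// Persistence task submitted to a thread pool after the in-memory write succeeds.
bool PersistObject(const std::string& root, const std::string& key,
                   const std::vector<uint8_t>& data) {
    const std::string path = KeyToPath(root, key);
    int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;
    ssize_t written = ::write(fd, data.data(), data.size());
    ::close(fd);
    if (written != static_cast<ssize_t>(data.size())) {
        ::unlink(path.c_str());  // remove the partial file so it cannot be indexed
        return false;
    }
    return true;
}

// Synchronous read path used by get(); batchget would issue these reads
// asynchronously / in parallel to improve throughput.
bool LoadObject(const std::string& root, const std::string& key,
                std::vector<uint8_t>* out) {
    const std::string path = KeyToPath(root, key);
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) return false;
    off_t size = ::lseek(fd, 0, SEEK_END);
    out->resize(static_cast<size_t>(size));
    bool ok = ::pread(fd, out->data(), out->size(), 0) ==
              static_cast<ssize_t>(out->size());
    ::close(fd);
    return ok;
}
```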
Future To-Do List
- Native 3FS Interface (Merged): Since the ultimate goal is to support this persistence feature on 3FS, and the current POSIX implementation (via FUSE) still impacts I/O performance, we plan to introduce a 3FS-native plugin interface to further optimize file read performance for `get`/`batchget` (see the backend-interface sketch after this list).
- Master-Managed KV Cache in SSD (Merged): The current implementation manages SSD KV cache on the client side, with metadata synchronization handled by DFS (the master remains unaware). While this approach ensures loose coupling, the lack of centralized management introduces consistency and performance issues. Future plans include migrating KV cache metadata to the master, leveraging an extended replica mechanism to support both memory and disk modes (see the metadata sketch after this list). Benefits include:
  - Reduced query latency: Currently, `query`/`exist` operations require filesystem access, incurring high overhead for large datasets. Moving metadata to the master enables single-RPC lookups for SSD/memory status.
  - Consistent behavior: Ensures alignment with memory semantics for operations like `removeAll` and `tearDownAll`.
  - Race condition mitigation: Resolves issues like "remove-before-write" through centralized coordination.
- File Eviction Mechanism (WIP): Currently, file deletion relies on manual user calls (`remove`/`removeAll`) or admin intervention. Without automatic eviction, long-running clusters risk storage bloat. Future versions will introduce monitoring and auto-eviction policies (see the eviction sketch after this list).
- Master-Triggered Eviction & Persistence (WIP): Presently, every successful `put` triggers persistence, effectively backing up KV cache entries. We aim to shift persistence to the master's eviction phase, where evicted data is written to SSDs. Challenges include:
  - The master currently handles only metadata, not data flow.
  - Data distribution across nodes complicates persistence during eviction.
  A well-designed solution will be explored in future iterations.
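
To make the native 3FS plugin direction more concrete, here is a hypothetical backend abstraction: the file-I/O path is hidden behind an interface so a POSIX (FUSE) implementation and a 3FS-native implementation can be swapped without touching the put/get logic. All class and method names below are assumptions for illustration, not the merged interface.

```cpp
// Hypothetical storage-backend abstraction for the 3FS-native plugin idea.
// A POSIX backend routes every call through the FUSE mount; a 3FS-native
// backend could use the native client library to cut FUSE overhead on the
// get/batchget read path.
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

class StorageBackend {
public:
    virtual ~StorageBackend() = default;
    virtual bool Write(const std::string& path, const std::vector<uint8_t>& data) = 0;
    virtual bool Read(const std::string& path, std::vector<uint8_t>* out) = 0;
    // Batch reads let a native backend submit all I/O requests at once instead
    // of paying one FUSE round trip per file; returns the number of successes.
    virtual size_t BatchRead(const std::vector<std::string>& paths,
                             std::vector<std::vector<uint8_t>>* outs) = 0;
};

// Concrete backends (declarations only in this sketch).
class PosixBackend : public StorageBackend { /* pread/pwrite via the FUSE mount */ };
class Hf3fsNativeBackend : public StorageBackend { /* 3FS-native client calls */ };

// The client would select a backend at startup based on configuration.
std::unique_ptr<StorageBackend> MakeBackend(bool use_native_3fs);
```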
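
For the master-managed metadata item, the sketch below shows one possible shape of an extended replica record that covers both memory and disk modes, so that `exist`/`query` can be answered by the master in a single RPC instead of a filesystem probe. The types and fields are assumptions, not the actual replica schema.

```cpp
// Hypothetical extension of replica metadata for master-managed SSD KV cache.
// Recording the medium and location in the master lets query/exist return
// SSD/memory status in one lookup, and lets removeAll/tearDownAll cover both tiers.
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

enum class ReplicaMedium { kMemory, kDisk };

struct ReplicaDescriptor {
    ReplicaMedium medium;
    std::string location;   // e.g. segment/offset for memory, file path for disk
    uint64_t size_bytes;
};

struct ObjectMetadata {
    std::vector<ReplicaDescriptor> replicas;  // an object may exist in both tiers
};

// Master-side table answering "where does this key live?" in one call.
class MetadataStore {
public:
    bool Exist(const std::string& key) const { return table_.count(key) > 0; }
    const ObjectMetadata* Query(const std::string& key) const {
        auto it = table_.find(key);
        return it == table_.end() ? nullptr : &it->second;
    }
private:
    std::unordered_map<std::string, ObjectMetadata> table_;
};
```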
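
Finally, for the auto-eviction item, here is a minimal policy sketch assuming a capacity-watermark LRU scheme; the watermark values and the source of access times are assumptions, and the real policy may differ.

```cpp
// Hypothetical auto-eviction policy: when SSD usage exceeds a high watermark,
// evict least-recently-accessed files until usage drops below a low watermark.
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

struct FileStat {
    std::string path;
    uint64_t size_bytes;
    int64_t last_access_sec;  // e.g. from stat() atime or a client-side index
};

std::vector<std::string> SelectEvictions(std::vector<FileStat> files,
                                         uint64_t capacity_bytes,
                                         double high_watermark,   // e.g. 0.90
                                         double low_watermark) {  // e.g. 0.75
    uint64_t used = 0;
    for (const auto& f : files) used += f.size_bytes;
    std::vector<std::string> victims;
    if (used < static_cast<uint64_t>(capacity_bytes * high_watermark)) return victims;

    // Evict oldest-accessed files first.
    std::sort(files.begin(), files.end(), [](const FileStat& a, const FileStat& b) {
        return a.last_access_sec < b.last_access_sec;
    });
    const uint64_t target = static_cast<uint64_t>(capacity_bytes * low_watermark);
    for (const auto& f : files) {
        if (used <= target) break;
        victims.push_back(f.path);
        used -= f.size_bytes;
    }
    return victims;
}
```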
We welcome feedback and suggestions on this design and implementation.