Description
Describe your feature request
Background
The Mooncake Store, built on top of the Mooncake Transfer Engine, has demonstrated excellent performance and stability. As a result, it is increasingly being adopted for model weight storage and update workflows in LLM systems.
Several real-world scenarios benefit from this capability:
- Reinforcement Learning (RL): Mooncake Store can be used to efficiently publish and fetch updated model weights during training loops.
- Model Management: It can serve as a high-performance backend for storing and retrieving model weights across different training or inference services.
Many existing system designs, APIs, and optimizations in Mooncake Store were originally built for KVCache storage. A large portion of these components can be reused for model weight storage. However, certain design assumptions made for KVCache workloads are not ideal for model weight scenarios.
Current Limitations
- Lack of a Hard Pin Mechanism
  In most cases, model weights should not be evicted unexpectedly.
  Currently, the system provides a soft pin mechanism. In theory, soft pinning can emulate hard pin behavior using configurations such as:
  --default_kv_soft_pin_ttl=N --allow_evict_soft_pinned_objects=false
  where `N` is a sufficiently large number. However, this approach is not user-friendly, and the semantics are indirect. A dedicated hard pin mechanism would provide clearer guarantees and improve usability.
- Missing Upsert Interface
  Mooncake Store was originally designed on the assumption that KVCache entries are immutable, so the system does not provide update semantics.
  In reinforcement learning workflows, however, model weights are updated frequently. The current workaround is to run a `remove` → `put` sequence to update the same weight object.
  While this approach works, providing a native `upsert` interface would offer several advantages:
  - Simplifies user workflows
  - Provides a clearer API for weight updates
  - Creates opportunities for future optimization
  - Enables improved fault tolerance for update operations
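To make the second limitation concrete, here is a minimal sketch built on a toy in-memory store. `ToyStore` and all of its methods are hypothetical and are not the Mooncake Store API; the sketch only illustrates why a `remove` → `put` sequence leaves a window in which readers find no object, while an upsert replaces the value in one step.

```python
import threading


class ToyStore:
    """Hypothetical in-memory store; NOT the Mooncake Store API."""

    def __init__(self):
        self._objects = {}
        self._lock = threading.Lock()

    def put(self, key, value):
        # Mirrors the immutable-entry assumption: an existing key is not overwritten.
        with self._lock:
            if key in self._objects:
                return False
            self._objects[key] = value
            return True

    def remove(self, key):
        with self._lock:
            return self._objects.pop(key, None) is not None

    def get(self, key):
        with self._lock:
            return self._objects.get(key)

    def upsert(self, key, value):
        # Insert or replace under one lock, so readers never observe a missing key.
        with self._lock:
            self._objects[key] = value


store = ToyStore()
store.put("policy/weights", b"v1")

# Current workaround: two separate calls.
store.remove("policy/weights")
seen_between = store.get("policy/weights")  # a concurrent reader would find nothing here
store.put("policy/weights", b"v2")

# Proposed style: one call replaces the value in place.
store.upsert("policy/weights", b"v3")
seen_after = store.get("policy/weights")
```

In a real deployment the gap between `remove` and `put` also means a failure between the two calls can leave the key absent, which is one reason a single update operation may be easier to make fault tolerant.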
Discussion
During early discussions, we considered introducing an “RL mode” for the master component. In this mode, the system would:
- Enable hard pinning by default
- Support upsert operations
- Apply other configurations optimized for reinforcement learning workloads
However, we later realized that model weight storage is not limited to reinforcement learning.
For example, in model management systems, different models may have different storage requirements:
- Important models may require hard pinning
- Less important models may use soft pinning
- Remaining storage space may still be used to cache KVCache objects
Because of these mixed workloads, introducing a fixed mode with predefined behaviors would reduce flexibility.
Instead, we believe a better approach is to introduce these mechanisms as independent features, such as:
- A native hard pin mechanism
- A dedicated upsert API
This design allows users to compose behaviors based on their needs, while keeping the system extensible for future requirements such as engram table storage.
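As an illustration of the mixed workload described above, the following sketch models pinning as a per-object property instead of a global mode. Everything here is an assumption made for illustration: `ToyCache`, the `Pin` levels, and the eviction order (unpinned first, then soft-pinned, never hard-pinned) are hypothetical and do not describe Mooncake Store's actual eviction policy or API.

```python
from collections import OrderedDict
from enum import Enum


class Pin(Enum):
    NONE = 0  # evicted first
    SOFT = 1  # evicted only when no unpinned object is left (assumed semantics)
    HARD = 2  # never evicted


class ToyCache:
    """Hypothetical capacity-bounded store; NOT the Mooncake Store API."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # key -> Pin, oldest first

    def put(self, key, pin=Pin.NONE):
        if key not in self.entries and len(self.entries) >= self.capacity:
            victim = self._pick_victim()
            if victim is None:
                raise MemoryError("all objects are hard-pinned")
            del self.entries[victim]
        self.entries[key] = pin
        self.entries.move_to_end(key)

    def _pick_victim(self):
        # Prefer the oldest unpinned object, then the oldest soft-pinned one.
        for level in (Pin.NONE, Pin.SOFT):
            for key, pin in self.entries.items():
                if pin is level:
                    return key
        return None


cache = ToyCache(capacity=3)
cache.put("model/prod", Pin.HARD)          # important model
cache.put("model/experimental", Pin.SOFT)  # less important model
cache.put("kv/block-1")                    # KVCache object in the remaining space
cache.put("kv/block-2")                    # evicts kv/block-1
cache.put("kv/block-3")                    # evicts kv/block-2
remaining = list(cache.entries)
```

Because the pin level travels with each object, the same store can hold hard-pinned weights, soft-pinned weights, and ordinary KVCache entries side by side without a dedicated mode.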
Proposed Contributions
We welcome contributions in the following areas:
- Hard Pin Support
  - Introduce a hard pin mechanism so that pinned objects cannot be evicted by the eviction policy
- Upsert API
  - Implement an `upsert` interface for updating objects
- Documentation and Examples
  - Provide usage examples for model weight storage
  - Document recommended patterns for RL and model management scenarios
Expected Outcome
By introducing these mechanisms, Mooncake Store will better support model weight storage workloads, including:
- Reinforcement learning training pipelines
- Model management systems
- Hybrid environments combining KVCache and model weights
At the same time, this approach preserves the flexibility and extensibility of the system architecture.
How to Contribute
If you are interested in contributing:
- Join the discussion in this issue.
- Share your design proposals or implementation ideas.
- Submit a pull request with your implementation.
We welcome feedback, design discussions, and implementation contributions from the community.