
[Call for Contribution] Enhance Model Weight Storage for Mooncake Store #1621

@ykwd

Description


Describe your feature request

Background

The Mooncake Store, built on top of the Mooncake Transfer Engine, has demonstrated excellent performance and stability. As a result, it is increasingly being adopted for model weight storage and update workflows in LLM systems.

Several real-world scenarios benefit from this capability:

  • Reinforcement Learning (RL): Mooncake Store can be used to efficiently publish and fetch updated model weights during training loops.
  • Model Management: It can serve as a high-performance backend for storing and retrieving model weights across different training or inference services.

Many existing system designs, APIs, and optimizations in Mooncake Store were originally built for KVCache storage. A large portion of these components can be reused for model weight storage. However, certain design assumptions made for KVCache workloads are not ideal for model weight scenarios.

Current Limitations

  1. Lack of a Hard Pin Mechanism

    In most cases, model weights should not be evicted unexpectedly.

    Currently, the system provides a soft pin mechanism. In theory, soft pinning can emulate hard pin behavior using configurations such as:

    --default_kv_soft_pin_ttl=N
    --allow_evict_soft_pinned_objects=false

    where N is large enough that the soft pin never expires in practice.

    However, this approach is not user-friendly, and the semantics are indirect. A dedicated hard pin mechanism would provide clearer guarantees and improve usability.

  2. Missing Upsert Interface

    Mooncake Store was originally designed on the assumption that KVCache entries are immutable, so the system does not provide update semantics.

    In reinforcement learning workflows, however, model weights are updated frequently. The current workaround is to remove the existing object and then put the new one under the same key (remove → put).

    While this approach works, providing a native upsert interface would offer several advantages:

    • Simplifies user workflows
    • Provides a clearer API for weight updates
    • Creates opportunities for future optimization
    • Enables improved fault tolerance for update operations
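To make the difference between the remove → put workaround and a native upsert concrete, here is a minimal sketch. `ToyStore`, `KeyExistsError`, `upsert`, and `update_via_remove_put` are hypothetical names used only for illustration; this is not the Mooncake Store API, just a single-process model of the semantics described above. It shows that a failure between `remove` and `put` leaves the key with no weights at all, which is the fault-tolerance gap an upsert could close.

```python
from typing import Dict, Optional


class KeyExistsError(Exception):
    """Raised when put() targets a key that already exists."""


class ToyStore:
    """Hypothetical in-memory stand-in for an object store.

    NOT the Mooncake Store API; it only models the semantics discussed in
    this issue: immutable put, remove, and a proposed upsert.
    """

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # KVCache-style semantics: once written, an object is immutable.
        if key in self._objects:
            raise KeyExistsError(key)
        self._objects[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._objects.get(key)

    def remove(self, key: str) -> None:
        self._objects.pop(key, None)

    def upsert(self, key: str, value: bytes) -> None:
        # Proposed semantics (hypothetical): insert if absent, else replace.
        # In this single-threaded toy the dict assignment swaps the value in
        # one step, so the key is never observed as missing.
        self._objects[key] = value


def update_via_remove_put(store: ToyStore, key: str, value: bytes,
                          fail_between: bool = False) -> None:
    """Today's workaround: remove the old object, then put the new one."""
    store.remove(key)
    if fail_between:
        # Simulates a client crash or transfer error between the two calls.
        raise RuntimeError("update interrupted after remove")
    store.put(key, value)


if __name__ == "__main__":
    store = ToyStore()
    store.put("layer0.weight", b"v1")

    try:
        store.put("layer0.weight", b"v2")  # rejected: key already exists
    except KeyExistsError:
        print("put() cannot overwrite an existing object")

    try:
        update_via_remove_put(store, "layer0.weight", b"v2", fail_between=True)
    except RuntimeError:
        pass
    print(store.get("layer0.weight"))  # None: old weights gone, new never written

    store.upsert("layer0.weight", b"v3")
    print(store.get("layer0.weight"))  # b'v3'
```

A real upsert in a distributed store would also need to decide what readers see while new replicas are being written (for example, keeping the old replicas readable until the new ones are complete); the toy above sidesteps that by being single-threaded.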

Discussion

During early discussions, we considered introducing an “RL mode” for the master component. In this mode, the system would:

  • Enable hard pinning by default
  • Support upsert operations
  • Apply other configurations optimized for reinforcement learning workloads

However, we later realized that model weight storage is not limited to reinforcement learning.

For example, in model management systems, different models may have different storage requirements:

  • Important models may require hard pinning
  • Non-important models may use soft pinning
  • Remaining storage space may still be used to cache KVCache objects

Because of these mixed workloads, introducing a fixed mode with predefined behaviors would reduce flexibility.
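The mixed workload above can be illustrated with a small model of victim selection in which hard pin, soft pin, and unpinned objects coexist. This is a hypothetical sketch, not Mooncake's eviction code: `PinMode`, `Entry`, and `pick_victims` are invented names, the policy is plain LRU, and `allow_evict_soft_pinned` only loosely mirrors the `--allow_evict_soft_pinned_objects` flag quoted earlier. Hard-pinned objects are never candidates, active soft pins are evicted only when allowed and only after unpinned objects, and expired soft pins are treated as unpinned.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class PinMode(Enum):
    NONE = 0  # ordinary cache object (e.g. KVCache), evicted first
    SOFT = 1  # protected while its soft-pin TTL is active
    HARD = 2  # hypothetical hard pin: never selected by eviction


@dataclass
class Entry:
    size: int
    pin: PinMode
    last_access: float
    soft_pin_expiry: float = 0.0  # only meaningful for PinMode.SOFT


def is_soft_pinned(e: Entry, now: float) -> bool:
    return e.pin is PinMode.SOFT and now < e.soft_pin_expiry


def pick_victims(entries: Dict[str, Entry], bytes_needed: int, now: float,
                 allow_evict_soft_pinned: bool) -> List[str]:
    """Return keys to evict, in order, until at least bytes_needed are freed.

    If the candidates cannot free enough space, all of them are returned and
    the caller must treat the allocation as failed.
    """
    candidates = [
        k for k, e in entries.items()
        if e.pin is not PinMode.HARD
        and (allow_evict_soft_pinned or not is_soft_pinned(e, now))
    ]
    # False sorts before True: unpinned objects first, then LRU order.
    candidates.sort(key=lambda k: (is_soft_pinned(entries[k], now),
                                   entries[k].last_access))
    victims: List[str] = []
    freed = 0
    for key in candidates:
        if freed >= bytes_needed:
            break
        victims.append(key)
        freed += entries[key].size
    return victims


if __name__ == "__main__":
    entries = {
        "important_model": Entry(size=10, pin=PinMode.HARD, last_access=0.0),
        "minor_model": Entry(size=10, pin=PinMode.SOFT, last_access=0.5,
                             soft_pin_expiry=100.0),
        "kv_a": Entry(size=10, pin=PinMode.NONE, last_access=1.0),
        "kv_b": Entry(size=10, pin=PinMode.NONE, last_access=2.0),
    }
    print(pick_victims(entries, 15, now=50.0, allow_evict_soft_pinned=False))
    # ['kv_a', 'kv_b']
    print(pick_victims(entries, 100, now=50.0, allow_evict_soft_pinned=True))
    # ['kv_a', 'kv_b', 'minor_model']
```

In this model, emulating a hard pin with soft-pin settings alone (a huge TTL and soft-pin eviction disabled) would protect `minor_model` exactly as much as `important_model`. A per-object hard pin keeps the two distinct while still letting leftover space serve KVCache objects.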

Instead, we believe a better approach is to introduce these mechanisms as independent features, such as:

  • A native hard pin mechanism
  • A dedicated upsert API

This design allows users to compose behaviors based on their needs, while keeping the system extensible for future requirements such as engram table storage.

Proposed Contributions

We welcome contributions in the following areas:

  1. Hard Pin Support

    • Introduce a hard pin mechanism so that hard-pinned objects are never evicted by the eviction policy
  2. Upsert API

    • Implement an upsert interface for updating objects
  3. Documentation and Examples

    • Provide usage examples for model weight storage
    • Document recommended patterns for RL and model management scenarios

Expected Outcome

By introducing these mechanisms, Mooncake Store will better support model weight storage workloads, including:

  • Reinforcement learning training pipelines
  • Model management systems
  • Hybrid environments combining KVCache and model weights

At the same time, this approach preserves the flexibility and extensibility of the system architecture.

How to Contribute

If you are interested in contributing:

  1. Join the discussion in this issue.
  2. Share your design proposals or implementation ideas.
  3. Submit a pull request with your implementation.

We welcome feedback, design discussions, and implementation contributions from the community.

