Changes proposed
Summary
Extend Mooncake Store to support explicit data type classification for stored objects (KVCache, Tensor, Weight, Sample, etc.), enabling type-aware optimizations for storage, eviction, replication, and retrieval policies.
Motivation
Currently, Mooncake Store treats all objects as opaque byte blobs without semantic understanding of their content. This limits optimization opportunities:
- Eviction Policy: KVCache blocks have different access patterns than model weights
- Replication Strategy: Weights benefit from higher replication, while samples may tolerate less
- Compression: Different data types have different compression characteristics
- Prefetching: Type-aware prefetching can improve cache hit rates
- Monitoring: Type-specific metrics enable better observability
- Storage Tiering: Critical types (weights) stay in faster storage tiers
Design Goals
- Backward Compatibility: Existing clients continue working without modification
- Minimal Overhead: Type metadata adds <1% storage overhead
- Extensibility: Easy to add new data types without breaking changes
- Type-Aware Policies: Enable per-type eviction, replication, and allocation strategies
Proposed Design
1. Data Type Enumeration
Add a new enum in mooncake-store/include/types.h:
enum class ObjectDataType : uint8_t {
    UNKNOWN = 0,          // Default for backward compatibility
    KVCACHE = 1,          // KV cache blocks for LLM inference
    TENSOR = 2,           // General tensor data
    WEIGHT = 3,           // Model weights/parameters
    SAMPLE = 4,           // Training samples or prompts
    ACTIVATION = 5,       // Intermediate activations
    GRADIENT = 6,         // Gradient tensors
    OPTIMIZER_STATE = 7,  // Optimizer state (momentum, etc.)
    METADATA = 8,         // Model metadata, configs
    // Reserved 9-255 for future types
};
2. Extended ReplicateConfig
Extend ReplicateConfig in mooncake-store/include/replica.h:
struct ReplicateConfig {
    size_t replica_num{1};
    bool with_soft_pin{false};
    std::vector<std::string> preferred_segments{};
    std::string preferred_segment{};
    bool prefer_alloc_in_same_node{false};
    // NEW: Data type specification
    ObjectDataType data_type{ObjectDataType::UNKNOWN};
    // NEW: Type-specific hints (optional)
    std::unordered_map<std::string, std::string> type_hints{};
};
3. Object Metadata Extension
Extend object metadata to store type information:
struct ObjectMetadata {
    ObjectKey key;
    Version version;
    uint64_t size;
    ObjectDataType data_type;  // NEW
    uint64_t created_at_ms;
    uint64_t last_accessed_at_ms;
    // ... existing fields
};
4. Type-Aware Eviction Strategy
Create new eviction strategy in mooncake-store/include/eviction_strategy.h:
class TypeAwareEvictionStrategy : public EvictionStrategy {
public:
    struct TypePolicy {
        double eviction_priority;  // Lower = harder to evict
        uint64_t min_ttl_ms;       // Minimum time before eviction
        bool allow_eviction;
    };
    void set_type_policy(ObjectDataType type, TypePolicy policy);
    std::vector<ObjectKey> select_victims(size_t target_bytes) override;
};
Default policies:
- WEIGHT: priority=0.1 (rarely evict), min_ttl=3600s
- KVCACHE: priority=0.5 (moderate), min_ttl=60s
- SAMPLE: priority=0.9 (evict first), min_ttl=10s
- UNKNOWN: priority=0.5 (default behavior)
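These defaults can be sketched in Python. The TypePolicy fields mirror the proposed C++ struct; the standalone select_victims helper and the tuple-based object records are illustrative assumptions, not the final interface:

```python
from dataclasses import dataclass
from enum import IntEnum

class ObjectDataType(IntEnum):
    UNKNOWN = 0
    KVCACHE = 1
    WEIGHT = 3
    SAMPLE = 4

@dataclass
class TypePolicy:
    eviction_priority: float  # lower = harder to evict
    min_ttl_ms: int           # minimum age before eviction
    allow_eviction: bool = True

# Default policies from the table above.
POLICIES = {
    ObjectDataType.WEIGHT:  TypePolicy(0.1, 3_600_000),
    ObjectDataType.KVCACHE: TypePolicy(0.5, 60_000),
    ObjectDataType.SAMPLE:  TypePolicy(0.9, 10_000),
    ObjectDataType.UNKNOWN: TypePolicy(0.5, 0),
}

def select_victims(objects, target_bytes, now_ms):
    """objects: list of (key, size, data_type, created_at_ms) tuples.
    Returns keys to evict, most-evictable type first."""
    candidates = []
    for key, size, dtype, created_at_ms in objects:
        policy = POLICIES.get(dtype, POLICIES[ObjectDataType.UNKNOWN])
        age_ms = now_ms - created_at_ms
        # Skip objects whose type forbids eviction or whose TTL has not elapsed.
        if policy.allow_eviction and age_ms >= policy.min_ttl_ms:
            candidates.append((policy.eviction_priority, key, size))
    # Highest eviction_priority goes first; stop once enough bytes are freed.
    candidates.sort(reverse=True)
    victims, freed = [], 0
    for _, key, size in candidates:
        if freed >= target_bytes:
            break
        victims.append(key)
        freed += size
    return victims
```

With these defaults, samples are reclaimed before KV cache blocks, and weights become eligible only after their one-hour TTL.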
5. API Changes
C++ API (backward compatible)
// Existing API continues to work (data_type defaults to UNKNOWN)
int put(const std::string& key, std::span<const char> value,
        const ReplicateConfig& config = ReplicateConfig{});

// New overload with explicit type
int put_typed(const std::string& key, std::span<const char> value,
              ObjectDataType data_type,
              const ReplicateConfig& config = ReplicateConfig{});
Python API
# Existing API (backward compatible)
client.put(key, value)
# New typed API
from mooncake import ObjectDataType
client.put(key, value, data_type=ObjectDataType.KVCACHE)
client.put(key, value, data_type=ObjectDataType.WEIGHT,
           config=ReplicateConfig(replica_num=3))
6. Type-Aware Allocation
Extend AllocationStrategy to consider data types:
class TypeAwareAllocationStrategy : public AllocationStrategy {
    // Allocate WEIGHT to high-performance segments
    // Allocate SAMPLE to cost-effective segments
    SegmentId select_segment(size_t size, ObjectDataType type,
                             const std::vector<std::string>& preferred) override;
};
7. Monitoring & Metrics
Add per-type metrics:
- store.objects.count{type=KVCACHE}
- store.objects.bytes{type=WEIGHT}
- store.evictions.count{type=SAMPLE}
- store.cache_hit_rate{type=TENSOR}
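A minimal sketch of how such labeled counters could be accumulated and exposed. The metric names follow the list above, but the TypedMetrics class is illustrative, not an existing Mooncake exporter:

```python
from collections import defaultdict

class TypedMetrics:
    """Per-type counters keyed by (metric_name, type_label)."""
    def __init__(self):
        self.counters = defaultdict(int)

    def inc(self, name, data_type, value=1):
        self.counters[(name, data_type)] += value

    def render(self):
        # Prometheus-style exposition lines: name{type=LABEL} value
        return [f"{name}{{type={t}}} {v}"
                for (name, t), v in sorted(self.counters.items())]

metrics = TypedMetrics()
metrics.inc("store.objects.count", "KVCACHE")
metrics.inc("store.objects.bytes", "WEIGHT", 4096)
```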
Implementation Plan
Phase 1: Core Infrastructure (Week 1-2)
- Add ObjectDataType enum to types.h
- Extend ReplicateConfig with data_type field
- Update object metadata structures
- Add serialization/deserialization for new fields
- Ensure backward compatibility with existing metadata
Phase 2: Storage & Retrieval (Week 3-4)
- Update put() family to accept and store data type
- Update get() family to return data type (optional)
- Add put_typed() convenience methods
- Update master service to track type metadata
- Add migration path for existing objects (default to UNKNOWN)
Phase 3: Type-Aware Policies (Week 5-6)
- Implement TypeAwareEvictionStrategy
- Implement TypeAwareAllocationStrategy
- Add configuration for per-type policies
- Add type-based replication policies
Phase 4: Python Bindings & Testing (Week 7-8)
- Expose ObjectDataType to Python
- Update pybind11 bindings
- Add unit tests for all type-aware features
- Add integration tests with vLLM/SGLang
- Performance benchmarks
Phase 5: Monitoring & Documentation (Week 9-10)
- Add per-type metrics collection
- Update Prometheus exporters
- Write user documentation
- Create migration guide
- Add examples for each data type
Backward Compatibility
- Default Behavior: Objects without an explicit type use UNKNOWN and are treated with default policies
- Metadata Migration: Existing objects automatically tagged as UNKNOWN on first access
- API Compatibility: All existing put()/get() calls work unchanged
- Wire Protocol: Type field optional in RPC, defaults to UNKNOWN if missing
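The wire-protocol rule above (missing type field defaults to UNKNOWN) amounts to a single decoding step; this sketch uses illustrative field names, not the actual RPC schema:

```python
UNKNOWN = 0

def decode_metadata(fields: dict) -> dict:
    """Decode object metadata from an RPC message.
    Older clients omit 'data_type'; defaulting it to UNKNOWN keeps
    existing objects and old writers working unchanged."""
    return {
        "key": fields["key"],
        "size": fields["size"],
        "data_type": fields.get("data_type", UNKNOWN),
    }

# Message from an old client: no data_type field present.
old = decode_metadata({"key": "k1", "size": 4096})
# Message from a new client carrying an explicit type (3 = WEIGHT).
new = decode_metadata({"key": "k2", "size": 4096, "data_type": 3})
```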
Performance Considerations
- Storage Overhead: 1 byte per object for the type enum (~0.0001% for 1MB objects)
- CPU Overhead: Single enum comparison in hot paths (<1ns)
- Memory Overhead: Type-to-policy map cached in memory (~1KB)
- Network Overhead: 1 byte added to RPC messages (negligible)
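A quick arithmetic check of the per-object storage overhead claim:

```python
# One extra byte of type metadata per stored object.
tag_bytes = 1
object_bytes = 1024 * 1024  # a 1 MB object
overhead_pct = tag_bytes / object_bytes * 100
# Around one ten-thousandth of a percent for megabyte-scale objects,
# comfortably inside the <1% budget from the design goals.
```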
Alternatives Considered
Alternative 1: Key Prefix Convention
Use key prefixes like kvcache:, weight: to infer types.
Rejected: Fragile, requires parsing, breaks existing keys, no enforcement.
Alternative 2: Separate Stores per Type
Create separate store instances for each data type.
Rejected: Resource fragmentation, complex management, no unified view.
Alternative 3: External Type Registry
Store type mappings in external service (etcd).
Rejected: Extra network hop, consistency challenges, single point of failure.
Open Questions
- Type Inference: Should we support automatic type detection from tensor metadata?
- Type Conversion: Allow changing object type after creation?
- Composite Types: Support objects containing multiple types (e.g., checkpoint = weights + optimizer state)?
- Type Hierarchies: Should WEIGHT be a subtype of TENSOR?
Success Metrics
- Adoption: >50% of objects tagged with explicit types within 3 months
- Performance: 10-20% improvement in cache hit rate for typed workloads
- Eviction Quality: 30% reduction in premature weight evictions
- Compatibility: Zero breaking changes for existing deployments