It looks like the current Mooncake store is not aware of layer granularity, chunk order (earlier vs. later chunks), or TP (tensor-parallel) ranks. This can cause problems during cache eviction:
The store might evict KVs for only some layers, forcing prefill to recompute the KVs of every layer for those tokens.
The store might evict the KVs corresponding to the earliest chunk(s) of a token sequence, forcing prefill to recompute KVs for all chunks/tokens.
The store might evict KVs produced by some TP ranks, forcing all TP ranks to recompute KVs.
I think the Mooncake store should be aware of layers, chunk ordering, and TP. On eviction, it should evict a token’s KV caches for all layers and for all TP ranks together (i.e., keep KV state complete per token).
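To make the "evict everything for a token range together" idea concrete, here is a minimal sketch of a toy in-memory store. It is not Mooncake's API: `KVKey`, `GroupedStore`, and `evict_group` are hypothetical names, and the key layout (sequence id, chunk index, layer, TP rank) is an assumption made for illustration.

```python
from collections import defaultdict
from typing import Dict, NamedTuple, Optional, Tuple


class KVKey(NamedTuple):
    """Hypothetical object key; not Mooncake's actual key format."""
    seq_id: str
    chunk_idx: int
    layer: int
    tp_rank: int


class GroupedStore:
    """Toy store that only ever evicts whole (seq_id, chunk_idx) groups."""

    def __init__(self) -> None:
        # Entries are indexed by their group so that all layers and all
        # TP ranks of one chunk can be dropped in a single operation.
        self._groups: Dict[Tuple[str, int], Dict[KVKey, bytes]] = defaultdict(dict)

    def put(self, key: KVKey, value: bytes) -> None:
        self._groups[(key.seq_id, key.chunk_idx)][key] = value

    def get(self, key: KVKey) -> Optional[bytes]:
        # .get() on the outer dict avoids creating empty groups on a miss.
        return self._groups.get((key.seq_id, key.chunk_idx), {}).get(key)

    def evict_group(self, seq_id: str, chunk_idx: int) -> int:
        """Drop every layer/TP-rank entry of one chunk; return how many were removed."""
        return len(self._groups.pop((seq_id, chunk_idx), {}))
```

With this layout a chunk is either fully present or fully absent, so prefill never finds KVs for only some layers or only some TP ranks.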
As for chunk order, eviction policies should remove KVs for later chunks first, because evicting early-chunk KVs is what makes prefill expensive.
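The effect of chunk order can be illustrated with a small sketch. It assumes, as this post implies, that prefill can reuse only the longest contiguous run of resident chunks starting at chunk 0. Both function names are hypothetical and not part of Mooncake.

```python
from typing import Set


def reusable_prefix_chunks(resident: Set[int]) -> int:
    """Count the contiguous chunks starting at index 0 that are still resident."""
    n = 0
    while n in resident:
        n += 1
    return n


def pick_victim_chunk(resident: Set[int]) -> int:
    """Tail-first policy: evict the highest-indexed resident chunk of a sequence."""
    return max(resident)


resident = {0, 1, 2, 3}

# Evicting the earliest chunk leaves nothing reusable under the assumption above.
after_head_eviction = resident - {0}

# Evicting the latest chunk keeps the remaining prefix intact.
after_tail_eviction = resident - {pick_victim_chunk(resident)}
```

In this example, `reusable_prefix_chunks(after_head_eviction)` is 0 while `reusable_prefix_chunks(after_tail_eviction)` is 3, even though both evictions free the same amount of space.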