Over the past month, SGLang has introduced multi-level KV cache storage along with high-performance distributed storage systems such as 3FS. I truly appreciate these efforts and the improvements they bring.
However, while reviewing the code this morning, I noticed a potential correctness issue. For an input token, its embedding is denoted as 𝑋 and the first-layer projections are computed as:
Q = W_q X, K = W_k X, V = W_v X
In deeper layers, however, the inputs are no longer the embeddings but the hidden states produced by the previous layer. These hidden states are influenced by other tokens in the same request. This raises a concern: if the context of a token changes, the KV values in subsequent layers will also change, rather than remaining constant.
Given this, I would like to ask: when reusing KV cache across different requests, could this context dependence potentially lead to correctness issues?
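One way to probe this concern is a toy experiment. The sketch below is not SGLang code: it assumes a causal (decoder-only) attention mask, a single head, no positional encoding, and random weights, and the names `toy_kv_cache` and `same_kv` are hypothetical helpers introduced only for illustration. It compares the per-layer K and V of the same tokens under two conditions: an identical prefix followed by different continuations, and the same token preceded by different tokens.

```python
import numpy as np

def toy_kv_cache(tokens, embed, layers):
    """Run a toy causal transformer and return the (K, V) pair of every layer.

    Uses the row-vector convention (h @ W), the transpose of Q = W_q X.
    """
    h = embed[tokens]                                   # (T, d) input embeddings
    T = len(tokens)
    future = np.triu(np.ones((T, T), dtype=bool), k=1)  # True where key j > query i
    cache = []
    for W_q, W_k, W_v in layers:
        Q, K, V = h @ W_q, h @ W_k, h @ W_v
        scores = Q @ K.T / np.sqrt(K.shape[1])
        scores[future] = -np.inf                        # causal mask: no attention to later tokens
        weights = np.exp(scores - scores.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)
        h = h + weights @ V                             # residual connection
        cache.append((K, V))
    return cache

def same_kv(cache_a, rows_a, cache_b, rows_b):
    """Per-layer flags: do the selected rows of K and V agree in both caches?"""
    return [
        bool(np.allclose(Ka[rows_a], Kb[rows_b]) and np.allclose(Va[rows_a], Vb[rows_b]))
        for (Ka, Va), (Kb, Vb) in zip(cache_a, cache_b)
    ]

rng = np.random.default_rng(0)
d, vocab = 8, 16                                        # assumed toy sizes
embed = rng.normal(size=(vocab, d))
layers = [
    tuple(rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
    for _ in range(2)                                   # two layers, each (W_q, W_k, W_v)
]

# Case 1: identical prefix [1, 2, 3], different continuations.
shared_prefix = same_kv(
    toy_kv_cache([1, 2, 3, 4, 5], embed, layers), slice(0, 3),
    toy_kv_cache([1, 2, 3, 6, 7], embed, layers), slice(0, 3),
)

# Case 2: token 3 at the same position, but preceded by different tokens.
changed_context = same_kv(
    toy_kv_cache([1, 2, 3], embed, layers), slice(2, 3),
    toy_kv_cache([4, 5, 3], embed, layers), slice(2, 3),
)

print("shared prefix, per layer:", shared_prefix)
print("changed context, per layer:", changed_context)
```

In this toy setting the K and V of a token depend only on that token and the tokens before it, so a cache entry keyed on an identical token prefix would stay valid, while reusing a token's deeper-layer KV under a different preceding context would not. Whether the multi-level cache always keys entries on the full preceding prefix is the part I would like to confirm.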