Correctness Considerations Introduced by HiCache #9389

soyail · 2025-08-20T07:53:39Z

Over the past month, SGLang has introduced multi-level KV cache storage along with high-performance distributed storage systems such as 3FS. I truly appreciate these efforts and the improvements they bring.

However, while reviewing the code this morning, I noticed a potential correctness issue. For an input token with embedding X, the first-layer projections are computed as:
Q = W_q X, K = W_k X, V = W_v X
In deeper layers, however, the inputs are no longer the embeddings but the hidden states produced by the previous layer. These hidden states are influenced by other tokens in the same request. This raises a concern: if the context of a token changes, the KV values in subsequent layers will also change, rather than remaining constant.

Given this, I would like to ask: when reusing KV cache across different requests, could this context dependence potentially lead to correctness issues?
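
To make the question concrete, here is a minimal sketch of the concern using a toy two-layer, single-head attention stack in pure Python. Everything in it is an assumption for illustration: the model uses a causal mask (as decoder-only LLMs typically do), has no positional encoding, and the names `causal_attention_layer` and `layer2_keys` are hypothetical, not SGLang or HiCache APIs. It compares the second-layer K vectors of the same tokens under (a) an identical prefix with different suffixes and (b) a different preceding context.

```python
import math
import random

D = 4  # hidden size of the toy model (illustrative only)

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def causal_attention_layer(X, Wq, Wk, Wv):
    """Single-head causal self-attention with a residual connection.

    Returns (hidden_states_out, K, V). Position i attends only to j <= i.
    """
    Q = [matvec(Wq, x) for x in X]
    K = [matvec(Wk, x) for x in X]
    V = [matvec(Wv, x) for x in X]
    out = []
    for i, q in enumerate(Q):
        scores = [sum(a * b for a, b in zip(q, K[j])) / math.sqrt(D)
                  for j in range(i + 1)]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        attended = [sum(weights[j] / total * V[j][d] for j in range(i + 1))
                    for d in range(D)]
        out.append([x_d + a_d for x_d, a_d in zip(X[i], attended)])
    return out, K, V

def layer2_keys(token_ids, embed, layers):
    """Return the second-layer K vectors for every position in the request."""
    X = [embed[t] for t in token_ids]
    X, _, _ = causal_attention_layer(X, *layers[0])
    _, K2, _ = causal_attention_layer(X, *layers[1])
    return K2

def close(a, b, tol=1e-9):
    return all(abs(x - y) <= tol for x, y in zip(a, b))

rng = random.Random(0)
embed = [rand_matrix(rng, 1, D)[0] for _ in range(10)]  # 10-token vocabulary
layers = [tuple(rand_matrix(rng, D, D) for _ in range(3)) for _ in range(2)]

# (a) Two requests share the prefix [1, 2, 3] but continue differently.
k_a = layer2_keys([1, 2, 3, 4], embed, layers)
k_b = layer2_keys([1, 2, 3, 7, 8], embed, layers)
prefix_keys_match = all(close(k_a[i], k_b[i]) for i in range(3))

# (b) Token 3 appears again, but after a different preceding context.
k_c = layer2_keys([5, 6, 3], embed, layers)
same_token_new_context_matches = close(k_a[2], k_c[2])

print("shared prefix, layer-2 K identical:", prefix_keys_match)
print("same token, different preceding context, layer-2 K identical:",
      same_token_new_context_matches)
```

In this toy setup, the deeper-layer K of a token depends only on the tokens at or before its position, so `prefix_keys_match` holds while `same_token_new_context_matches` does not. If that carries over, reuse would be safe only when the cache is keyed on the entire preceding token sequence rather than on individual tokens; whether HiCache keys its entries that way is exactly what I would like to confirm.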

Replies: 0 comments
