Commit 3322456

[None][doc] Added line about partial reuse (#7846)
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
1 parent e834f04 commit 3322456

1 file changed, +1 -1 lines changed

docs/source/legacy/advanced/kv-cache-reuse.md

Lines changed: 1 addition & 1 deletion
@@ -64,7 +64,7 @@ There are a few pitfalls that can prevent kv cache reuse when that seems possibl

 Kv cache state for system prompts will remain reusable until memory is needed for launching a new request or propagating an existing one. When this happens, reusable blocks are evicted based on LRU. System prompts that are frequently used have a better chance of remaining reusable, but there is no guarantee since launching new requests take priority over possible reuse. Running with a larger batch size, or larger output sequence lengths for example will reduce the probability of kv cache blocks being reused, since it increases memory needs.

-KV cache state is stored in blocks, each block holds multiple tokens. Only full blocks can be shared by multiple requests, thus the block size matters. The block size is a trade-off, larger block size may improve efficiency of compute kernels, but it reduces the likelihood of kv cache state reuse. The block defaults to 128 tokens, this can be changed when the model is built with the trtllm-build command, for example
+KV cache state is stored in blocks, each block holds multiple tokens. Only full blocks can be shared by multiple requests, thus the block size matters. Partially matched blocks can also be reused, but that creates a new copy of the block for each sequence. The block size is a trade-off, larger block size may improve efficiency of compute kernels, but it reduces the likelihood of kv cache state reuse. The block defaults to 128 tokens, this can be changed when the model is built with the trtllm-build command, for example

 ```trtllm-build --tokens_per_block 32 ...```

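To make the block-size trade-off in the changed paragraph concrete, here is a minimal arithmetic sketch. It is not TensorRT-LLM code: the function name `split_shared_prefix` and the 300-token prefix are hypothetical, and the model is deliberately simplified. It only counts how many tokens of a shared prompt prefix fall into full blocks (which, per the doc text, can be shared between requests) and how many land in a trailing partial block (which, per the added sentence, can be reused only by copying the block for each sequence).

```python
def split_shared_prefix(shared_prefix_tokens: int, tokens_per_block: int = 128):
    """Return (full_blocks, partial_tokens) for a shared prompt prefix.

    Hypothetical helper for illustration only; it models the block
    arithmetic described in the doc, not the library's actual logic.
    """
    if tokens_per_block <= 0:
        raise ValueError("tokens_per_block must be positive")
    if shared_prefix_tokens < 0:
        raise ValueError("shared_prefix_tokens must be non-negative")
    # Full blocks are shareable as-is; the remainder sits in a partial block.
    full_blocks, partial_tokens = divmod(shared_prefix_tokens, tokens_per_block)
    return full_blocks, partial_tokens


if __name__ == "__main__":
    for block_size in (128, 32):
        full, partial = split_shared_prefix(300, block_size)
        print(f"tokens_per_block={block_size}: "
              f"{full} full blocks ({full * block_size} tokens shared), "
              f"{partial} tokens in a partial block")
```

Under these assumptions, a 300-token shared prefix yields 2 full blocks (256 tokens) with the default of 128 tokens per block, and 9 full blocks (288 tokens) with `--tokens_per_block 32`, which is why a smaller block size raises the likelihood of reuse.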
