
Commit 7a4a5de

[Misc] Update outdated note: LMCache now supports chunked prefill (#16697)
Signed-off-by: chaunceyjiang <[email protected]>
1 parent c16fb5d

File tree

1 file changed: +2 −2 lines changed

examples/offline_inference/cpu_offload_lmcache.py

Lines changed: 2 additions & 2 deletions
@@ -37,11 +37,11 @@ def build_llm_with_lmcache():
         '{"kv_connector":"LMCacheConnector", "kv_role":"kv_both"}')
     # Set GPU memory utilization to 0.8 for an A40 GPU with 40GB
     # memory. Reduce the value if your GPU has less memory.
-    # Note that LMCache is not compatible with chunked prefill for now.
+    # Note: LMCache supports chunked prefill (see vLLM#14505, LMCache#392).
     llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2",
               kv_transfer_config=ktc,
               max_model_len=8000,
-              enable_chunked_prefill=False,
+              enable_chunked_prefill=True,
               gpu_memory_utilization=0.8)

     try:
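
For readers who want to try the updated configuration end to end, here is a minimal sketch of the setup this hunk changes. The LLM arguments are taken directly from the diff above; the imports and the KVTransferConfig.from_cli call are inferred from the surrounding example file, and the prompt and sampling settings are illustrative assumptions, not part of this commit.

# Minimal sketch based on examples/offline_inference/cpu_offload_lmcache.py.
# The prompt and sampling settings below are illustrative assumptions.
from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

# Use LMCache as the KV-cache connector, acting as both producer and
# consumer of KV blocks ("kv_both"), as in the example file.
ktc = KVTransferConfig.from_cli(
    '{"kv_connector":"LMCacheConnector", "kv_role":"kv_both"}')

# Chunked prefill can now stay enabled together with LMCache
# (see vLLM#14505, LMCache#392).
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2",
          kv_transfer_config=ktc,
          max_model_len=8000,
          enable_chunked_prefill=True,
          gpu_memory_utilization=0.8)

outputs = llm.generate(["Explain KV-cache offloading in one sentence."],
                       SamplingParams(temperature=0.0, max_tokens=64))
print(outputs[0].outputs[0].text)

With enable_chunked_prefill=True, long prompts are prefilled in chunks while LMCache serves as the KV connector, which is the combination this commit's note change documents as now supported.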
