commit 7a4a5de (1 parent: c16fb5d)
examples/offline_inference/cpu_offload_lmcache.py
@@ -37,11 +37,11 @@ def build_llm_with_lmcache():
         '{"kv_connector":"LMCacheConnector", "kv_role":"kv_both"}')
     # Set GPU memory utilization to 0.8 for an A40 GPU with 40GB
     # memory. Reduce the value if your GPU has less memory.
-    # Note that LMCache is not compatible with chunked prefill for now.
+    # Note: LMCache supports chunked prefill (see vLLM#14505, LMCache#392).
     llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2",
               kv_transfer_config=ktc,
               max_model_len=8000,
-              enable_chunked_prefill=False,
+              enable_chunked_prefill=True,
               gpu_memory_utilization=0.8)
 
     try:
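
For context, the commit only flips the example from disabling to enabling chunked prefill while the LMCacheConnector is in use. Below is a minimal sketch of how the updated snippet is typically driven end to end; the LMCACHE_* environment variables mirror the rest of this example file, while the prompt and sampling values are illustrative assumptions and not part of this commit.

import os

from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

# LMCache reads its settings from environment variables; these particular
# values (256-token chunks, local CPU backend, 5 GiB of CPU memory) are
# assumptions mirroring the values used elsewhere in this example file.
os.environ["LMCACHE_USE_EXPERIMENTAL"] = "True"
os.environ["LMCACHE_CHUNK_SIZE"] = "256"
os.environ["LMCACHE_LOCAL_CPU"] = "True"
os.environ["LMCACHE_MAX_LOCAL_CPU_SIZE"] = "5.0"

# Route KV-cache transfers through the LMCache connector (store and load).
ktc = KVTransferConfig.from_cli(
    '{"kv_connector":"LMCacheConnector", "kv_role":"kv_both"}')

# Chunked prefill can now stay enabled together with LMCache.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2",
          kv_transfer_config=ktc,
          max_model_len=8000,
          enable_chunked_prefill=True,
          gpu_memory_utilization=0.8)

# Hypothetical prompt and sampling parameters, only to exercise the engine.
sampling_params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["Explain KV-cache CPU offloading in one sentence."],
                       sampling_params)
for out in outputs:
    print(out.outputs[0].text)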