Commit d14cb7c

Updating doc for enabling prefix caching (#489)
Signed-off-by: YuhanLiu11 <yliu738@wisc.edu>
1 parent 928c08c commit d14cb7c

8 files changed: 13 additions, 22 deletions

docs/source/tutorials/disagg.rst

Lines changed: 3 additions & 5 deletions
@@ -64,7 +64,7 @@ The router coordinates between the prefill and decode servers, handling request
 python3 -m vllm_router.app --port 8005 \
 --service-discovery static \
 --static-backends "http://localhost:8100,http://localhost:8200" \
---static-models "meta-llama/Llama-3.1-8B-Instruct,meta-llama/Llama-3.1-70B-Instruct" \
+--static-models "meta-llama/Llama-3.1-8B-Instruct,meta-llama/Llama-3.1-8B-Instruct" \
 --static-model-labels "llama-prefill,llama-decode" \
 --log-stats \
 --log-stats-interval 10 \
@@ -134,8 +134,7 @@ Create a configuration file ``values-16-disagg-prefill.yaml`` with the following
 # requestGPU: 1
 pvcStorage: "50Gi"
 vllmConfig:
-enableChunkedPrefill: false
-enablePrefixCaching: false
+enablePrefixCaching: true
 maxModelLen: 32000
 v1: 1
 gpuMemoryUtilization: 0.6
@@ -166,8 +165,7 @@ Create a configuration file ``values-16-disagg-prefill.yaml`` with the following
 # requestGPU: 1
 pvcStorage: "50Gi"
 vllmConfig:
-enableChunkedPrefill: false
-enablePrefixCaching: false
+enablePrefixCaching: true
 maxModelLen: 32000
 v1: 1
 lmcacheConfig:

docs/source/tutorials/kv_cache.rst

Lines changed: 2 additions & 3 deletions
@@ -36,9 +36,8 @@ Locate the file ``tutorials/assets/values-05-cpu-offloading.yaml`` with the foll
 requestGPU: 1
 pvcStorage: "50Gi"
 vllmConfig:
-enableChunkedPrefill: false
-enablePrefixCaching: false
-maxModelLen: 16384
+enablePrefixCaching: true
+maxModelLen: 16384
 
 lmcacheConfig:
 enabled: true

tutorials/assets/values-05-cpu-offloading.yaml

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ servingEngineSpec:
 enableChunkedPrefill: false
 enablePrefixCaching: false
 maxModelLen: 16384
+v1: 1
 
 lmcacheConfig:
 enabled: true

tutorials/assets/values-06-shared-storage.yaml

Lines changed: 1 addition & 2 deletions
@@ -11,8 +11,7 @@ servingEngineSpec:
 requestGPU: 1
 pvcStorage: "50Gi"
 vllmConfig:
-enableChunkedPrefill: false
-enablePrefixCaching: false
+enablePrefixCaching: true
 maxModelLen: 16384
 v1: 1
 

tutorials/assets/values-14-vllm-v1.yaml

Lines changed: 1 addition & 2 deletions
@@ -16,8 +16,7 @@ servingEngineSpec:
 - ReadWriteOnce
 
 vllmConfig:
-enableChunkedPrefill: false
-enablePrefixCaching: false
+enablePrefixCaching: true
 maxModelLen: 4096
 dtype: "bfloat16"
 v1: 1

tutorials/assets/values-16-disagg-prefill.yaml

Lines changed: 2 additions & 4 deletions
@@ -15,8 +15,7 @@ servingEngineSpec:
 # requestGPU: 1
 pvcStorage: "50Gi"
 vllmConfig:
-enableChunkedPrefill: false
-enablePrefixCaching: false
+enablePrefixCaching: true
 maxModelLen: 32000
 v1: 1
 gpuMemoryUtilization: 0.6
@@ -47,8 +46,7 @@ servingEngineSpec:
 # requestGPU: 1
 pvcStorage: "50Gi"
 vllmConfig:
-enableChunkedPrefill: false
-enablePrefixCaching: false
+enablePrefixCaching: true
 maxModelLen: 32000
 v1: 1
 lmcacheConfig:

tutorials/assets/values-17-kv-aware.yaml

Lines changed: 2 additions & 4 deletions
@@ -11,8 +11,7 @@ servingEngineSpec:
 requestGPU: 1
 pvcStorage: "50Gi"
 vllmConfig:
-enableChunkedPrefill: false
-enablePrefixCaching: false
+enablePrefixCaching: true
 maxModelLen: 16384
 v1: 1
 
@@ -38,8 +37,7 @@ servingEngineSpec:
 requestGPU: 1
 pvcStorage: "50Gi"
 vllmConfig:
-enableChunkedPrefill: false
-enablePrefixCaching: false
+enablePrefixCaching: true
 maxModelLen: 16384
 v1: 1
 

tutorials/assets/values-18-prefix-aware.yaml

Lines changed: 1 addition & 2 deletions
@@ -11,8 +11,7 @@ servingEngineSpec:
 requestGPU: 1
 pvcStorage: "50Gi"
 vllmConfig:
-enableChunkedPrefill: false
-enablePrefixCaching: false
+enablePrefixCaching: true
 maxModelLen: 16384
 v1: 1
 
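
For reference, the hunk for ``tutorials/assets/values-18-prefix-aware.yaml`` should leave the ``vllmConfig`` block looking roughly like the fragment below. The three keys and their values are taken from the diff. The two-space indentation is an assumption, since the rendered diff does not preserve leading whitespace, and the parent keys under ``servingEngineSpec`` are omitted.

```yaml
# Sketch of the post-change block; indentation is assumed, not taken from the diff.
vllmConfig:
  enablePrefixCaching: true   # was "false" before this commit
  maxModelLen: 16384
  v1: 1
  # enableChunkedPrefill is no longer set explicitly after this commit
```

The same pattern applies to the other values files in this commit, except ``values-05-cpu-offloading.yaml``, where the diff only adds ``v1: 1`` and leaves both flags at ``false``.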
