Commit 65df760

Remove MTP and LMCache for GLM.
1 parent b0910f0 commit 65df760

1 file changed: +1 -5 lines changed

GLM-4.7.yaml

Lines changed: 1 addition & 5 deletions
@@ -75,8 +75,6 @@ services:
 command: >
   zai-org/GLM-4.7
   --tensor-parallel-size 8
-  --speculative-config '{"method":"mtp","num_speculative_tokens":1}'
-  --kv-transfer-config '{"kv_connector":"LMCacheConnectorV1","kv_role":"kv_both"}'
   --max-model-len 128K
   --max-num-batched-tokens 32K
   --max-num-seqs 128
@@ -95,9 +93,7 @@ services:
 - NCCL_DEBUG=INFO
 - VLLM_CACHE_ROOT=/root/.cache/vllm
 - TORCH_FLOAT32_MATMUL_PRECISION=high
-- LMCACHE_CHUNK_SIZE=256
-- LMCACHE_LOCAL_CPU=True
-- LMCACHE_MAX_LOCAL_CPU_SIZE=100
+- LMCACHE_LOCAL_CPU=False
 - PYTHONHASHSEED=0
 - VLLM_RPC_TIMEOUT=60000
 deploy:
