Commit 2cb386e

Super tiny enable draft-weights-cpu-backup to avoid MTP acc len issue (#971)
1 parent c66ff38 commit 2cb386e

4 files changed: 4 additions, 0 deletions


docs/en/advanced/speculative-decoding.md

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ For models with MTP layers (e.g., GLM-4.6, DeepSeek-V3/R1), simply add:
 --sglang-speculative-num-steps 3
 --sglang-speculative-eagle-topk 1
 --sglang-speculative-num-draft-tokens 4
+--sglang-enable-draft-weights-cpu-backup
 ```

 If you want to use a separately trained draft model (e.g., one trained with [SpecForge](https://docs.sglang.ai/SpecForge/)), also set:

scripts/run-glm4.5-355B-A32B.sh

Lines changed: 1 addition & 0 deletions
@@ -122,6 +122,7 @@ SGLANG_ARGS=(
 --sglang-speculative-num-steps 1
 --sglang-speculative-eagle-topk 1
 --sglang-speculative-num-draft-tokens 2
+--sglang-enable-draft-weights-cpu-backup

 )

scripts/run-mimo-7B-rl-eagle.sh

Lines changed: 1 addition & 0 deletions
@@ -113,6 +113,7 @@ SGLANG_ARGS=(
 --sglang-speculative-num-steps 3
 --sglang-speculative-eagle-topk 1
 --sglang-speculative-num-draft-tokens 4
+--sglang-enable-draft-weights-cpu-backup
 )

 MISC_ARGS=(

scripts/run-qwen3-next-80B-A3B.sh

Lines changed: 1 addition & 0 deletions
@@ -129,6 +129,7 @@ SGLANG_ARGS=(
 --sglang-speculative-num-steps 2
 --sglang-speculative-eagle-topk 1
 --sglang-speculative-num-draft-tokens 3
+--sglang-enable-draft-weights-cpu-backup

 --sglang-max-running-requests 512
 )
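
All four hunks make the same change: they append `--sglang-enable-draft-weights-cpu-backup` to the speculative-decoding arguments. The following is a minimal sketch of how a launch script could check that the flag is present before starting. The `has_flag` helper is hypothetical and not part of this repository, and the array values are copied from the docs hunk above purely for illustration.

```shell
#!/usr/bin/env bash
# Illustrative only: argument values mirror the docs example in this commit.
SGLANG_ARGS=(
   --sglang-speculative-num-steps 3
   --sglang-speculative-eagle-topk 1
   --sglang-speculative-num-draft-tokens 4
   --sglang-enable-draft-weights-cpu-backup
)

# Hypothetical helper: succeeds if the first argument appears verbatim
# among the remaining arguments.
has_flag() {
  local needle="$1"
  shift
  local arg
  for arg in "$@"; do
    if [[ "$arg" == "$needle" ]]; then
      return 0
    fi
  done
  return 1
}

if has_flag --sglang-enable-draft-weights-cpu-backup "${SGLANG_ARGS[@]}"; then
  echo "draft-weights CPU backup enabled"
else
  echo "warning: draft-weights CPU backup flag is missing" >&2
fi
```

A check like this may be useful when copying one of the scripts above as a template, since the flag takes no value and is easy to drop when editing the array.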
