
Commit bc09379

PierreLeGuen and claude committed
fix: use H200 default chunked-prefill-size of 8192
The previous value of 4096 was below SGLang's auto-detected default for H200 GPUs (<160 GB), which unnecessarily limited prefill throughput.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 2be0dac commit bc09379


1 file changed, +1 -1 lines changed


Qwen3.5-122B.yaml

Lines changed: 1 addition & 1 deletion
@@ -54,7 +54,7 @@ x-sglang-qwen35-122b-common: &sglang-qwen35-122b-common
 --mem-fraction-static 0.88
 --context-length 262144
 --kv-cache-dtype fp8_e4m3
---chunked-prefill-size 4096
+--chunked-prefill-size 8192
 --attention-backend flashinfer
 --schedule-conservativeness 0.5
 --reasoning-parser qwen3
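To see roughly why the smaller value held back prefill, here is a simplified sketch. It assumes that chunked prefill splits a long prompt into pieces of at most `--chunked-prefill-size` tokens and that each piece costs one scheduler step. It ignores how SGLang mixes chunks with other requests in a batch. The helper `prefill_steps` is hypothetical and is not part of SGLang. Under this model, a prompt at the configured `--context-length 262144` needs half as many steps at 8192 as at 4096.

```python
import math

# Simplified model (assumption): a prompt is split into chunks of at most
# `chunk_size` tokens, and each chunk takes one prefill step.
# `prefill_steps` is a hypothetical helper, not an SGLang API.
def prefill_steps(prompt_tokens: int, chunk_size: int) -> int:
    """Return the number of chunks needed to prefill one prompt."""
    if prompt_tokens <= 0 or chunk_size <= 0:
        raise ValueError("prompt_tokens and chunk_size must be positive")
    return math.ceil(prompt_tokens / chunk_size)

CONTEXT_LENGTH = 262144  # value of --context-length in this config

if __name__ == "__main__":
    for chunk in (4096, 8192):
        print(f"chunk={chunk}: {prefill_steps(CONTEXT_LENGTH, chunk)} steps")
    # chunk=4096: 64 steps
    # chunk=8192: 32 steps
```

The trade-off goes the other way for memory. A smaller chunk lowers peak activation memory during prefill, so smaller values mainly make sense on GPUs with less headroom than an H200.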
