Commit de67e9f

Authored by yiliu30 and hmellor
Update _posts/2025-12-03-intel-autoround-llmc.md
Co-authored-by: Harry Mellor <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
1 parent e66caf6 commit de67e9f

File tree

1 file changed: +2 −4 lines


_posts/2025-12-03-intel-autoround-llmc.md

Lines changed: 2 additions & 4 deletions
````diff
@@ -116,10 +116,8 @@ Once quantization is complete, the same compressed model can be served on differ
 ```bash
 vllm serve Qwen3-8B-W4A16-G128-AutoRound \
     --dtype=bfloat16 \
-    --enforce-eager \
-    --gpu-memory-util=0.8 \
-    --no-enable-prefix-caching \
-    --max-num-batched-tokens=8192
+    --gpu-memory-utilization 0.8 \
+    --max-num-batched-tokens 8192
 ```
 
 Note: please install vLLM from this PR https://github.com/vllm-project/vllm/pull/29484/
````
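For reference, a sketch of the serve command as it reads after this change (model name and flag values are taken from the diff; suitable values for `--gpu-memory-utilization` and `--max-num-batched-tokens` depend on your hardware):

```shell
# Serve the AutoRound-quantized model with vLLM, using the flags kept by the diff.
# The change spells out the full --gpu-memory-utilization flag and drops
# --enforce-eager (which disables CUDA graphs) and --no-enable-prefix-caching
# (which disables prefix caching), so both optimizations stay enabled.
vllm serve Qwen3-8B-W4A16-G128-AutoRound \
    --dtype=bfloat16 \
    --gpu-memory-utilization 0.8 \
    --max-num-batched-tokens 8192
```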
