Commit de67e9f

Authored by yiliu30 and hmellor
Update _posts/2025-12-03-intel-autoround-llmc.md
Co-authored-by: Harry Mellor <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
1 parent e66caf6 commit de67e9f

File tree

1 file changed: +2 −4 lines


_posts/2025-12-03-intel-autoround-llmc.md

Lines changed: 2 additions & 4 deletions
````diff
@@ -116,10 +116,8 @@ Once quantization is complete, the same compressed model can be served on differ
 ```bash
 vllm serve Qwen3-8B-W4A16-G128-AutoRound \
     --dtype=bfloat16 \
-    --enforce-eager \
-    --gpu-memory-util=0.8 \
-    --no-enable-prefix-caching \
-    --max-num-batched-tokens=8192
+    --gpu-memory-utilization 0.8 \
+    --max-num-batched-tokens 8192
 ```
 
 Note: please install vLLM from this PR https://github.com/vllm-project/vllm/pull/29484/
````
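For reference, a sketch of the serve command as it reads after this change (model name and flag values are taken from the diff; suitable values for `--gpu-memory-utilization` and `--max-num-batched-tokens` depend on your hardware):

```shell
# Serve the AutoRound-quantized model with vLLM, using the flags kept by the diff.
# The change spells out the full --gpu-memory-utilization flag and drops
# --enforce-eager (which disables CUDA graphs) and --no-enable-prefix-caching
# (which disables prefix caching), so both optimizations stay enabled.
vllm serve Qwen3-8B-W4A16-G128-AutoRound \
    --dtype=bfloat16 \
    --gpu-memory-utilization 0.8 \
    --max-num-batched-tokens 8192
```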
