Commit b9e685a
Remove unrecognized vLLM args from gpt-oss-120b
Remove --max-cudagraph-capture-size and --stream-interval, which are not recognized by the current vLLM api_server.py.
1 parent 45c047b commit b9e685a

File tree: 1 file changed (0 additions, 2 deletions)


small-models.yaml

Lines changed: 0 additions & 2 deletions
@@ -74,12 +74,10 @@ x-gpt-oss-common: &gpt-oss-common
     --enable-prefix-caching
     --async-scheduling
     --max-num-seqs 64
-    --max-cudagraph-capture-size 2048
     --tool-call-parser openai
     --enable-auto-tool-choice
     --max-model-len 128K
     --max-num-batched-tokens 16K
-    --stream-interval 20
     --speculative-config '{"model":"nvidia/gpt-oss-120b-Eagle3-v2","num_speculative_tokens":3,"method":"eagle3","draft_tensor_parallel_size":1}'
     --load-format runai_streamer
     --model-loader-extra-config '{"distributed":true, "concurrency":48}'
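Why removing the two flags matters: vLLM builds its server CLI on Python's argparse, and an argparse-based parser aborts startup with an "unrecognized arguments" error when it sees a flag it does not define. The sketch below illustrates that failure mode with a minimal, hypothetical flag set (it is not vLLM's actual parser); `parse_known_args()` is used so the leftover flag can be inspected instead of killing the process.

```python
import argparse

# Minimal stand-in for an argparse-based server CLI (hypothetical flag set,
# not vLLM's real parser).
parser = argparse.ArgumentParser()
parser.add_argument("--max-num-seqs", type=int)
parser.add_argument("--max-model-len")

# parse_args() would exit with "error: unrecognized arguments: --stream-interval 20".
# parse_known_args() instead returns the leftovers so we can see what was rejected.
args, unknown = parser.parse_known_args(
    ["--max-num-seqs", "64", "--stream-interval", "20"]
)
print(unknown)  # → ['--stream-interval', '20']
```

This is why the safe fix in the commit is simply to drop the flags from the shared YAML anchor rather than guard them: the server process never gets past argument parsing while they are present.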
