Commit c16fb5d

[Doc] Improve help examples for --compilation-config (vllm-project#16729)
Signed-off-by: DarkLight1337 <[email protected]>
1 parent e37073e commit c16fb5d

3 files changed, +17 -8 lines changed

docs/source/design/v1/torch_compile.md

Lines changed: 1 addition & 1 deletion
@@ -134,6 +134,6 @@ The cudagraphs are captured and managed by the compiler backend, and replayed wh

 By default, vLLM will try to determine a set of sizes to capture cudagraph. You can also override it using the config `cudagraph_capture_sizes`:

-`VLLM_USE_V1=1 vllm serve meta-llama/Llama-3.2-1B --compilation_config "{'cudagraph_capture_sizes': [1, 2, 4, 8]}"`
+`VLLM_USE_V1=1 vllm serve meta-llama/Llama-3.2-1B --compilation-config "{'cudagraph_capture_sizes': [1, 2, 4, 8]}"`

 Then it will only capture cudagraph for the specified sizes. It can be useful to have fine-grained control over the cudagraph capture.

tests/engine/test_arg_utils.py

Lines changed: 12 additions & 4 deletions
@@ -53,12 +53,20 @@ def test_compilation_config():
     assert args.compilation_config.level == 3

     # set to string form of a dict
-    args = parser.parse_args(["--compilation-config", "{'level': 3}"])
-    assert args.compilation_config.level == 3
+    args = parser.parse_args([
+        "--compilation-config",
+        "{'level': 3, 'cudagraph_capture_sizes': [1, 2, 4, 8]}",
+    ])
+    assert (args.compilation_config.level == 3 and
+            args.compilation_config.cudagraph_capture_sizes == [1, 2, 4, 8])

     # set to string form of a dict
-    args = parser.parse_args(["--compilation-config={'level': 3}"])
-    assert args.compilation_config.level == 3
+    args = parser.parse_args([
+        "--compilation-config="
+        "{'level': 3, 'cudagraph_capture_sizes': [1, 2, 4, 8]}",
+    ])
+    assert (args.compilation_config.level == 3 and
+            args.compilation_config.cudagraph_capture_sizes == [1, 2, 4, 8])


 def test_prefix_cache_default():
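
The two test cases above pass the same dict string once as a separate argument and once joined with `=`. As a rough illustration of why both forms can resolve to the same value, here is a standalone sketch built on plain `argparse`. It is not vLLM's parser: `SketchCompilationConfig` and `parse_compilation_config` are hypothetical names, and the fallback from JSON to `ast.literal_eval` is an assumption made only so the single-quoted string from the tests is accepted.

```python
import argparse
import ast
import json
from dataclasses import dataclass, field


@dataclass
class SketchCompilationConfig:
    """Hypothetical stand-in holding only the two fields the tests check."""
    level: int = 0
    cudagraph_capture_sizes: list = field(default_factory=list)


def parse_compilation_config(value: str) -> SketchCompilationConfig:
    """Turn a dict written as JSON or as a Python literal into a config object."""
    try:
        data = json.loads(value)  # double-quoted form: {"level": 3}
    except json.JSONDecodeError:
        try:
            data = ast.literal_eval(value)  # single-quoted form: {'level': 3}
        except (ValueError, SyntaxError) as exc:
            raise argparse.ArgumentTypeError(f"cannot parse {value!r}") from exc
    if not isinstance(data, dict):
        raise argparse.ArgumentTypeError(f"expected a dict, got {value!r}")
    return SketchCompilationConfig(**data)


parser = argparse.ArgumentParser()
parser.add_argument("--compilation-config", type=parse_compilation_config)

dict_string = "{'level': 3, 'cudagraph_capture_sizes': [1, 2, 4, 8]}"

# Form 1: flag and value as two separate argv entries.
spaced = parser.parse_args(["--compilation-config", dict_string])
# Form 2: flag and value joined by "=" in a single argv entry.
joined = parser.parse_args(["--compilation-config=" + dict_string])
```

In this sketch `argparse` splits the joined form on the first `=`, so the converter receives the identical string in both cases.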

vllm/engine/arg_utils.py

Lines changed: 4 additions & 3 deletions
@@ -939,10 +939,11 @@ def get_kwargs(cls: type[Config]) -> dict[str, Any]:
 'testing only. level 3 is the recommended level '
 'for production.\n'
 'To specify the full compilation config, '
-'use a JSON string.\n'
+'use a JSON string, e.g. ``{"level": 3, '
+'"cudagraph_capture_sizes": [1, 2, 4, 8]}``\n'
 'Following the convention of traditional '
-'compilers, using -O without space is also '
-'supported. -O3 is equivalent to -O 3.')
+'compilers, using ``-O`` without space is also '
+'supported. ``-O3`` is equivalent to ``-O 3``.')

 parser.add_argument('--kv-transfer-config',
 type=KVTransferConfig.from_cli,
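
The help text above says that `-O3` is equivalent to `-O 3` and that a JSON string sets the full config. The sketch below shows one way that behavior could be reproduced with the standard library alone. It is an illustration under assumptions, not the code in `vllm/engine/arg_utils.py`: the converter `level_or_json` is a hypothetical helper, and the result is kept as a plain dict rather than a config object.

```python
import argparse
import json


def level_or_json(value: str) -> dict:
    """Hypothetical converter: a bare integer sets only the level,
    anything else is parsed as a JSON object."""
    if value.isdigit():
        return {"level": int(value)}
    return json.loads(value)


parser = argparse.ArgumentParser()
parser.add_argument("-O", "--compilation-config", type=level_or_json)

# argparse accepts a short option with its value attached or separated.
attached = parser.parse_args(["-O3"]).compilation_config
separate = parser.parse_args(["-O", "3"]).compilation_config

# The JSON example from the help text, passed as one argv entry.
full = parser.parse_args(
    ["-O", '{"level": 3, "cudagraph_capture_sizes": [1, 2, 4, 8]}']
).compilation_config
```

Here the attached form works because `argparse` treats the remainder of a single-dash short option as its value; whether vLLM relies on that or preprocesses the arguments itself is not stated in this commit.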
