---
toc_depth: 4
---

# vLLM CLI Guide

The vllm command-line tool is used to run and manage vLLM models. You can start by viewing the help message with:

```bash
vllm --help
```

Available Commands:

```bash
vllm {chat,complete,serve,bench,collect-env,run-batch}
```
When passing JSON CLI arguments, the following sets of arguments are equivalent:

- `--json-arg '{"key1": "value1", "key2": {"key3": "value2"}}'`
- `--json-arg.key1 value1 --json-arg.key2.key3 value2`

Additionally, list elements can be passed individually using `+`:

- `--json-arg '{"key4": ["value3", "value4", "value5"]}'`
- `--json-arg.key4+ value3 --json-arg.key4+='value4,value5'`
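
For example, the following two invocations pass the same nested configuration. This is a minimal sketch using the `--json-arg` placeholder from above; substitute any JSON-valued vLLM argument:

```bash
# Pass the whole JSON object in one argument...
vllm serve meta-llama/Llama-2-7b-hf \
  --json-arg '{"key1": "value1", "key2": {"key3": "value2"}}'

# ...or build up the same object with dot notation.
vllm serve meta-llama/Llama-2-7b-hf \
  --json-arg.key1 value1 \
  --json-arg.key2.key3 value2
```
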
## serve

Start the vLLM OpenAI-compatible API server.

Start with a model:

```bash
vllm serve meta-llama/Llama-2-7b-hf
```

Specify the port:

```bash
vllm serve meta-llama/Llama-2-7b-hf --port 8100
```

Serve over a Unix domain socket:

```bash
vllm serve meta-llama/Llama-2-7b-hf --uds /tmp/vllm.sock
```
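
Once the server is listening on the socket, any HTTP client that supports Unix domain sockets can reach the API. A minimal sketch using curl's `--unix-socket` option (the hostname in the URL is ignored when a socket is given):

```bash
# List the served models over the Unix domain socket.
curl --unix-socket /tmp/vllm.sock http://localhost/v1/models
```
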
Check with `--help` for more options:

```bash
# To list all groups
vllm serve --help=listgroup

# To view an argument group
vllm serve --help=ModelConfig

# To view a single argument
vllm serve --help=max-num-seqs

# To search by keyword
vllm serve --help=max

# To view full help with pager (less/more)
vllm serve --help=page
```

See [vllm serve](./serve.md) for the full reference of all available arguments.
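
As a quick smoke test of a running server, you can send a request to the OpenAI-compatible completions endpoint. A minimal sketch assuming the default host and port (`localhost:8000`) and the model served above:

```bash
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-2-7b-hf",
        "prompt": "San Francisco is a",
        "max_tokens": 16
      }'
```
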
## chat

Generate chat completions via the running API server.

```bash
# Directly connect to localhost API without arguments
vllm chat

# Specify API url
vllm chat --url http://{vllm-serve-host}:{vllm-serve-port}/v1

# Quick chat with a single prompt
vllm chat --quick "hi"
```

See [vllm chat](./chat.md) for the full reference of all available arguments.

## complete

Generate text completions based on the given prompt via the running API server.

```bash
# Directly connect to localhost API without arguments
vllm complete

# Specify API url
vllm complete --url http://{vllm-serve-host}:{vllm-serve-port}/v1

# Quick complete with a single prompt
vllm complete --quick "The future of AI is"
```

See [vllm complete](./complete.md) for the full reference of all available arguments.

## bench

Run benchmark tests for latency, online serving throughput, and offline inference throughput.

### latency

Benchmark the latency of a single batch of requests.

```bash
vllm bench latency \
    --model meta-llama/Llama-3.2-1B-Instruct \
    --input-len 32 \
    --output-len 1 \
    --enforce-eager \
    --load-format dummy
```

See [vllm bench latency](./bench/latency.md) for the full reference of all available arguments.

### serve

Benchmark the online serving throughput.

```bash
vllm bench serve \
    --model meta-llama/Llama-3.2-1B-Instruct \
    --host server-host \
    --port server-port \
    --random-input-len 32 \
    --random-output-len 4 \
    --num-prompts 5
```

See [vllm bench serve](./bench/serve.md) for the full reference of all available arguments.
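
Note that `vllm bench serve` benchmarks a live endpoint, so a server must already be running. A sketch of the two-step flow, assuming the server uses the default port 8000:

```bash
# Terminal 1: start the server to be benchmarked.
vllm serve meta-llama/Llama-3.2-1B-Instruct

# Terminal 2: point the benchmark at the running server.
vllm bench serve \
    --model meta-llama/Llama-3.2-1B-Instruct \
    --host localhost \
    --port 8000 \
    --num-prompts 5
```
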
### throughput

Benchmark offline inference throughput.

```bash
vllm bench throughput \
    --model meta-llama/Llama-3.2-1B-Instruct \
    --input-len 32 \
    --output-len 1 \
    --enforce-eager \
    --load-format dummy
```

See [vllm bench throughput](./bench/throughput.md) for the full reference of all available arguments.

## collect-env

Start collecting environment information.

```bash
vllm collect-env
```

## run-batch

Run batch prompts and write results to file.

Running with a local file:

```bash
vllm run-batch \
    -i offline_inference/openai_batch/openai_example_batch.jsonl \
    -o results.jsonl \
    --model meta-llama/Meta-Llama-3-8B-Instruct
```

Using a remote file:

```bash
vllm run-batch \
    -i https://raw.githubusercontent.com/vllm-project/vllm/main/examples/offline_inference/openai_batch/openai_example_batch.jsonl \
    -o results.jsonl \
    --model meta-llama/Meta-Llama-3-8B-Instruct
```

See [vllm run-batch](./run-batch.md) for the full reference of all available arguments.
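
The input file uses the OpenAI batch format: one JSON request per line. A minimal sketch of creating a two-request input file (the field values are illustrative):

```bash
# Each line is a self-contained request; "body" mirrors the payload of the
# corresponding OpenAI-compatible endpoint.
cat > my_batch.jsonl << 'EOF'
{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "user", "content": "Hello!"}]}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "user", "content": "What is 2+2?"}]}}
EOF
```
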

## More Help