
Commit b4c9c69

hmellor authored and jingyu committed
[Docs] Add comprehensive CLI reference for all large vllm subcommands (vllm-project#22601)
Signed-off-by: Harry Mellor <[email protected]>
Signed-off-by: jingyu <[email protected]>
1 parent 33ee436 commit b4c9c69

File tree

20 files changed: +199 -104 lines changed


docs/.nav.yml

Lines changed: 4 additions & 6 deletions
@@ -11,7 +11,7 @@ nav:
   - Quick Links:
     - User Guide: usage/README.md
     - Developer Guide: contributing/README.md
-    - API Reference: api/summary.md
+    - API Reference: api/README.md
     - CLI Reference: cli/README.md
   - Timeline:
     - Roadmap: https://roadmap.vllm.ai
@@ -58,11 +58,9 @@ nav:
   - CI: contributing/ci
   - Design Documents: design
   - API Reference:
-    - Summary: api/summary.md
-    - Contents:
-      - api/vllm/*
-  - CLI Reference:
-    - Summary: cli/README.md
+    - api/README.md
+    - api/vllm/*
+  - CLI Reference: cli
   - Community:
     - community/*
     - Blog: https://blog.vllm.ai
File renamed without changes: docs/api/summary.md → docs/api/README.md

docs/cli/.meta.yml

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+toc_depth: 3

docs/cli/.nav.yml

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+nav:
+  - README.md
+  - serve.md
+  - chat.md
+  - complete.md
+  - run-batch.md
+  - vllm bench:
+    - bench/*.md
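
These two dotfiles are MkDocs configuration rather than page content: `.meta.yml` appears to cap the rendered table of contents at depth 3 for every page under `docs/cli/`, and `.nav.yml` pins the sidebar order of the new pages. A quick way to sanity-check nav changes like these is a local docs build; a minimal sketch, assuming the repository's standard MkDocs setup (the requirements path below is an assumption, not taken from this commit):

```bash
# Build and serve the docs locally, then inspect the new CLI Reference sidebar.
# requirements/docs.txt is assumed; use whichever docs requirements file the repo ships.
pip install -r requirements/docs.txt
mkdocs serve  # serves on http://127.0.0.1:8000 by default
```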

docs/cli/README.md

Lines changed: 43 additions & 42 deletions
@@ -1,7 +1,3 @@
----
-toc_depth: 4
----
-
 # vLLM CLI Guide
 
 The vllm command-line tool is used to run and manage vLLM models. You can start by viewing the help message with:
@@ -16,52 +12,48 @@ Available Commands:
 vllm {chat,complete,serve,bench,collect-env,run-batch}
 ```
 
-When passing JSON CLI arguments, the following sets of arguments are equivalent:
-
-- `--json-arg '{"key1": "value1", "key2": {"key3": "value2"}}'`
-- `--json-arg.key1 value1 --json-arg.key2.key3 value2`
-
-Additionally, list elements can be passed individually using `+`:
+## serve
 
-- `--json-arg '{"key4": ["value3", "value4", "value5"]}'`
-- `--json-arg.key4+ value3 --json-arg.key4+='value4,value5'`
+Starts the vLLM OpenAI Compatible API server.
 
-## serve
+Start with a model:
 
-Start the vLLM OpenAI Compatible API server.
+```bash
+vllm serve meta-llama/Llama-2-7b-hf
+```
 
-??? console "Examples"
+Specify the port:
 
-    ```bash
-    # Start with a model
-    vllm serve meta-llama/Llama-2-7b-hf
+```bash
+vllm serve meta-llama/Llama-2-7b-hf --port 8100
+```
 
-    # Specify the port
-    vllm serve meta-llama/Llama-2-7b-hf --port 8100
+Serve over a Unix domain socket:
 
-    # Serve over a Unix domain socket
-    vllm serve meta-llama/Llama-2-7b-hf --uds /tmp/vllm.sock
+```bash
+vllm serve meta-llama/Llama-2-7b-hf --uds /tmp/vllm.sock
+```
 
-    # Check with --help for more options
-    # To list all groups
-    vllm serve --help=listgroup
+Check with --help for more options:
 
-    # To view a argument group
-    vllm serve --help=ModelConfig
+```bash
+# To list all groups
+vllm serve --help=listgroup
 
-    # To view a single argument
-    vllm serve --help=max-num-seqs
+# To view an argument group
+vllm serve --help=ModelConfig
 
-    # To search by keyword
-    vllm serve --help=max
+# To view a single argument
+vllm serve --help=max-num-seqs
 
-    # To view full help with pager (less/more)
-    vllm serve --help=page
-    ```
+# To search by keyword
+vllm serve --help=max
 
-### Options
+# To view full help with pager (less/more)
+vllm serve --help=page
+```
 
---8<-- "docs/argparse/serve.md"
+See [vllm serve](./serve.md) for the full reference of all available arguments.
 
 ## chat
 
@@ -78,6 +70,8 @@ vllm chat --url http://{vllm-serve-host}:{vllm-serve-port}/v1
 vllm chat --quick "hi"
 ```
 
+See [vllm chat](./chat.md) for the full reference of all available arguments.
+
 ## complete
 
 Generate text completions based on the given prompt via the running API server.
@@ -93,7 +87,7 @@ vllm complete --url http://{vllm-serve-host}:{vllm-serve-port}/v1
 vllm complete --quick "The future of AI is"
 ```
 
-</details>
+See [vllm complete](./complete.md) for the full reference of all available arguments.
 
 ## bench
 
@@ -120,6 +114,8 @@ vllm bench latency \
   --load-format dummy
 ```
 
+See [vllm bench latency](./bench/latency.md) for the full reference of all available arguments.
+
 ### serve
 
 Benchmark the online serving throughput.
@@ -134,6 +130,8 @@ vllm bench serve \
   --num-prompts 5
 ```
 
+See [vllm bench serve](./bench/serve.md) for the full reference of all available arguments.
+
 ### throughput
 
 Benchmark offline inference throughput.
@@ -147,6 +145,8 @@ vllm bench throughput \
   --load-format dummy
 ```
 
+See [vllm bench throughput](./bench/throughput.md) for the full reference of all available arguments.
+
 ## collect-env
 
 Start collecting environment information.
@@ -159,24 +159,25 @@ vllm collect-env
 
 Run batch prompts and write results to file.
 
-<details>
-<summary>Examples</summary>
+Running with a local file:
 
 ```bash
-# Running with a local file
 vllm run-batch \
   -i offline_inference/openai_batch/openai_example_batch.jsonl \
   -o results.jsonl \
   --model meta-llama/Meta-Llama-3-8B-Instruct
+```
 
-# Using remote file
+Using a remote file:
+
+```bash
 vllm run-batch \
   -i https://raw.githubusercontent.com/vllm-project/vllm/main/examples/offline_inference/openai_batch/openai_example_batch.jsonl \
   -o results.jsonl \
   --model meta-llama/Meta-Llama-3-8B-Instruct
 ```
 
-</details>
+See [vllm run-batch](./run-batch.md) for the full reference of all available arguments.
 
 ## More Help
 
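
The JSON-argument tip deleted from the guide above is not lost: the new bench stubs below pull in `docs/cli/json_tip.inc.md`, which presumably carries the same text. For concreteness, the two equivalent spellings the tip describes would look like this in practice; a minimal sketch using the tip's own `--json-arg` placeholder (not a real vllm option):

```bash
# --json-arg is the docs' placeholder for any JSON-valued flag, not an actual option.
# The following pairs of invocations are equivalent:
vllm serve meta-llama/Llama-2-7b-hf \
  --json-arg '{"key1": "value1", "key2": {"key3": "value2"}}'
vllm serve meta-llama/Llama-2-7b-hf \
  --json-arg.key1 value1 --json-arg.key2.key3 value2

# List elements can also be appended individually with `+`:
vllm serve meta-llama/Llama-2-7b-hf \
  --json-arg '{"key4": ["value3", "value4", "value5"]}'
vllm serve meta-llama/Llama-2-7b-hf \
  --json-arg.key4+ value3 --json-arg.key4+='value4,value5'
```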

docs/cli/bench/latency.md

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+# vllm bench latency
+
+## JSON CLI Arguments
+
+--8<-- "docs/cli/json_tip.inc.md"
+
+## Options
+
+--8<-- "docs/argparse/bench_latency.md"

docs/cli/bench/serve.md

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+# vllm bench serve
+
+## JSON CLI Arguments
+
+--8<-- "docs/cli/json_tip.inc.md"
+
+## Options
+
+--8<-- "docs/argparse/bench_serve.md"

docs/cli/bench/throughput.md

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+# vllm bench throughput
+
+## JSON CLI Arguments
+
+--8<-- "docs/cli/json_tip.inc.md"
+
+## Options
+
+--8<-- "docs/argparse/bench_throughput.md"
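
Each bench stub is deliberately thin: a title, the shared JSON tip include, and an options table generated into `docs/argparse/` (the `--8<--` markers are snippet includes that are expanded at build time). The same information is available straight from the CLI; a quick check, assuming a local vLLM install:

```bash
# Print the argparse help that these generated pages document.
vllm bench latency --help
vllm bench serve --help
vllm bench throughput --help
```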

docs/cli/chat.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+# vllm chat
+
+## Options
+
+--8<-- "docs/argparse/chat.md"

docs/cli/complete.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+# vllm complete
+
+## Options
+
+--8<-- "docs/argparse/complete.md"
