Commit b1f5f5f

Updates to docs folder for new CLI args
Signed-off-by: Mark Kurtz <[email protected]>
1 parent 71fe942 commit b1f5f5f

10 files changed (+69, -68 lines)


docs/examples/index.md

Lines changed: 3 additions & 3 deletions
@@ -8,12 +8,12 @@ Welcome to the GuideLLM examples section! This area is designed to showcase prac
 
 ## Call for Contributions
 
-Currently, we do not have any specific examples available, but we welcome contributions from the community! If you have examples of how you've used GuideLLM to solve real-world problems or optimize your LLM deployments, we'd love to feature them here.
+Currently, we do not have many specific examples available, but we welcome contributions from the community! If you have examples of how you've used GuideLLM to solve real-world problems or optimize your LLM deployments, we'd love to feature them here.
 
 To contribute an example:
 
-1. Fork the [GuideLLM repository](https://github.com/neuralmagic/guidellm)
-2. Create your example in the `docs/examples/` directory following our [contribution guidelines](https://github.com/neuralmagic/guidellm/blob/main/CONTRIBUTING.md)
+1. Fork the [GuideLLM repository](https://github.com/vllm-project/guidellm)
+2. Create your example in the `docs/examples/` directory following our [contribution guidelines](https://github.com/vllm-project/guidellm/blob/main/CONTRIBUTING.md)
 3. Submit a pull request with your contribution
 
 Your examples will help others leverage GuideLLM more effectively and contribute to the growing knowledge base around LLM deployment optimization.

docs/examples/practice_on_vllm_simulator.md

Lines changed: 1 addition & 1 deletion
@@ -96,7 +96,7 @@ guidellm benchmark \
   --target "http://localhost:8000/" \
   --model "tweet-summary-0" \
   --processor "${local_path}/Qwen2.5-1.5B-Instruct" \
-  --rate-type sweep \
+  --profile sweep \
   --max-seconds 10 \
   --max-requests 10 \
   --data "prompt_tokens=128,output_tokens=56"

docs/getting-started/analyze.md

Lines changed: 20 additions & 9 deletions
@@ -64,20 +64,31 @@ The p99 (99th percentile) values are particularly important for SLO analysis, as
 
 ## Analyzing the Results File
 
-For deeper analysis, GuideLLM saves detailed results to a file (default: `benchmarks.json`). This file contains all metrics with more comprehensive statistics and individual request data.
+For deeper analysis, GuideLLM saves detailed results to multiple files by default in your current directory:
+
+- `benchmarks.json`: Complete benchmark data in JSON format
+- `benchmarks.csv`: Summary of key metrics in CSV format
+- `benchmarks.html`: Interactive HTML report with visualizations
 
 ### File Formats
 
-GuideLLM supports multiple output formats:
+GuideLLM supports multiple output formats that can be customized:
+
+- **JSON**: Complete benchmark data in JSON format with full request samples
+- **CSV**: Summary of key metrics in CSV format suitable for spreadsheets
+- **HTML**: Interactive HTML report with tables and visualizations
+- **Console**: Terminal output displayed during execution
+
+To specify which formats to generate, use the `--outputs` argument:
 
-- **JSON**: Complete benchmark data in JSON format (default)
-- **YAML**: Complete benchmark data in human-readable YAML format
-- **CSV**: Summary of key metrics in CSV format
+```bash
+guidellm benchmark --target "http://localhost:8000" --outputs json csv
+```
 
-To specify the format, use the `--output-path` argument with the appropriate extension:
+To change the output directory, use the `--output-dir` argument:
 
 ```bash
-guidellm benchmark --target "http://localhost:8000" --output-path results.yaml
+guidellm benchmark --target "http://localhost:8000" --output-dir results/
 ```
 
 ### Programmatic Analysis

@@ -130,8 +141,8 @@ When analyzing your results, focus on these key indicators:
 
 Run benchmarks with different models or hardware configurations, then compare:
 
 ```bash
-guidellm benchmark --target "http://server1:8000" --output-path model1.json
-guidellm benchmark --target "http://server2:8000" --output-path model2.json
+guidellm benchmark --target "http://server1:8000" --output-dir model1/
+guidellm benchmark --target "http://server2:8000" --output-dir model2/
 ```
 
 ### Cost Optimization
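
The comparison workflow above leaves `model1/benchmarks.json` and `model2/benchmarks.json` on disk, so a short script can put the two runs side by side. This is a minimal sketch using only the standard library; the top-level `benchmarks` key is an assumption about the JSON report layout rather than a documented schema, so inspect the file and adjust the path accordingly.

```python
import json
from pathlib import Path

def load_benchmarks(output_dir: str) -> list[dict]:
    # Assumes the report keeps its runs under a top-level "benchmarks"
    # key; adjust after inspecting an actual benchmarks.json file.
    report = json.loads(Path(output_dir, "benchmarks.json").read_text())
    return report.get("benchmarks", [])

for run in ("model1", "model2"):
    for bench in load_benchmarks(run):
        # List the available fields so you can pick the latency and
        # throughput metrics you want to compare between servers.
        print(run, sorted(bench))
```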

docs/getting-started/benchmark.md

Lines changed: 8 additions & 7 deletions
@@ -27,7 +27,7 @@ This command:
 
 - Connects to your vLLM server running at `http://localhost:8000`
 - Uses synthetic data with 256 prompt tokens and 128 output tokens per request
 - Automatically determines the available model on the server
-- Runs a "sweep" benchmark (default) to find optimal performance points
+- Runs a "sweep" profile (default) to find optimal performance points
 
 During the benchmark, you'll see a progress display similar to this:
 
@@ -44,14 +44,15 @@ GuideLLM offers a wide range of configuration options to customize your benchmar
 
 | `--target` | URL of the OpenAI-compatible server | `--target "http://localhost:8000"` |
 | `--model` | Model name to benchmark (optional) | `--model "Meta-Llama-3.1-8B-Instruct"` |
 | `--data` | Data configuration for benchmarking | `--data "prompt_tokens=256,output_tokens=128"` |
-| `--rate-type` | Type of benchmark to run | `--rate-type sweep` |
+| `--profile` | Type of benchmark profile to run | `--profile sweep` |
 | `--rate` | Request rate or number of benchmarks for sweep | `--rate 10` |
 | `--max-seconds` | Duration for each benchmark in seconds | `--max-seconds 30` |
-| `--output-path` | Output file path and format | `--output-path results.json` |
+| `--output-dir` | Directory path to save output files | `--output-dir results/` |
+| `--outputs` | Output formats to generate | `--outputs json csv html` |
 
-### Benchmark Types (`--rate-type`)
+### Benchmark Profiles (`--profile`)
 
-GuideLLM supports several benchmark types:
+GuideLLM supports several benchmark profiles and strategies:
 
 - `synchronous`: Runs requests one at a time (sequential)
 - `throughput`: Tests maximum throughput by running requests in parallel

@@ -82,12 +83,12 @@ While synthetic data is convenient for quick tests, you can benchmark with real-
 guidellm benchmark \
   --target "http://localhost:8000" \
   --data "/path/to/your/dataset.json" \
-  --rate-type constant \
+  --profile constant \
   --rate 5
 ```
 
 You can also use datasets from HuggingFace or customize synthetic data generation with additional parameters such as standard deviation, minimum, and maximum values.
 
-By default, complete results are saved to `benchmarks.json` in your current directory. Use the `--output-path` parameter to specify a different location or format.
+By default, complete results are saved to `benchmarks.json`, `benchmarks.csv`, and `benchmarks.html` in your current directory. Use the `--output-dir` parameter to specify a different location and `--outputs` to control which formats are generated.
 
 Learn more about dataset options in the [Datasets documentation](../guides/datasets.md).
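
Taken together, a representative invocation using the renamed options might look like this sketch, built entirely from the flag values documented in the table above:

```bash
# Sweep profile with synthetic data, saving JSON/CSV/HTML reports
# into results/ instead of the current directory.
guidellm benchmark \
  --target "http://localhost:8000" \
  --data "prompt_tokens=256,output_tokens=128" \
  --profile sweep \
  --max-seconds 30 \
  --output-dir results/ \
  --outputs json csv html
```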

docs/getting-started/install.md

Lines changed: 6 additions & 6 deletions
@@ -12,7 +12,7 @@ Before installing GuideLLM, ensure you have the following prerequisites:
 
 - **Operating System:** Linux or MacOS
 
-- **Python Version:** 3.9 – 3.13
+- **Python Version:** 3.10 – 3.13
 
 - **Pip Version:** Ensure you have the latest version of pip installed. You can upgrade pip using the following command:
 
@@ -27,10 +27,10 @@ Before installing GuideLLM, ensure you have the following prerequisites:
 
 The simplest way to install GuideLLM is via pip from the Python Package Index (PyPI):
 
 ```bash
-pip install guidellm
+pip install guidellm[recommended]
 ```
 
-This will install the latest stable release of GuideLLM.
+This will install the latest stable release of GuideLLM with recommended dependencies.
 
 ### 2. Install a Specific Version from PyPI
 
@@ -45,7 +45,7 @@ pip install guidellm==0.2.0
 
 To install the latest development version of GuideLLM from the main branch, use the following command:
 
 ```bash
-pip install git+https://github.com/neuralmagic/guidellm.git
+pip install git+https://github.com/vllm-project/guidellm.git
 ```
 
 This will clone the repository and install GuideLLM directly from the main branch.
@@ -55,7 +55,7 @@ This will clone the repository and install GuideLLM directly from the main branc
 
 If you want to install GuideLLM from a specific branch (e.g., `feature-branch`), use the following command:
 
 ```bash
-pip install git+https://github.com/neuralmagic/guidellm.git@feature-branch
+pip install git+https://github.com/vllm-project/guidellm.git@feature-branch
 ```
 
 Replace `feature-branch` with the name of the branch you want to install.
@@ -88,4 +88,4 @@ This should display the installed version of GuideLLM.
 
 ## Troubleshooting
 
-If you encounter any issues during installation, ensure that your Python and pip versions meet the prerequisites. For further assistance, please refer to the [GitHub Issues](https://github.com/neuralmagic/guidellm/issues) page or consult the [Documentation](https://github.com/neuralmagic/guidellm/tree/main/docs).
+If you encounter any issues during installation, ensure that your Python and pip versions meet the prerequisites. For further assistance, please refer to the [GitHub Issues](https://github.com/vllm-project/guidellm/issues) page or consult the [Documentation](https://github.com/vllm-project/guidellm/tree/main/docs).
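
A quick way to verify an install against the new prerequisites, as a sketch: quoting the extra prevents shells like zsh from globbing the brackets, and `pip show` is standard pip.

```bash
# Quote the extra so bracket-globbing shells (e.g., zsh) accept it.
pip install "guidellm[recommended]"

# Confirm the installed package and its version.
pip show guidellm
```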

docs/guides/backends.md

Lines changed: 1 addition & 1 deletion
@@ -42,4 +42,4 @@ For more information on starting a TGI server, see the [TGI Documentation](https
 
 ## Expanding Backend Support
 
-GuideLLM is an open platform, and we encourage contributions to extend its backend support. Whether it's adding new server implementations, integrating with Python-based backends, or enhancing existing capabilities, your contributions are welcome. For more details on how to contribute, see the [CONTRIBUTING.md](https://github.com/neuralmagic/guidellm/blob/main/CONTRIBUTING.md) file.
+GuideLLM is an open platform, and we encourage contributions to extend its backend support. Whether it's adding new server implementations, integrating with Python-based backends, or enhancing existing capabilities, your contributions are welcome. For more details on how to contribute, see the [CONTRIBUTING.md](https://github.com/vllm-project/guidellm/blob/main/CONTRIBUTING.md) file.

docs/guides/datasets.md

Lines changed: 6 additions & 6 deletions
@@ -22,7 +22,7 @@ The following arguments can be used to configure datasets and their processing:
 ```bash
 guidellm benchmark \
   --target "http://localhost:8000" \
-  --rate-type "throughput" \
+  --profile "throughput" \
   --max-requests 1000 \
   --data "path/to/dataset|dataset_id" \
   --data-args '{"prompt_column": "prompt", "split": "train"}' \
@@ -44,7 +44,7 @@ Synthetic datasets allow you to generate data on the fly with customizable param
 ```bash
 guidellm benchmark \
   --target "http://localhost:8000" \
-  --rate-type "throughput" \
+  --profile "throughput" \
   --max-requests 1000 \
   --data "prompt_tokens=256,output_tokens=128"
 ```
@@ -54,7 +54,7 @@ Or using a JSON string:
 ```bash
 guidellm benchmark \
   --target "http://localhost:8000" \
-  --rate-type "throughput" \
+  --profile "throughput" \
   --max-requests 1000 \
   --data '{"prompt_tokens": 256, "output_tokens": 128}'
 ```
@@ -85,7 +85,7 @@ GuideLLM supports datasets from the Hugging Face Hub or local directories that f
 ```bash
 guidellm benchmark \
   --target "http://localhost:8000" \
-  --rate-type "throughput" \
+  --profile "throughput" \
   --max-requests 1000 \
   --data "garage-bAInd/Open-Platypus"
 ```
@@ -95,7 +95,7 @@ Or using a local dataset:
 ```bash
 guidellm benchmark \
   --target "http://localhost:8000" \
-  --rate-type "throughput" \
+  --profile "throughput" \
   --max-requests 1000 \
   --data "path/to/dataset"
 ```
@@ -147,7 +147,7 @@ GuideLLM supports various file formats for datasets, including text, CSV, JSON,
 ```bash
 guidellm benchmark \
   --target "http://localhost:8000" \
-  --rate-type "throughput" \
+  --profile "throughput" \
   --max-requests 1000 \
   --data "path/to/dataset.ext" \
   --data-args '{"prompt_column": "prompt", "split": "train"}'
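
To make the `--data`/`--data-args` pairing concrete, the sketch below fabricates a two-line JSON Lines file whose records expose the configured `prompt` column. The filename is hypothetical, and JSON Lines is assumed here to be accepted among the JSON-style formats mentioned above.

```bash
# Hypothetical local dataset: each record only needs the column named
# in --data-args ("prompt" here); no split key for a raw local file.
cat > tiny_dataset.jsonl <<'EOF'
{"prompt": "Summarize the benefits of quantized LLM inference."}
{"prompt": "Explain the difference between TTFT and ITL."}
EOF

guidellm benchmark \
  --target "http://localhost:8000" \
  --profile "throughput" \
  --max-requests 1000 \
  --data "tiny_dataset.jsonl" \
  --data-args '{"prompt_column": "prompt"}'
```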

docs/guides/outputs.md

Lines changed: 16 additions & 28 deletions
@@ -7,7 +7,7 @@ For all of the output formats, `--output-extras` can be used to include addition
 ```bash
 guidellm benchmark \
   --target "http://localhost:8000" \
-  --rate-type sweep \
+  --profile sweep \
   --max-seconds 30 \
   --data "prompt_tokens=256,output_tokens=128" \
   --output-extras '{"tag": "my_tag", "metadata": {"key": "value"}}'
@@ -31,7 +31,7 @@ To disable the progress outputs to the console, use the `disable-progress` flag
 ```bash
 guidellm benchmark \
   --target "http://localhost:8000" \
-  --rate-type sweep \
+  --profile sweep \
   --max-seconds 30 \
   --data "prompt_tokens=256,output_tokens=128" \
   --disable-progress
@@ -42,57 +42,45 @@ To disable console output, use the `--disable-console-outputs` flag when running
 ```bash
 guidellm benchmark \
   --target "http://localhost:8000" \
-  --rate-type sweep \
+  --profile sweep \
   --max-seconds 30 \
   --data "prompt_tokens=256,output_tokens=128" \
   --disable-console-outputs
 ```
 
-### Enabling Extra Information
-
-GuideLLM includes the option to display extra information during the benchmark runs to monitor the overheads and performance of the system. This can be enabled by using the `--display-scheduler-stats` flag when running the `guidellm benchmark` command. For example:
-
-```bash
-guidellm benchmark \
-  --target "http://localhost:8000" \
-  --rate-type sweep \
-  --max-seconds 30 \
-  --data "prompt_tokens=256,output_tokens=128" \
-  --display-scheduler-stats
-```
-
-The above command will display an additional row for each benchmark within the progress output, showing the scheduler overheads and other relevant information.
-
 ## File-Based Outputs
 
 GuideLLM supports saving benchmark results to files in various formats, including JSON, YAML, and CSV. These files can be used for further analysis, reporting, or reloading into Python for detailed exploration.
 
 ### Supported File Formats
 
 1. **JSON**: Contains all benchmark results, including full statistics and request data. This format is ideal for reloading into Python for in-depth analysis.
-2. **YAML**: Similar to JSON, YAML files include all benchmark results and are human-readable.
-3. **CSV**: Provides a summary of the benchmark data, focusing on key metrics and statistics. Note that CSV does not include detailed request-level data.
+2. **CSV**: Provides a summary of the benchmark data, focusing on key metrics and statistics. Note that CSV does not include detailed request-level data.
+3. **HTML**: Interactive HTML report with tables and visualizations of benchmark results.
+4. **Console**: Terminal output displayed during execution (can be disabled).
 
 ### Configuring File Outputs
 
-- **Output Path**: Use the `--output-path` argument to specify the file path or directory for saving the results. If a directory is provided, the results will be saved as `benchmarks.json` by default. The file type is determined by the file extension (e.g., `.json`, `.yaml`, `.csv`).
-- **Sampling**: To limit the size of the output files, you can configure sampling options for the dataset using the `--output-sampling` argument.
+- **Output Directory**: Use the `--output-dir` argument to specify the directory for saving the results. By default, files are saved in the current directory.
+- **Output Formats**: Use the `--outputs` argument to specify which formats to generate. By default, JSON, CSV, and HTML are generated.
+- **Sampling**: To limit the size of the output files and number of detailed request samples included, you can configure sampling options using the `--sample-requests` argument.
 
-Example command to save results in YAML format:
+Example command to save results in specific formats:
 
 ```bash
 guidellm benchmark \
   --target "http://localhost:8000" \
-  --rate-type sweep \
+  --profile sweep \
   --max-seconds 30 \
   --data "prompt_tokens=256,output_tokens=128" \
-  --output-path "results/benchmarks.csv" \
-  --output-sampling 20
+  --output-dir "results/" \
+  --outputs json csv \
+  --sample-requests 20
 ```
 
 ### Reloading Results
 
-JSON and YAML files can be reloaded into Python for further analysis using the `GenerativeBenchmarksReport` class. Below is a sample code snippet for reloading results:
+JSON files can be reloaded into Python for further analysis using the `GenerativeBenchmarksReport` class. Below is a sample code snippet for reloading results:
 
 ```python
 from guidellm.benchmark import GenerativeBenchmarksReport
@@ -106,4 +94,4 @@ for benchmark in benchmarks:
     print(benchmark.id_)
 ```
 
-For more details on the `GenerativeBenchmarksReport` class and its methods, refer to the [source code](https://github.com/neuralmagic/guidellm/blob/main/src/guidellm/benchmark/output.py).
+For more details on the `GenerativeBenchmarksReport` class and its methods, refer to the [source code](https://github.com/vllm-project/guidellm/blob/main/src/guidellm/benchmark/schemas/generative/reports.py).
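
The diff only shows fragments of the reloading snippet, so here is a fuller sketch of the workflow it describes. The `load_file` constructor and `benchmarks` attribute are assumptions inferred from the visible fragment, not confirmed API; check the linked reports.py source for the actual entry points.

```python
from guidellm.benchmark import GenerativeBenchmarksReport

# Assumed loading call -- the diff elides how the report is constructed;
# consult the linked source for the real method name.
report = GenerativeBenchmarksReport.load_file("benchmarks.json")
benchmarks = report.benchmarks

for benchmark in benchmarks:
    # Each benchmark exposes an id_ field, per the visible snippet.
    print(benchmark.id_)
```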

docs/index.md

Lines changed: 7 additions & 6 deletions
@@ -8,17 +8,18 @@
 </p>
 
 <h3 align="center">
-Scale Efficiently: Evaluate and Optimize Your LLM Deployments for Real-World Inference
+SLO-Aware Benchmarking and Evaluation Platform for Optimizing Real-World LLM Inference
 </h3>
 
-**GuideLLM** is a platform for evaluating and optimizing the deployment of large language models (LLMs). By simulating real-world inference workloads, GuideLLM enables users to assess the performance, resource requirements, and cost implications of deploying LLMs on various hardware configurations. This approach ensures efficient, scalable, and cost-effective LLM inference serving while maintaining high service quality.
+**GuideLLM** is a platform for evaluating how language models perform under real workloads and configurations. It simulates end-to-end interactions with OpenAI-compatible and vLLM-native servers, generates workload patterns that reflect production usage, and produces detailed reports that help teams understand system behavior, resource needs, and operational limits. GuideLLM supports real and synthetic datasets, multimodal inputs, and flexible execution profiles, giving engineering and ML teams a consistent framework for assessing model behavior, tuning deployments, and planning capacity as their systems evolve.
 
 ## Key Features
 
-- **Performance Evaluation:** Analyze LLM inference under different load scenarios to ensure your system meets your service level objectives (SLOs).
-- **Resource Optimization:** Determine the most suitable hardware configurations for running your models effectively.
-- **Cost Estimation:** Understand the financial impact of different deployment strategies and make informed decisions to minimize costs.
-- **Scalability Testing:** Simulate scaling to handle large numbers of concurrent users without performance degradation.
+- **Captures complete latency and token-level statistics for SLO-driven evaluation:** Including full distributions for TTFT, ITL, and end-to-end behavior.
+- **Generates realistic, configurable traffic patterns:** Across synchronous, concurrent, and rate-based modes, including reproducible sweeps to identify safe operating ranges.
+- **Supports both real and synthetic multimodal datasets:** Enabling controlled experiments and production-style evaluations in one framework with support for text, image, audio, and video inputs.
+- **Produces standardized, exportable reports:** For dashboards, analysis, and regression tracking, ensuring consistency across teams and workflows.
+- **Delivers high-throughput, extensible benchmarking:** With multiprocessing, threading, async execution, and a flexible CLI/API for customization or quickstarts.
 
 ## Key Sections
src/guidellm/__main__.py

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@
 Example:
 ::
     # Run a benchmark against a model
-    guidellm benchmark run --target http://localhost:8000 --data dataset.json \\
+    guidellm benchmark --target http://localhost:8000 --data dataset.json \\
         --profile sweep
 
     # Preprocess a dataset