docs/examples/index.md (+3 -3)
@@ -8,12 +8,12 @@ Welcome to the GuideLLM examples section! This area is designed to showcase prac
## Call for Contributions
- Currently, we do not have any specific examples available, but we welcome contributions from the community! If you have examples of how you've used GuideLLM to solve real-world problems or optimize your LLM deployments, we'd love to feature them here.
+ Currently, we do not have many specific examples available, but we welcome contributions from the community! If you have examples of how you've used GuideLLM to solve real-world problems or optimize your LLM deployments, we'd love to feature them here.
To contribute an example:
- 1. Fork the [GuideLLM repository](https://github.com/neuralmagic/guidellm)
- 2. Create your example in the `docs/examples/` directory following our [contribution guidelines](https://github.com/neuralmagic/guidellm/blob/main/CONTRIBUTING.md)
+ 1. Fork the [GuideLLM repository](https://github.com/vllm-project/guidellm)
+ 2. Create your example in the `docs/examples/` directory following our [contribution guidelines](https://github.com/vllm-project/guidellm/blob/main/CONTRIBUTING.md)
3. Submit a pull request with your contribution
Your examples will help others leverage GuideLLM more effectively and contribute to the growing knowledge base around LLM deployment optimization.
docs/getting-started/analyze.md (+20 -9)
@@ -64,20 +64,31 @@ The p99 (99th percentile) values are particularly important for SLO analysis, as
## Analyzing the Results File
- For deeper analysis, GuideLLM saves detailed results to a file (default: `benchmarks.json`). This file contains all metrics with more comprehensive statistics and individual request data.
+ For deeper analysis, GuideLLM saves detailed results to multiple files by default in your current directory:
+
+ - `benchmarks.json`: Complete benchmark data in JSON format
+ - `benchmarks.csv`: Summary of key metrics in CSV format
+ - `benchmarks.html`: Interactive HTML report with visualizations
### File Formats
- GuideLLM supports multiple output formats:
+ GuideLLM supports multiple output formats that can be customized:
+
+ - **JSON**: Complete benchmark data in JSON format with full request samples
+ - **CSV**: Summary of key metrics in CSV format suitable for spreadsheets
+ - **HTML**: Interactive HTML report with tables and visualizations
+ - **Console**: Terminal output displayed during execution
+
+ To specify which formats to generate, use the `--outputs` argument:
- - **JSON**: Complete benchmark data in JSON format (default)
- - **YAML**: Complete benchmark data in human-readable YAML format
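A minimal sketch of the `--outputs` usage introduced in this hunk, reusing the target and synthetic-data settings from the PR's other examples (adjust both for your own deployment):

```bash
# Run the default sweep against a local OpenAI-compatible server and
# generate only the JSON and CSV reports in the current directory.
guidellm benchmark \
  --target "http://localhost:8000" \
  --data "prompt_tokens=256,output_tokens=128" \
  --outputs json csv
```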
docs/getting-started/benchmark.md (+8 -7)
@@ -27,7 +27,7 @@ This command:
- Connects to your vLLM server running at `http://localhost:8000`
- Uses synthetic data with 256 prompt tokens and 128 output tokens per request
- Automatically determines the available model on the server
- - Runs a "sweep" benchmark (default) to find optimal performance points
+ - Runs a "sweep" profile (default) to find optimal performance points
During the benchmark, you'll see a progress display similar to this:
@@ -44,14 +44,15 @@ GuideLLM offers a wide range of configuration options to customize your benchmar
| `--target` | URL of the OpenAI-compatible server | `--target "http://localhost:8000"` |
| `--model` | Model name to benchmark (optional) | `--model "Meta-Llama-3.1-8B-Instruct"` |
| `--data` | Data configuration for benchmarking | `--data "prompt_tokens=256,output_tokens=128"` |
- | `--rate-type` | Type of benchmark to run | `--rate-type sweep` |
+ | `--profile` | Type of benchmark profile to run | `--profile sweep` |
| `--rate` | Request rate or number of benchmarks for sweep | `--rate 10` |
| `--max-seconds` | Duration for each benchmark in seconds | `--max-seconds 30` |
- | `--output-path` | Output file path and format | `--output-path results.json` |
+ | `--output-dir` | Directory path to save output files | `--output-dir results/` |
+ | `--outputs` | Output formats to generate | `--outputs json csv html` |

- ### Benchmark Types (`--rate-type`)
+ ### Benchmark Profiles (`--profile`)

- GuideLLM supports several benchmark types:
+ GuideLLM supports several benchmark profiles and strategies:

- `synchronous`: Runs requests one at a time (sequential)
- `throughput`: Tests maximum throughput by running requests in parallel
@@ -82,12 +83,12 @@ While synthetic data is convenient for quick tests, you can benchmark with real-
guidellm benchmark \
--target "http://localhost:8000" \
--data "/path/to/your/dataset.json" \
- --rate-type constant \
+ --profile constant \
--rate 5
```
You can also use datasets from HuggingFace or customize synthetic data generation with additional parameters such as standard deviation, minimum, and maximum values.
- By default, complete results are saved to `benchmarks.json` in your current directory. Use the `--output-path` parameter to specify a different location or format.
+ By default, complete results are saved to `benchmarks.json`, `benchmarks.csv`, and `benchmarks.html` in your current directory. Use the `--output-dir` parameter to specify a different location and `--outputs` to control which formats are generated.
Learn more about dataset options in the [Datasets documentation](../guides/datasets.md).
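To make the new output flags concrete, a sketch combining them with the constant-rate example above (the directory and format list shown are illustrative; any of json, csv, or html can be requested):

```bash
# Constant-rate run that writes only the JSON and HTML reports under results/.
guidellm benchmark \
  --target "http://localhost:8000" \
  --data "prompt_tokens=256,output_tokens=128" \
  --profile constant \
  --rate 5 \
  --output-dir results/ \
  --outputs json html
```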
Replace `feature-branch` with the name of the branch you want to install.
@@ -88,4 +88,4 @@ This should display the installed version of GuideLLM.
## Troubleshooting
- If you encounter any issues during installation, ensure that your Python and pip versions meet the prerequisites. For further assistance, please refer to the [GitHub Issues](https://github.com/neuralmagic/guidellm/issues) page or consult the [Documentation](https://github.com/neuralmagic/guidellm/tree/main/docs).
+ If you encounter any issues during installation, ensure that your Python and pip versions meet the prerequisites. For further assistance, please refer to the [GitHub Issues](https://github.com/vllm-project/guidellm/issues) page or consult the [Documentation](https://github.com/vllm-project/guidellm/tree/main/docs).
docs/guides/backends.md (+1 -1)
@@ -42,4 +42,4 @@ For more information on starting a TGI server, see the [TGI Documentation](https
## Expanding Backend Support
- GuideLLM is an open platform, and we encourage contributions to extend its backend support. Whether it's adding new server implementations, integrating with Python-based backends, or enhancing existing capabilities, your contributions are welcome. For more details on how to contribute, see the [CONTRIBUTING.md](https://github.com/neuralmagic/guidellm/blob/main/CONTRIBUTING.md) file.
+ GuideLLM is an open platform, and we encourage contributions to extend its backend support. Whether it's adding new server implementations, integrating with Python-based backends, or enhancing existing capabilities, your contributions are welcome. For more details on how to contribute, see the [CONTRIBUTING.md](https://github.com/vllm-project/guidellm/blob/main/CONTRIBUTING.md) file.
@@ -31,7 +31,7 @@ To disable the progress outputs to the console, use the `disable-progress` flag
```bash
guidellm benchmark \
--target "http://localhost:8000" \
- --rate-type sweep \
+ --profile sweep \
--max-seconds 30 \
--data "prompt_tokens=256,output_tokens=128" \
--disable-progress
@@ -42,57 +42,45 @@ To disable console output, use the `--disable-console-outputs` flag when running
```bash
guidellm benchmark \
--target "http://localhost:8000" \
- --rate-type sweep \
+ --profile sweep \
--max-seconds 30 \
--data "prompt_tokens=256,output_tokens=128" \
--disable-console-outputs
```

- ### Enabling Extra Information
-
- GuideLLM includes the option to display extra information during the benchmark runs to monitor the overheads and performance of the system. This can be enabled by using the `--display-scheduler-stats` flag when running the `guidellm benchmark` command. For example:
-
- ```bash
- guidellm benchmark \
- --target "http://localhost:8000" \
- --rate-type sweep \
- --max-seconds 30 \
- --data "prompt_tokens=256,output_tokens=128" \
- --display-scheduler-stats
- ```
-
- The above command will display an additional row for each benchmark within the progress output, showing the scheduler overheads and other relevant information.
-
## File-Based Outputs
GuideLLM supports saving benchmark results to files in various formats, including JSON, YAML, and CSV. These files can be used for further analysis, reporting, or reloading into Python for detailed exploration.
### Supported File Formats
1. **JSON**: Contains all benchmark results, including full statistics and request data. This format is ideal for reloading into Python for in-depth analysis.
- 2. **YAML**: Similar to JSON, YAML files include all benchmark results and are human-readable.
- 3. **CSV**: Provides a summary of the benchmark data, focusing on key metrics and statistics. Note that CSV does not include detailed request-level data.
+ 2. **CSV**: Provides a summary of the benchmark data, focusing on key metrics and statistics. Note that CSV does not include detailed request-level data.
+ 3. **HTML**: Interactive HTML report with tables and visualizations of benchmark results.
+ 4. **Console**: Terminal output displayed during execution (can be disabled).
### Configuring File Outputs
- - **Output Path**: Use the `--output-path` argument to specify the file path or directory for saving the results. If a directory is provided, the results will be saved as `benchmarks.json` by default. The file type is determined by the file extension (e.g., `.json`, `.yaml`, `.csv`).
- - **Sampling**: To limit the size of the output files, you can configure sampling options for the dataset using the `--output-sampling` argument.
+ - **Output Directory**: Use the `--output-dir` argument to specify the directory for saving the results. By default, files are saved in the current directory.
+ - **Output Formats**: Use the `--outputs` argument to specify which formats to generate. By default, JSON, CSV, and HTML are generated.
+ - **Sampling**: To limit the size of the output files and the number of detailed request samples included, you can configure sampling options using the `--sample-requests` argument.

- Example command to save results in YAML format:
+ Example command to save results in specific formats:
```bash
guidellm benchmark \
--target "http://localhost:8000" \
- --rate-type sweep \
+ --profile sweep \
--max-seconds 30 \
--data "prompt_tokens=256,output_tokens=128" \
- --output-path "results/benchmarks.csv" \
- --output-sampling 20
+ --output-dir "results/" \
+ --outputs json csv \
+ --sample-requests 20
```
### Reloading Results
- JSON and YAML files can be reloaded into Python for further analysis using the `GenerativeBenchmarksReport` class. Below is a sample code snippet for reloading results:
+ JSON files can be reloaded into Python for further analysis using the `GenerativeBenchmarksReport` class. Below is a sample code snippet for reloading results:

```python
from guidellm.benchmark import GenerativeBenchmarksReport
@@ -106,4 +94,4 @@ for benchmark in benchmarks:
print(benchmark.id_)
```
- For more details on the `GenerativeBenchmarksReport` class and its methods, refer to the [source code](https://github.com/neuralmagic/guidellm/blob/main/src/guidellm/benchmark/output.py).
+ For more details on the `GenerativeBenchmarksReport` class and its methods, refer to the [source code](https://github.com/vllm-project/guidellm/blob/main/src/guidellm/benchmark/schemas/generative/reports.py).
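Since the hunk above elides the middle of the reloading snippet, here is a sketch of a complete reload loop; the `load_file` call and the `benchmarks` attribute are assumptions carried over from earlier GuideLLM documentation and should be verified against the linked source, as this PR moves the class to a new module:

```python
from guidellm.benchmark import GenerativeBenchmarksReport

# Assumed loader method; confirm the exact name against the linked source file.
report = GenerativeBenchmarksReport.load_file("benchmarks.json")

# Assumed attribute holding the individual benchmark results.
benchmarks = report.benchmarks
for benchmark in benchmarks:
    print(benchmark.id_)
```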
docs/index.md (+7 -6)
@@ -8,17 +8,18 @@
</p>
<h3 align="center">
- Scale Efficiently: Evaluate and Optimize Your LLM Deployments for Real-World Inference
+ SLO-Aware Benchmarking and Evaluation Platform for Optimizing Real-World LLM Inference
</h3>
- **GuideLLM** is a platform for evaluating and optimizing the deployment of large language models (LLMs). By simulating real-world inference workloads, GuideLLM enables users to assess the performance, resource requirements, and cost implications of deploying LLMs on various hardware configurations. This approach ensures efficient, scalable, and cost-effective LLM inference serving while maintaining high service quality.
+ **GuideLLM** is a platform for evaluating how language models perform under real workloads and configurations. It simulates end-to-end interactions with OpenAI-compatible and vLLM-native servers, generates workload patterns that reflect production usage, and produces detailed reports that help teams understand system behavior, resource needs, and operational limits. GuideLLM supports real and synthetic datasets, multimodal inputs, and flexible execution profiles, giving engineering and ML teams a consistent framework for assessing model behavior, tuning deployments, and planning capacity as their systems evolve.
## Key Features
- - **Performance Evaluation:** Analyze LLM inference under different load scenarios to ensure your system meets your service level objectives (SLOs).
- - **Resource Optimization:** Determine the most suitable hardware configurations for running your models effectively.
- - **Cost Estimation:** Understand the financial impact of different deployment strategies and make informed decisions to minimize costs.
- - **Scalability Testing:** Simulate scaling to handle large numbers of concurrent users without performance degradation.
+ - **Captures complete latency and token-level statistics for SLO-driven evaluation:** Including full distributions for TTFT, ITL, and end-to-end behavior.
+ - **Generates realistic, configurable traffic patterns:** Across synchronous, concurrent, and rate-based modes, including reproducible sweeps to identify safe operating ranges.
+ - **Supports both real and synthetic multimodal datasets:** Enabling controlled experiments and production-style evaluations in one framework with support for text, image, audio, and video inputs.
+ - **Produces standardized, exportable reports:** For dashboards, analysis, and regression tracking, ensuring consistency across teams and workflows.
+ - **Delivers high-throughput, extensible benchmarking:** With multiprocessing, threading, async execution, and a flexible CLI/API for customization or quickstarts.