
Commit 11cf5d2

Merge branch 'main' into features/reenable-preprocess
2 parents: 81fbbe8 + 8f1e001

32 files changed (+1292, -613 lines)

README.md

Lines changed: 145 additions & 138 deletions
Large diffs are not rendered by default.

docs/developer/index.md

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
---
weight: -3
---

# Developer

Welcome to the Developer section of GuideLLM! This area provides essential resources for developers who want to contribute to or extend GuideLLM. Whether you're interested in fixing bugs, adding new features, improving documentation, or understanding the project's governance, you'll find comprehensive guides to help you get started.

GuideLLM is an open-source project that values community contributions. We maintain high standards for code quality, documentation, and community interactions to ensure that GuideLLM remains a robust, reliable, and user-friendly tool for evaluating and optimizing LLM deployments.

## Developer Resources

<div class="grid cards" markdown>

- :material-handshake:{ .lg .middle } Code of Conduct

  ______________________________________________________________________

  Our community guidelines ensure that participation in the GuideLLM project is a positive, inclusive, and respectful experience for everyone.

  [:octicons-arrow-right-24: Code of Conduct](code-of-conduct.md)

- :material-source-pull:{ .lg .middle } Contributing Guide

  ______________________________________________________________________

  Learn how to effectively contribute to GuideLLM, including reporting bugs, suggesting features, improving documentation, and submitting code.

  [:octicons-arrow-right-24: Contributing Guide](contributing.md)

- :material-tools:{ .lg .middle } Development Guide

  ______________________________________________________________________

  Detailed instructions for setting up your development environment, implementing changes, and adhering to the project's coding standards and best practices.

  [:octicons-arrow-right-24: Development Guide](developing.md)

</div>

docs/examples/index.md

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
---
weight: -4
---

# Examples

Welcome to the GuideLLM examples section! This area is designed to showcase practical applications of GuideLLM for evaluating and optimizing LLM deployments in various real-world scenarios. Our goal is to provide you with concrete examples that demonstrate how to use GuideLLM effectively in your own workflows.

## Call for Contributions

Currently, we do not have many specific examples available, but we welcome contributions from the community! If you have examples of how you've used GuideLLM to solve real-world problems or optimize your LLM deployments, we'd love to feature them here.

To contribute an example:

1. Fork the [GuideLLM repository](https://github.com/vllm-project/guidellm)
2. Create your example in the `docs/examples/` directory following our [contribution guidelines](https://github.com/vllm-project/guidellm/blob/main/CONTRIBUTING.md)
3. Submit a pull request with your contribution

Your examples will help others leverage GuideLLM more effectively and contribute to the growing knowledge base around LLM deployment optimization.

docs/examples/practice_on_vllm_simulator.md

Lines changed: 1 addition & 1 deletion
@@ -96,7 +96,7 @@ guidellm benchmark \
   --target "http://localhost:8000/" \
   --model "tweet-summary-0" \
   --processor "${local_path}/Qwen2.5-1.5B-Instruct" \
-  --rate-type sweep \
+  --profile sweep \
   --max-seconds 10 \
   --max-requests 10 \
   --data "prompt_tokens=128,output_tokens=56"

docs/getting-started/analyze.md

Lines changed: 169 additions & 0 deletions
@@ -0,0 +1,169 @@
---
weight: -4
---

# Analyze Results

After [running a benchmark](benchmark.md), GuideLLM provides comprehensive results that help you understand your LLM deployment's performance. This guide explains how to interpret both console output and file-based results.

## Understanding Console Output

Upon benchmark completion, GuideLLM automatically displays results in the console, divided into three main sections:

### 1. Benchmarks Metadata

This section provides a high-level summary of the benchmark run, including:

- **Server configuration**: Target URL, model name, and backend details
- **Data configuration**: Data source, token counts, and dataset properties
- **Profile arguments**: Rate type, maximum duration, request limits, etc.
- **Extras**: Any additional metadata provided via the `--output-extras` argument

Example:

```
Benchmarks Metadata
------------------
Args: {"backend_type": "openai", "target": "http://localhost:8000", "model": "Meta-Llama-3.1-8B-Instruct-quantized", ...}
Worker: {"type_": "generative", "backend_type": "openai", "backend_args": {"timeout": 120.0, ...}, ...}
Request Loader: {"type_": "generative", "data_args": {"prompt_tokens": 256, "output_tokens": 128, ...}, ...}
Extras: {}
```

### 2. Benchmarks Info

This section summarizes the key information about each benchmark run, presented as a table with columns:

- **Type**: The benchmark type (e.g., synchronous, constant, poisson, etc.)
- **Start/End Time**: When the benchmark started and ended
- **Duration**: Total duration of the benchmark in seconds
- **Requests**: Count of successful, incomplete, and errored requests
- **Token Stats**: Average token counts and totals for prompts and outputs

This section helps you understand what was executed and provides a quick overview of the results.

### 3. Benchmarks Stats

This is the most critical section for performance analysis. It displays detailed statistics for each metric:

- **Throughput Metrics**:

  - Requests per second (RPS)
  - Request concurrency
  - Output tokens per second
  - Total tokens per second

- **Latency Metrics**:

  - Request latency (mean, median, p99)
  - Time to first token (TTFT) (mean, median, p99)
  - Inter-token latency (ITL) (mean, median, p99)
  - Time per output token (mean, median, p99)

The p99 (99th percentile) values are particularly important for SLO analysis, as they represent the worst-case performance for 99% of requests.
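For intuition: the mean is the average across all requests, the median is the 50th percentile, and the p99 is the value that 99% of requests stay at or below. A quick illustrative sketch with NumPy (the latency values are made up, not GuideLLM output):

```python
import numpy as np

# Hypothetical per-request latencies in seconds (illustrative only)
latencies = np.array([0.42, 0.45, 0.47, 0.51, 0.55, 0.61, 0.64, 0.72, 0.95, 2.10])

print(f"mean:   {latencies.mean():.2f}s")               # pulled upward by the slow outlier
print(f"median: {np.median(latencies):.2f}s")           # typical request
print(f"p99:    {np.percentile(latencies, 99):.2f}s")   # tail latency, dominated by the slowest request
```

A large gap between the median and the p99 usually points to tail-latency issues (queueing, preemption, or unusually long outputs) that averages alone would hide.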
## Analyzing the Results File

For deeper analysis, GuideLLM saves detailed results to multiple files by default in your current directory:

- `benchmarks.json`: Complete benchmark data in JSON format
- `benchmarks.csv`: Summary of key metrics in CSV format
- `benchmarks.html`: Interactive HTML report with visualizations

### File Formats

GuideLLM supports multiple output formats that can be customized:

- **JSON**: Complete benchmark data in JSON format with full request samples
- **YAML**: Complete benchmark data in YAML format with full request samples
- **CSV**: Summary of key metrics in CSV format suitable for spreadsheets
- **HTML**: Interactive HTML report with tables and visualizations
- **Console**: Terminal output displayed during execution

To specify which formats to generate, use the `--outputs` argument:

```bash
guidellm benchmark --target "http://localhost:8000" --outputs json csv
```

The `--outputs` argument also accepts full file names, letting you customize the name and location of each output:

```bash
guidellm benchmark --target "http://localhost:8000" --outputs results/benchmarks.json results/summary.csv
```

To change the output directory, use the `--output-dir` argument:

```bash
guidellm benchmark --target "http://localhost:8000" --output-dir results/
```

### Programmatic Analysis

For custom analysis, you can reload the results into Python:

```python
from guidellm.benchmark import GenerativeBenchmarksReport

# Load results from file
report = GenerativeBenchmarksReport.load_file("benchmarks.json")

# Access individual benchmarks
for benchmark in report.benchmarks:
    # Print basic info
    print(f"Benchmark: {benchmark.id_}")
    print(f"Type: {benchmark.type_}")

    # Access metrics
    print(f"Avg RPS: {benchmark.metrics.requests_per_second.successful.mean}")
    print(f"p99 latency: {benchmark.metrics.request_latency.successful.percentiles.p99}")
    print(f"TTFT (p99): {benchmark.metrics.time_to_first_token_ms.successful.percentiles.p99}")
```
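If you ran a sweep, comparing the benchmarks side by side is often more useful than inspecting them one at a time. A small sketch that tabulates the same fields shown above (the attribute names are taken from the example, so adjust them if they differ in your version):

```python
from guidellm.benchmark import GenerativeBenchmarksReport

report = GenerativeBenchmarksReport.load_file("benchmarks.json")

# One summary row per benchmark, reusing the attributes from the example above
rows = [
    (
        b.type_,
        b.metrics.requests_per_second.successful.mean,
        b.metrics.request_latency.successful.percentiles.p99,
        b.metrics.time_to_first_token_ms.successful.percentiles.p99,
    )
    for b in report.benchmarks
]

print(f"{'type':<14}{'rps_mean':>10}{'latency_p99_s':>16}{'ttft_p99_ms':>14}")
for type_, rps, lat_p99, ttft_p99 in sorted(rows, key=lambda r: r[1]):
    print(f"{type_:<14}{rps:>10.2f}{lat_p99:>16.3f}{ttft_p99:>14.1f}")
```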
## Key Performance Indicators

When analyzing your results, focus on these key indicators:

### 1. Throughput and Capacity

- **Maximum RPS**: What's the highest request rate your server can handle?
- **Concurrency**: How many concurrent requests can your server process?
- **Token Throughput**: How many tokens per second can your server generate?

### 2. Latency and Responsiveness

- **Time to First Token (TTFT)**: How quickly does the model start generating output?
- **Inter-Token Latency (ITL)**: How smoothly does the model generate subsequent tokens?
- **Total Request Latency**: How long do complete requests take end-to-end?

### 3. Reliability and Error Rates

- **Success Rate**: What percentage of requests complete successfully?
- **Error Distribution**: What types of errors occur and at what rates?

## Additional Analysis Techniques

### Comparing Different Models or Hardware

Run benchmarks with different models or hardware configurations, then compare:

```bash
guidellm benchmark --target "http://server1:8000" --output-dir model1/
guidellm benchmark --target "http://server2:8000" --output-dir model2/
```

### Cost Optimization

Calculate cost-effectiveness by analyzing (a worked sketch follows the list):

- Tokens per second per dollar of hardware cost
- Maximum throughput for different hardware configurations
- Optimal batch size vs. latency tradeoffs
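As a back-of-the-envelope example for the first bullet, take the output-token throughput from your benchmark results and divide by what the hardware costs per hour. Both inputs below are placeholders to replace with your own numbers:

```python
# Illustrative cost-effectiveness calculation; both inputs are assumptions
output_tokens_per_second = 1450.0  # e.g., from the Benchmarks Stats section or benchmarks.json
gpu_cost_per_hour = 2.50           # hypothetical hourly price for the serving instance

tokens_per_dollar = (output_tokens_per_second * 3600) / gpu_cost_per_hour
cost_per_million_tokens = 1_000_000 / tokens_per_dollar

print(f"{tokens_per_dollar:,.0f} output tokens per dollar")
print(f"${cost_per_million_tokens:.2f} per million output tokens")
```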
### Determining Scaling Requirements

Use your benchmark results to plan (see the sketch after this list):

- How many servers you need to handle your expected load
- When to automatically scale up or down based on demand
- What hardware provides the best price/performance for your workload
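A simple starting point is to size the deployment from the maximum per-server throughput you measured, with some headroom for traffic spikes; the numbers here are hypothetical:

```python
import math

# Hypothetical inputs: expected peak traffic and measured per-server capacity
expected_peak_rps = 42.0    # from your own traffic estimates
max_rps_per_server = 9.5    # e.g., highest sustainable RPS observed in the sweep
headroom = 0.8              # plan to run each server at only 80% of its measured max

servers_needed = math.ceil(expected_peak_rps / (max_rps_per_server * headroom))
print(f"Servers needed at peak: {servers_needed}")
```

Once you have a candidate size, validate it by benchmarking the scaled deployment and confirming that the p99 latencies still meet your SLOs.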

docs/getting-started/benchmark.md

Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
---
weight: -6
---

# Run a Benchmark

After [installing GuideLLM](install.md) and [starting a server](server.md), you're ready to run benchmarks to evaluate your LLM deployment's performance.

Running a GuideLLM benchmark is straightforward. The basic command structure is:

```bash
guidellm benchmark --target <server-url> [options]
```

## Basic Example

To run a benchmark against your local vLLM server with default settings:

```bash
guidellm benchmark \
  --target "http://localhost:8000" \
  --data "prompt_tokens=256,output_tokens=128" \
  --max-seconds 60
```

This command:

- Connects to your vLLM server running at `http://localhost:8000`
- Uses synthetic data with 256 prompt tokens and 128 output tokens per request
- Automatically determines the available model on the server
- Runs a "sweep" profile (default) to find optimal performance points

During the benchmark, you'll see a progress display similar to this:

![Benchmark Progress](../assets/sample-benchmarks.gif)

## Understanding Benchmark Options

GuideLLM offers a wide range of configuration options to customize your benchmarks. Here are the most important parameters you should know:

### Key Parameters

| Parameter | Description | Example |
| --------------- | ---------------------------------------------- | ---------------------------------------------- |
| `--target` | URL of the OpenAI-compatible server | `--target "http://localhost:8000"` |
| `--model` | Model name to benchmark (optional) | `--model "Meta-Llama-3.1-8B-Instruct"` |
| `--data` | Data configuration for benchmarking | `--data "prompt_tokens=256,output_tokens=128"` |
| `--profile` | Type of benchmark profile to run | `--profile sweep` |
| `--rate` | Request rate or number of benchmarks for sweep | `--rate 10` |
| `--max-seconds` | Duration for each benchmark in seconds | `--max-seconds 30` |
| `--output-dir` | Directory path to save output files | `--output-dir results/` |
| `--outputs` | Output formats to generate | `--outputs json csv html` |

### Benchmark Profiles (`--profile`)

GuideLLM supports several benchmark profiles and strategies (an example follows the list):

- `synchronous`: Runs requests one at a time (sequential)
- `throughput`: Tests maximum throughput by running requests in parallel
- `concurrent`: Runs a fixed number of parallel request streams
- `constant`: Sends requests at a fixed rate per second
- `poisson`: Sends requests following a Poisson distribution
- `sweep`: Automatically determines optimal performance points (default)
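For example, to pin the load to a fixed request rate instead of letting the default sweep choose one, combine `--profile` with `--rate` (values here are illustrative):

```bash
guidellm benchmark \
  --target "http://localhost:8000" \
  --data "prompt_tokens=256,output_tokens=128" \
  --profile constant \
  --rate 10 \
  --max-seconds 60
```

With the default `sweep` profile, `--rate` instead controls the number of benchmarks in the sweep, as noted in the table above.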
### Data Options

For synthetic data, key options include the following (a short example follows the list):

- `prompt_tokens`: Average number of tokens for prompts
- `output_tokens`: Average number of tokens for outputs
- `samples`: Number of samples to generate (default: 1000)
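These keys are passed in the same comma-separated `--data` string used in the earlier examples:

```bash
guidellm benchmark \
  --target "http://localhost:8000" \
  --data "prompt_tokens=512,output_tokens=256,samples=500" \
  --max-seconds 60
```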
For a complete list of options, run:

```bash
guidellm benchmark run --help
```

## Working with Real Data

While synthetic data is convenient for quick tests, you can benchmark with real-world data:

```bash
guidellm benchmark \
  --target "http://localhost:8000" \
  --data "/path/to/your/dataset.json" \
  --profile constant \
  --rate 5
```

You can also use datasets from HuggingFace or customize synthetic data generation with additional parameters such as standard deviation, minimum, and maximum values.
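A more customized synthetic spec might look like the following; the exact key names for the spread and bounds (`prompt_tokens_stdev`, `prompt_tokens_min`, `prompt_tokens_max`, and matching `output_tokens_*` keys) are assumptions here, so confirm them with `guidellm benchmark run --help` for your version:

```bash
# Illustrative only: the *_stdev/*_min/*_max key names are assumed; verify with --help
guidellm benchmark \
  --target "http://localhost:8000" \
  --data "prompt_tokens=256,prompt_tokens_stdev=64,prompt_tokens_min=32,prompt_tokens_max=1024,output_tokens=128,output_tokens_stdev=32"
```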
By default, complete results are saved to `benchmarks.json`, `benchmarks.csv`, and `benchmarks.html` in your current directory. Use the `--output-dir` parameter to specify a different location and `--outputs` to control which formats are generated.

Learn more about dataset options in the [Datasets documentation](../guides/datasets.md).

docs/getting-started/index.md

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
---
weight: -10
---

# Getting Started

Welcome to GuideLLM! This section will guide you through the process of installing the tool, setting up your benchmarking environment, running your first benchmark, and analyzing the results to optimize your LLM deployment for real-world inference workloads.

GuideLLM makes it simple to evaluate and optimize your large language model deployments, helping you find the right balance between performance, resource utilization, and cost-effectiveness.

## Quick Start Guides

Follow the guides below in sequence to get the most out of GuideLLM and optimize your LLM deployments for production use.

<div class="grid cards" markdown>

- :material-package-variant:{ .lg .middle } Installation

  ______________________________________________________________________

  Learn how to install GuideLLM using pip, from source, or with specific version requirements.

  [:octicons-arrow-right-24: Installation Guide](install.md)

- :material-server:{ .lg .middle } Start a Server

  ______________________________________________________________________

  Set up an OpenAI-compatible server using vLLM or other supported backends to benchmark your LLM deployments.

  [:octicons-arrow-right-24: Server Setup Guide](server.md)

- :material-speedometer:{ .lg .middle } Run Benchmarks

  ______________________________________________________________________

  Learn how to configure and run performance benchmarks against your LLM server under various load conditions.

  [:octicons-arrow-right-24: Benchmarking Guide](benchmark.md)

- :material-chart-bar:{ .lg .middle } Analyze Results

  ______________________________________________________________________

  Interpret benchmark results to understand throughput, latency, and reliability, and optimize your deployments.

  [:octicons-arrow-right-24: Analysis Guide](analyze.md)

</div>
