Commit 71fe942

Add updated and extensive docs and readme for 0.4 release
Signed-off-by: Mark Kurtz <[email protected]>
1 parent: ec6c916 · commit: 71fe942

File tree

24 files changed: +909, -340 lines changed


README.md

Lines changed: 144 additions & 139 deletions
Large diffs are not rendered by default.

docs/developer/index.md

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
---
weight: -3
---

# Developer

Welcome to the Developer section of GuideLLM! This area provides essential resources for developers who want to contribute to or extend GuideLLM. Whether you're interested in fixing bugs, adding new features, improving documentation, or understanding the project's governance, you'll find comprehensive guides to help you get started.

GuideLLM is an open-source project that values community contributions. We maintain high standards for code quality, documentation, and community interactions to ensure that GuideLLM remains a robust, reliable, and user-friendly tool for evaluating and optimizing LLM deployments.

## Developer Resources

<div class="grid cards" markdown>

- :material-handshake:{ .lg .middle } Code of Conduct

    ______________________________________________________________________

    Our community guidelines ensure that participation in the GuideLLM project is a positive, inclusive, and respectful experience for everyone.

    [:octicons-arrow-right-24: Code of Conduct](code-of-conduct.md)

- :material-source-pull:{ .lg .middle } Contributing Guide

    ______________________________________________________________________

    Learn how to effectively contribute to GuideLLM, including reporting bugs, suggesting features, improving documentation, and submitting code.

    [:octicons-arrow-right-24: Contributing Guide](contributing.md)

- :material-tools:{ .lg .middle } Development Guide

    ______________________________________________________________________

    Detailed instructions for setting up your development environment, implementing changes, and adhering to the project's coding standards and best practices.

    [:octicons-arrow-right-24: Development Guide](developing.md)

</div>

docs/examples/index.md

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
---
weight: -4
---

# Examples

Welcome to the GuideLLM examples section! This area is designed to showcase practical applications of GuideLLM for evaluating and optimizing LLM deployments in various real-world scenarios. Our goal is to provide you with concrete examples that demonstrate how to use GuideLLM effectively in your own workflows.

## Call for Contributions

Currently, we do not have any specific examples available, but we welcome contributions from the community! If you have examples of how you've used GuideLLM to solve real-world problems or optimize your LLM deployments, we'd love to feature them here.

To contribute an example:

1. Fork the [GuideLLM repository](https://github.com/neuralmagic/guidellm)
2. Create your example in the `docs/examples/` directory following our [contribution guidelines](https://github.com/neuralmagic/guidellm/blob/main/CONTRIBUTING.md)
3. Submit a pull request with your contribution

Your examples will help others leverage GuideLLM more effectively and contribute to the growing knowledge base around LLM deployment optimization.

docs/getting-started/analyze.md

Lines changed: 151 additions & 0 deletions
@@ -0,0 +1,151 @@
---
weight: -4
---

# Analyze Results

After [running a benchmark](benchmark.md), GuideLLM provides comprehensive results that help you understand your LLM deployment's performance. This guide explains how to interpret both console output and file-based results.

## Understanding Console Output

Upon benchmark completion, GuideLLM automatically displays results in the console, divided into three main sections:

### 1. Benchmarks Metadata

This section provides a high-level summary of the benchmark run, including:

- **Server configuration**: Target URL, model name, and backend details
- **Data configuration**: Data source, token counts, and dataset properties
- **Profile arguments**: Rate type, maximum duration, request limits, etc.
- **Extras**: Any additional metadata provided via the `--output-extras` argument

Example:

```
Benchmarks Metadata
------------------
Args: {"backend_type": "openai", "target": "http://localhost:8000", "model": "Meta-Llama-3.1-8B-Instruct-quantized", ...}
Worker: {"type_": "generative", "backend_type": "openai", "backend_args": {"timeout": 120.0, ...}, ...}
Request Loader: {"type_": "generative", "data_args": {"prompt_tokens": 256, "output_tokens": 128, ...}, ...}
Extras: {}
```

### 2. Benchmarks Info

This section summarizes the key information about each benchmark run, presented as a table with columns:

- **Type**: The benchmark type (e.g., synchronous, constant, poisson, etc.)
- **Start/End Time**: When the benchmark started and ended
- **Duration**: Total duration of the benchmark in seconds
- **Requests**: Count of successful, incomplete, and errored requests
- **Token Stats**: Average token counts and totals for prompts and outputs

This section helps you understand what was executed and provides a quick overview of the results.

### 3. Benchmarks Stats

This is the most critical section for performance analysis. It displays detailed statistics for each metric:

- **Throughput Metrics**:

  - Requests per second (RPS)
  - Request concurrency
  - Output tokens per second
  - Total tokens per second

- **Latency Metrics**:

  - Request latency (mean, median, p99)
  - Time to first token (TTFT) (mean, median, p99)
  - Inter-token latency (ITL) (mean, median, p99)
  - Time per output token (mean, median, p99)

The p99 (99th percentile) values are particularly important for SLO analysis, as they represent the worst-case performance for 99% of requests.

## Analyzing the Results File

For deeper analysis, GuideLLM saves detailed results to a file (default: `benchmarks.json`). This file contains all metrics with more comprehensive statistics and individual request data.

### File Formats

GuideLLM supports multiple output formats:

- **JSON**: Complete benchmark data in JSON format (default)
- **YAML**: Complete benchmark data in human-readable YAML format
- **CSV**: Summary of key metrics in CSV format

To specify the format, use the `--output-path` argument with the appropriate extension:

```bash
guidellm benchmark --target "http://localhost:8000" --output-path results.yaml
```

### Programmatic Analysis

For custom analysis, you can reload the results into Python:

```python
from guidellm.benchmark import GenerativeBenchmarksReport

# Load results from file
report = GenerativeBenchmarksReport.load_file("benchmarks.json")

# Access individual benchmarks
for benchmark in report.benchmarks:
    # Print basic info
    print(f"Benchmark: {benchmark.id_}")
    print(f"Type: {benchmark.type_}")

    # Access metrics
    print(f"Avg RPS: {benchmark.metrics.requests_per_second.successful.mean}")
    print(f"p99 latency: {benchmark.metrics.request_latency.successful.percentiles.p99}")
    print(f"TTFT (p99): {benchmark.metrics.time_to_first_token_ms.successful.percentiles.p99}")
```
## Key Performance Indicators

When analyzing your results, focus on these key indicators:

### 1. Throughput and Capacity

- **Maximum RPS**: What's the highest request rate your server can handle?
- **Concurrency**: How many concurrent requests can your server process?
- **Token Throughput**: How many tokens per second can your server generate?

### 2. Latency and Responsiveness

- **Time to First Token (TTFT)**: How quickly does the model start generating output?
- **Inter-Token Latency (ITL)**: How smoothly does the model generate subsequent tokens?
- **Total Request Latency**: How long do complete requests take end-to-end?

### 3. Reliability and Error Rates

- **Success Rate**: What percentage of requests complete successfully?
- **Error Distribution**: What types of errors occur and at what rates?
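
To tie these indicators together, the sketch below scans a sweep's results for the highest successful request rate that still meets example latency targets. It reuses only the report fields shown in the Programmatic Analysis section above; the SLO thresholds are illustrative placeholders, and the request latency unit is assumed to match the values in your report.

```python
from guidellm.benchmark import GenerativeBenchmarksReport

# Illustrative SLO targets -- replace with your own service objectives.
TTFT_SLO_MS = 200.0  # p99 time to first token, in milliseconds
LATENCY_SLO = 5.0    # p99 request latency budget (same units as the report values)

report = GenerativeBenchmarksReport.load_file("benchmarks.json")

best_rps = 0.0
for benchmark in report.benchmarks:
    ttft_p99 = benchmark.metrics.time_to_first_token_ms.successful.percentiles.p99
    latency_p99 = benchmark.metrics.request_latency.successful.percentiles.p99
    rps = benchmark.metrics.requests_per_second.successful.mean

    # Keep the highest-throughput benchmark that still meets both latency targets.
    if ttft_p99 <= TTFT_SLO_MS and latency_p99 <= LATENCY_SLO and rps > best_rps:
        best_rps = rps

print(f"Highest RPS within the example SLOs: {best_rps:.2f}")
```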
## Additional Analysis Techniques

### Comparing Different Models or Hardware

Run benchmarks with different models or hardware configurations, then compare:

```bash
guidellm benchmark --target "http://server1:8000" --output-path model1.json
guidellm benchmark --target "http://server2:8000" --output-path model2.json
```
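
To put the two runs side by side, you can reload both files with the same `GenerativeBenchmarksReport` API shown earlier. This is a sketch only; the file names match the commands above, and the printed fields mirror the Programmatic Analysis example.

```python
from guidellm.benchmark import GenerativeBenchmarksReport

# Compare the two result files produced by the commands above.
for path in ("model1.json", "model2.json"):
    report = GenerativeBenchmarksReport.load_file(path)
    print(f"=== {path} ===")
    for benchmark in report.benchmarks:
        rps = benchmark.metrics.requests_per_second.successful.mean
        ttft_p99 = benchmark.metrics.time_to_first_token_ms.successful.percentiles.p99
        print(f"{benchmark.type_}: mean RPS={rps:.2f}, p99 TTFT={ttft_p99:.1f} ms")
```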
### Cost Optimization

Calculate cost-effectiveness by analyzing:

- Tokens per second per dollar of hardware cost
- Maximum throughput for different hardware configurations
- Optimal batch size vs. latency tradeoffs
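
As a back-of-the-envelope example, tokens per second per dollar can be computed directly from the console output and your own pricing. The numbers below are placeholders, not measurements.

```python
# Placeholder inputs: read the throughput from the Benchmarks Stats section,
# and use your own hardware or instance pricing.
output_tokens_per_second = 1450.0  # measured output token throughput
hardware_cost_per_hour = 4.00      # dollars per hour for the serving hardware

tokens_per_dollar = output_tokens_per_second * 3600 / hardware_cost_per_hour
print(f"Output tokens per dollar: {tokens_per_dollar:,.0f}")
```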
### Determining Scaling Requirements

Use your benchmark results to plan:

- How many servers you need to handle your expected load
- When to automatically scale up or down based on demand
- What hardware provides the best price/performance for your workload
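
A first-pass capacity estimate can be derived from the benchmark numbers with simple arithmetic; the peak load and headroom values below are illustrative assumptions.

```python
import math

# Illustrative inputs -- substitute your measured and forecast values.
max_rps_per_server = 8.5  # highest RPS one server sustained within your SLOs
expected_peak_rps = 60.0  # forecast peak request rate for your workload
headroom = 0.2            # keep 20% spare capacity for spikes and failover

servers_needed = math.ceil(expected_peak_rps / (max_rps_per_server * (1 - headroom)))
print(f"Servers needed at peak: {servers_needed}")
```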

docs/getting-started/benchmark.md

Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
---
weight: -6
---

# Run a Benchmark

After [installing GuideLLM](install.md) and [starting a server](server.md), you're ready to run benchmarks to evaluate your LLM deployment's performance.

Running a GuideLLM benchmark is straightforward. The basic command structure is:

```bash
guidellm benchmark --target <server-url> [options]
```

### Basic Example

To run a benchmark against your local vLLM server with default settings:

```bash
guidellm benchmark \
    --target "http://localhost:8000" \
    --data "prompt_tokens=256,output_tokens=128"
```

This command:

- Connects to your vLLM server running at `http://localhost:8000`
- Uses synthetic data with 256 prompt tokens and 128 output tokens per request
- Automatically determines the available model on the server
- Runs a "sweep" benchmark (default) to find optimal performance points

During the benchmark, you'll see a progress display similar to this:

![Benchmark Progress](../assets/sample-benchmarks.gif)

## Understanding Benchmark Options

GuideLLM offers a wide range of configuration options to customize your benchmarks. Here are the most important parameters you should know:

### Key Parameters

| Parameter | Description | Example |
| --------------- | ---------------------------------------------- | ---------------------------------------------- |
| `--target` | URL of the OpenAI-compatible server | `--target "http://localhost:8000"` |
| `--model` | Model name to benchmark (optional) | `--model "Meta-Llama-3.1-8B-Instruct"` |
| `--data` | Data configuration for benchmarking | `--data "prompt_tokens=256,output_tokens=128"` |
| `--rate-type` | Type of benchmark to run | `--rate-type sweep` |
| `--rate` | Request rate or number of benchmarks for sweep | `--rate 10` |
| `--max-seconds` | Duration for each benchmark in seconds | `--max-seconds 30` |
| `--output-path` | Output file path and format | `--output-path results.json` |

### Benchmark Types (`--rate-type`)

GuideLLM supports several benchmark types:

- `synchronous`: Runs requests one at a time (sequential)
- `throughput`: Tests maximum throughput by running requests in parallel
- `concurrent`: Runs a fixed number of parallel request streams
- `constant`: Sends requests at a fixed rate per second
- `poisson`: Sends requests following a Poisson distribution
- `sweep`: Automatically determines optimal performance points (default)
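
Because each benchmark type is selected with a CLI flag, it is easy to script a series of runs and keep the results separate for later comparison. The sketch below uses Python's `subprocess` module and mirrors the target and data settings from the basic example above; the output file names are arbitrary.

```python
import subprocess

# Run a short benchmark for several rate types, saving each result to its own file.
for rate_type in ("synchronous", "throughput", "constant"):
    cmd = [
        "guidellm", "benchmark",
        "--target", "http://localhost:8000",
        "--data", "prompt_tokens=256,output_tokens=128",
        "--rate-type", rate_type,
        "--max-seconds", "30",
        "--output-path", f"benchmarks-{rate_type}.json",
    ]
    if rate_type == "constant":
        cmd += ["--rate", "5"]  # requests per second for the constant benchmark
    subprocess.run(cmd, check=True)
```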
### Data Options

For synthetic data, you can customize:

- `prompt_tokens`: Average number of tokens for prompts
- `output_tokens`: Average number of tokens for outputs
- `samples`: Number of samples to generate (default: 1000)

For a complete list of options, run:

```bash
guidellm benchmark --help
```

## Working with Real Data

While synthetic data is convenient for quick tests, you can benchmark with real-world data:

```bash
guidellm benchmark \
    --target "http://localhost:8000" \
    --data "/path/to/your/dataset.json" \
    --rate-type constant \
    --rate 5
```

You can also use datasets from HuggingFace or customize synthetic data generation with additional parameters such as standard deviation, minimum, and maximum values.

By default, complete results are saved to `benchmarks.json` in your current directory. Use the `--output-path` parameter to specify a different location or format.

Learn more about dataset options in the [Datasets documentation](../guides/datasets.md).

docs/getting-started/index.md

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
---
weight: -10
---

# Getting Started

Welcome to GuideLLM! This section will guide you through the process of installing the tool, setting up your benchmarking environment, running your first benchmark, and analyzing the results to optimize your LLM deployment for real-world inference workloads.

GuideLLM makes it simple to evaluate and optimize your large language model deployments, helping you find the perfect balance between performance, resource utilization, and cost-effectiveness.

## Quick Start Guides

Follow the guides below in sequence to get the most out of GuideLLM and optimize your LLM deployments for production use.

<div class="grid cards" markdown>

- :material-package-variant:{ .lg .middle } Installation

    ______________________________________________________________________

    Learn how to install GuideLLM using pip, from source, or with specific version requirements.

    [:octicons-arrow-right-24: Installation Guide](install.md)

- :material-server:{ .lg .middle } Start a Server

    ______________________________________________________________________

    Set up an OpenAI-compatible server using vLLM or other supported backends to benchmark your LLM deployments.

    [:octicons-arrow-right-24: Server Setup Guide](server.md)

- :material-speedometer:{ .lg .middle } Run Benchmarks

    ______________________________________________________________________

    Learn how to configure and run performance benchmarks against your LLM server under various load conditions.

    [:octicons-arrow-right-24: Benchmarking Guide](benchmark.md)

- :material-chart-bar:{ .lg .middle } Analyze Results

    ______________________________________________________________________

    Interpret benchmark results to understand throughput, latency, reliability, and optimize your deployments.

    [:octicons-arrow-right-24: Analysis Guide](analyze.md)

</div>

docs/install.md renamed to docs/getting-started/install.md

Lines changed: 9 additions & 5 deletions
@@ -1,4 +1,8 @@
-# Installation Guide for GuideLLM
+---
+weight: -10
+---
+
+# Install

 GuideLLM can be installed using several methods depending on your requirements. Below are the detailed instructions for each installation pathway.

@@ -8,7 +12,7 @@ Before installing GuideLLM, ensure you have the following prerequisites:

 - **Operating System:** Linux or MacOS

-- **Python Version:** 3.10 – 3.13
+- **Python Version:** 3.9 – 3.13

 - **Pip Version:** Ensure you have the latest version of pip installed. You can upgrade pip using the following command:

@@ -41,7 +45,7 @@ pip install guidellm==0.2.0
 To install the latest development version of GuideLLM from the main branch, use the following command:

 ```bash
-pip install git+https://github.com/vllm-project/guidellm.git
+pip install git+https://github.com/neuralmagic/guidellm.git
 ```

 This will clone the repository and install GuideLLM directly from the main branch.
@@ -51,7 +55,7 @@ This will clone the repository and install GuideLLM directly from the main branc
 If you want to install GuideLLM from a specific branch (e.g., `feature-branch`), use the following command:

 ```bash
-pip install git+https://github.com/vllm-project/guidellm.git@feature-branch
+pip install git+https://github.com/neuralmagic/guidellm.git@feature-branch
 ```

 Replace `feature-branch` with the name of the branch you want to install.
@@ -84,4 +88,4 @@ This should display the installed version of GuideLLM.

 ## Troubleshooting

-If you encounter any issues during installation, ensure that your Python and pip versions meet the prerequisites. For further assistance, please refer to the [GitHub Issues](https://github.com/vllm-project/guidellm/issues) page or consult the [Documentation](https://github.com/vllm-project/guidellm/tree/main/docs).
+If you encounter any issues during installation, ensure that your Python and pip versions meet the prerequisites. For further assistance, please refer to the [GitHub Issues](https://github.com/neuralmagic/guidellm/issues) page or consult the [Documentation](https://github.com/neuralmagic/guidellm/tree/main/docs).
