README.md (7 additions, 5 deletions)
@@ -22,7 +22,7 @@ SLO-aware Benchmarking and Evaluation Platform for Optimizing Real-World LLM Inf
**GuideLLM** is a platform for evaluating how language models perform under real workloads and configurations. It simulates end-to-end interactions with OpenAI-compatible and vLLM-native servers, generates workload patterns that reflect production usage, and produces detailed reports that help teams understand system behavior, resource needs, and operational limits. GuideLLM supports real and synthetic datasets, multimodal inputs, and flexible execution profiles, giving engineering and ML teams a consistent framework for assessing model behavior, tuning deployments, and planning capacity as their systems evolve.
-## Why GuideLLM?
+### Why GuideLLM?
GuideLLM gives teams a clear picture of performance, efficiency, and reliability when deploying LLMs in production-like environments.
@@ -144,6 +144,8 @@ The console provides a lightweight summary with high-level statistics for each b
This file is the authoritative record of the entire benchmark session. It includes configuration, metadata, per-benchmark statistics, and sample request entries with individual request timings. Use it for debugging, deeper analysis, or loading into Python with `GenerativeBenchmarksReport`.
+
+Alternatively, a YAML version of this file, containing the same content as `benchmarks.json` in a more human-readable form, can be generated with the `--outputs yaml` argument.
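As a quick illustration of the flag referenced above, here is a minimal sketch of a run that requests the YAML report; the `--target` address, synthetic data spec, and duration are placeholder assumptions rather than values taken from this change:

```bash
# Hedged sketch: ask GuideLLM to emit the YAML report described above.
# The target URL and synthetic data spec below are placeholders.
guidellm benchmark \
  --target "http://localhost:8000" \
  --data "prompt_tokens=128,output_tokens=256" \
  --max-seconds 20 \
  --outputs yaml
```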
**benchmarks.csv**
This file provides a compact tabular view of each benchmark with the fields most commonly used for reporting—throughput, latency percentiles, token counts, and rate information. It opens cleanly in spreadsheets and BI tools and is well-suited for comparisons across runs.
@@ -158,7 +160,7 @@ GuideLLM supports a wide range of LLM benchmarking workflows. The examples below
### Load Patterns
-Different applications require different traffic shapes. This example demonstrates rate-based load testing using a constant profile at 10 requests per second, running for 20 seconds with synthetic data of 128 prompt tokens and 256 output tokens.
+Simulating different applications requires different traffic shapes. This example demonstrates rate-based load testing using a constant profile at 10 requests per second, running for 20 seconds with synthetic data of 128 prompt tokens and 256 output tokens.
```bash
guidellm benchmark \
@@ -191,6 +193,7 @@ guidellm benchmark \
- `--data`: Data source specification - accepts HuggingFace dataset IDs (prefix with `hf:`), local file paths (`.json`, `.csv`, `.jsonl`, `.txt`), or synthetic data configs (JSON object or `key=value` pairs like `prompt_tokens=256,output_tokens=128`)
- `--data-args`: JSON object of arguments for dataset creation - commonly used to specify column mappings like `prompt_column`, `output_tokens_count_column`, or HuggingFace dataset parameters
- `--data-samples`: Number of samples to use from the dataset - use `-1` (default) for all samples with dynamic generation, or specify a positive integer to limit sample count
+- `--processor`: Tokenizer or processor name used for generating synthetic data - if not provided and the dataset requires one, it is loaded automatically from the model; accepts HuggingFace model IDs or local paths
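To make the data-related flags above concrete, here is a hedged sketch of a synthetic-data run; the `--target` address and the tokenizer ID are illustrative assumptions, not values taken from this change:

```bash
# Hedged sketch combining the data flags described above:
# synthetic prompts/outputs via key=value pairs, a 1000-sample cap
# (instead of the default -1 for all samples), and an explicit tokenizer
# for synthetic data generation. The target URL and tokenizer ID are placeholders.
guidellm benchmark \
  --target "http://localhost:8000" \
  --data "prompt_tokens=256,output_tokens=128" \
  --data-samples 1000 \
  --processor "meta-llama/Llama-3.1-8B-Instruct"
```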
### Request Types and API Targets
@@ -205,8 +208,7 @@ guidellm benchmark \
**Key parameters:**
-- `--request-type`: Specifies the API endpoint format - options include `chat_completions` (chat API format), `completions` (text completion format), and other OpenAI-compatible request types
-- `--processor`: Tokenizer or processor name for token counting - if not provided, automatically loads from the model; accepts HuggingFace model IDs or local paths
+- `--request-type`: Specifies the API endpoint format - options include `chat_completions` (chat API format), `completions` (text completion format), `audio_transcription`, and `audio_translation`.
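As a hedged illustration of the request types listed above, the sketch below targets the chat completions format; the server address and data spec are placeholders:

```bash
# Hedged sketch: exercise the chat completions endpoint format.
# Swap chat_completions for completions, audio_transcription, or
# audio_translation to benchmark the other request types listed above.
guidellm benchmark \
  --target "http://localhost:8000" \
  --request-type chat_completions \
  --data "prompt_tokens=256,output_tokens=128"
```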
### Using Scenarios
@@ -236,7 +238,7 @@ guidellm benchmark \
**Key parameters:**
-- `--warmup`: Warm-up specification - values between 0 and 1 represent a percentage of total requests/time, values ≥1 represent absolute request or time counts (interpretation depends on active constraint)
+- `--warmup`: Warm-up specification - values between 0 and 1 represent a percentage of total requests/time, values ≥1 represent absolute request or time units.
- `--cooldown`: Cool-down specification - same format as warmup, excludes final portion of benchmark from analysis to avoid shutdown effects
- `--max-seconds`: Maximum duration in seconds for each benchmark before automatic termination
- `--max-requests`: Maximum number of requests per benchmark before automatic termination
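To ground the warm-up and cool-down semantics above, here is a hedged sketch of a 60-second run that trims the first and last portions of the benchmark from the reported statistics; the target address and data spec are placeholders:

```bash
# Hedged sketch: fractional values (< 1) mark 10% of the run as warm-up
# and 10% as cool-down, so those portions are excluded from the results.
# The target URL and synthetic data spec are placeholders.
guidellm benchmark \
  --target "http://localhost:8000" \
  --data "prompt_tokens=256,output_tokens=128" \
  --max-seconds 60 \
  --warmup 0.1 \
  --cooldown 0.1
```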
docs/guides/outputs.md (5 additions, 4 deletions)
@@ -55,14 +55,15 @@ GuideLLM supports saving benchmark results to files in various formats, includin
### Supported File Formats
1. **JSON**: Contains all benchmark results, including full statistics and request data. This format is ideal for reloading into Python for in-depth analysis.
-2. **CSV**: Provides a summary of the benchmark data, focusing on key metrics and statistics. Note that CSV does not include detailed request-level data.
-3. **HTML**: Interactive HTML report with tables and visualizations of benchmark results.
-4. **Console**: Terminal output displayed during execution (can be disabled).
+2. **YAML**: Contains all benchmark results, including full statistics and request data, in a human-readable YAML format that is easy to work with across tools.
+3. **CSV**: Provides a summary of the benchmark data, focusing on key metrics and statistics. Note that CSV does not include detailed request-level data.
+4. **HTML**: Interactive HTML report with tables and visualizations of benchmark results.
+5. **Console**: Terminal output displayed during execution (can be disabled).
### Configuring File Outputs
- **Output Directory**: Use the `--output-dir` argument to specify the directory for saving the results. By default, files are saved in the current directory.
-- **Output Formats**: Use the `--outputs` argument to specify which formats to generate. By default, JSON, CSV, and HTML are generated.
+- **Output Formats**: Use the `--outputs` argument to specify which formats, or exact file names with supported extensions (e.g. `benchmarks.json`), to generate. By default, JSON, CSV, and HTML are generated.
- **Sampling**: To limit the size of the output files and the number of detailed request samples included, you can configure sampling options using the `--sample-requests` argument.
Example command to save results in specific formats: