README.md (7 additions, 5 deletions)
@@ -22,7 +22,7 @@ SLO-aware Benchmarking and Evaluation Platform for Optimizing Real-World LLM Inf
**GuideLLM** is a platform for evaluating how language models perform under real workloads and configurations. It simulates end-to-end interactions with OpenAI-compatible and vLLM-native servers, generates workload patterns that reflect production usage, and produces detailed reports that help teams understand system behavior, resource needs, and operational limits. GuideLLM supports real and synthetic datasets, multimodal inputs, and flexible execution profiles, giving engineering and ML teams a consistent framework for assessing model behavior, tuning deployments, and planning capacity as their systems evolve.
-## Why GuideLLM?
+### Why GuideLLM?
GuideLLM gives teams a clear picture of performance, efficiency, and reliability when deploying LLMs in production-like environments.
@@ -144,6 +144,8 @@ The console provides a lightweight summary with high-level statistics for each b
This file is the authoritative record of the entire benchmark session. It includes configuration, metadata, per-benchmark statistics, and sample request entries with individual request timings. Use it for debugging, deeper analysis, or loading into Python with `GenerativeBenchmarksReport`.
+
+Alternatively, a YAML version of this file, containing the same content as `benchmarks.json` in a more human-readable form, can be generated with the `--outputs yaml` argument.
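As a quick illustration of the flag referenced above, here is a minimal sketch of a run that requests the YAML report; the `--target` address, synthetic data spec, and duration are placeholder assumptions rather than values taken from this change:

```bash
# Hedged sketch: ask GuideLLM to emit the YAML report described above.
# The target URL and synthetic data spec below are placeholders.
guidellm benchmark \
  --target "http://localhost:8000" \
  --data "prompt_tokens=128,output_tokens=256" \
  --max-seconds 20 \
  --outputs yaml
```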
**benchmarks.csv**
This file provides a compact tabular view of each benchmark with the fields most commonly used for reporting—throughput, latency percentiles, token counts, and rate information. It opens cleanly in spreadsheets and BI tools and is well-suited for comparisons across runs.
@@ -158,7 +160,7 @@ GuideLLM supports a wide range of LLM benchmarking workflows. The examples below
### Load Patterns
-Different applications require different traffic shapes. This example demonstrates rate-based load testing using a constant profile at 10 requests per second, running for 20 seconds with synthetic data of 128 prompt tokens and 256 output tokens.
+Simulating different applications requires different traffic shapes. This example demonstrates rate-based load testing using a constant profile at 10 requests per second, running for 20 seconds with synthetic data of 128 prompt tokens and 256 output tokens.
```bash
guidellm benchmark \
@@ -191,6 +193,7 @@ guidellm benchmark \
- `--data`: Data source specification - accepts HuggingFace dataset IDs (prefix with `hf:`), local file paths (`.json`, `.csv`, `.jsonl`, `.txt`), or synthetic data configs (JSON object or `key=value` pairs like `prompt_tokens=256,output_tokens=128`)
- `--data-args`: JSON object of arguments for dataset creation - commonly used to specify column mappings like `prompt_column`, `output_tokens_count_column`, or HuggingFace dataset parameters
- `--data-samples`: Number of samples to use from the dataset - use `-1` (default) for all samples with dynamic generation, or specify a positive integer to limit sample count
+- `--processor`: Tokenizer or processor name used for generating synthetic data - if not provided and the dataset requires one, it is loaded automatically from the model; accepts HuggingFace model IDs or local paths
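To make the data-related flags above concrete, here is a hedged sketch of a synthetic-data run; the `--target` address and the tokenizer ID are illustrative assumptions, not values taken from this change:

```bash
# Hedged sketch combining the data flags described above:
# synthetic prompts/outputs via key=value pairs, a 1000-sample cap
# (instead of the default -1 for all samples), and an explicit tokenizer
# for synthetic data generation. The target URL and tokenizer ID are placeholders.
guidellm benchmark \
  --target "http://localhost:8000" \
  --data "prompt_tokens=256,output_tokens=128" \
  --data-samples 1000 \
  --processor "meta-llama/Llama-3.1-8B-Instruct"
```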
### Request Types and API Targets
@@ -205,8 +208,7 @@ guidellm benchmark \
**Key parameters:**
-- `--request-type`: Specifies the API endpoint format - options include `chat_completions` (chat API format), `completions` (text completion format), and other OpenAI-compatible request types
-- `--processor`: Tokenizer or processor name for token counting - if not provided, automatically loads from the model; accepts HuggingFace model IDs or local paths
+- `--request-type`: Specifies the API endpoint format - options include `chat_completions` (chat API format), `completions` (text completion format), `audio_transcription`, and `audio_translation`.
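As a hedged illustration of the request types listed above, the sketch below targets the chat completions format; the server address and data spec are placeholders:

```bash
# Hedged sketch: exercise the chat completions endpoint format.
# Swap chat_completions for completions, audio_transcription, or
# audio_translation to benchmark the other request types listed above.
guidellm benchmark \
  --target "http://localhost:8000" \
  --request-type chat_completions \
  --data "prompt_tokens=256,output_tokens=128"
```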
### Using Scenarios
@@ -236,7 +238,7 @@ guidellm benchmark \
**Key parameters:**
-- `--warmup`: Warm-up specification - values between 0 and 1 represent a percentage of total requests/time, values ≥1 represent absolute request or time counts (interpretation depends on active constraint)
+- `--warmup`: Warm-up specification - values between 0 and 1 represent a percentage of total requests/time, values ≥1 represent absolute request or time units.
- `--cooldown`: Cool-down specification - same format as warmup, excludes final portion of benchmark from analysis to avoid shutdown effects
- `--max-seconds`: Maximum duration in seconds for each benchmark before automatic termination
- `--max-requests`: Maximum number of requests per benchmark before automatic termination
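To ground the warm-up and cool-down semantics above, here is a hedged sketch of a 60-second run that trims the first and last portions of the benchmark from the reported statistics; the target address and data spec are placeholders:

```bash
# Hedged sketch: fractional values (< 1) mark 10% of the run as warm-up
# and 10% as cool-down, so those portions are excluded from the results.
# The target URL and synthetic data spec are placeholders.
guidellm benchmark \
  --target "http://localhost:8000" \
  --data "prompt_tokens=256,output_tokens=128" \
  --max-seconds 60 \
  --warmup 0.1 \
  --cooldown 0.1
```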
docs/guides/outputs.md (5 additions, 4 deletions)
@@ -55,14 +55,15 @@ GuideLLM supports saving benchmark results to files in various formats, includin
### Supported File Formats
1. **JSON**: Contains all benchmark results, including full statistics and request data. This format is ideal for reloading into Python for in-depth analysis.
-2. **CSV**: Provides a summary of the benchmark data, focusing on key metrics and statistics. Note that CSV does not include detailed request-level data.
-3. **HTML**: Interactive HTML report with tables and visualizations of benchmark results.
-4. **Console**: Terminal output displayed during execution (can be disabled).
+2. **YAML**: Contains all benchmark results, including full statistics and request data, in a human-readable YAML format that is easy to work with across tools.
+3. **CSV**: Provides a summary of the benchmark data, focusing on key metrics and statistics. Note that CSV does not include detailed request-level data.
+4. **HTML**: Interactive HTML report with tables and visualizations of benchmark results.
+5. **Console**: Terminal output displayed during execution (can be disabled).
### Configuring File Outputs
- **Output Directory**: Use the `--output-dir` argument to specify the directory for saving the results. By default, files are saved in the current directory.
-- **Output Formats**: Use the `--outputs` argument to specify which formats to generate. By default, JSON, CSV, and HTML are generated.
+- **Output Formats**: Use the `--outputs` argument to specify which formats, or exact file names with supported extensions (e.g. `benchmarks.json`), to generate. By default, JSON, CSV, and HTML are generated.
- **Sampling**: To limit the size of the output files and the number of detailed request samples included, you can configure sampling options using the `--sample-requests` argument.
Example command to save results in specific formats: