docs/examples/index.md (+3 -3)
@@ -8,12 +8,12 @@ Welcome to the GuideLLM examples section! This area is designed to showcase prac
## Call for Contributions
- Currently, we do not have any specific examples available, but we welcome contributions from the community! If you have examples of how you've used GuideLLM to solve real-world problems or optimize your LLM deployments, we'd love to feature them here.
+ Currently, we do not have many specific examples available, but we welcome contributions from the community! If you have examples of how you've used GuideLLM to solve real-world problems or optimize your LLM deployments, we'd love to feature them here.
To contribute an example:
- 1. Fork the [GuideLLM repository](https://github.com/neuralmagic/guidellm)
- 2. Create your example in the `docs/examples/` directory following our [contribution guidelines](https://github.com/neuralmagic/guidellm/blob/main/CONTRIBUTING.md)
+ 1. Fork the [GuideLLM repository](https://github.com/vllm-project/guidellm)
+ 2. Create your example in the `docs/examples/` directory following our [contribution guidelines](https://github.com/vllm-project/guidellm/blob/main/CONTRIBUTING.md)
3. Submit a pull request with your contribution
Your examples will help others leverage GuideLLM more effectively and contribute to the growing knowledge base around LLM deployment optimization.
docs/getting-started/analyze.md (+20 -9)
@@ -64,20 +64,31 @@ The p99 (99th percentile) values are particularly important for SLO analysis, as
## Analyzing the Results File
- For deeper analysis, GuideLLM saves detailed results to a file (default: `benchmarks.json`). This file contains all metrics with more comprehensive statistics and individual request data.
+ For deeper analysis, GuideLLM saves detailed results to multiple files by default in your current directory:
+
+ - `benchmarks.json`: Complete benchmark data in JSON format
+ - `benchmarks.csv`: Summary of key metrics in CSV format
+ - `benchmarks.html`: Interactive HTML report with visualizations
### File Formats
- GuideLLM supports multiple output formats:
+ GuideLLM supports multiple output formats that can be customized:
+
+ - **JSON**: Complete benchmark data in JSON format with full request samples
+ - **CSV**: Summary of key metrics in CSV format suitable for spreadsheets
+ - **HTML**: Interactive HTML report with tables and visualizations
+ - **Console**: Terminal output displayed during execution
+
+ To specify which formats to generate, use the `--outputs` argument:
- - **JSON**: Complete benchmark data in JSON format (default)
- - **YAML**: Complete benchmark data in human-readable YAML format
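A minimal sketch of the `--outputs` usage introduced in this hunk, reusing the target and synthetic-data settings from the PR's other examples (adjust both for your own deployment):

```bash
# Run the default sweep against a local OpenAI-compatible server and
# generate only the JSON and CSV reports in the current directory.
guidellm benchmark \
  --target "http://localhost:8000" \
  --data "prompt_tokens=256,output_tokens=128" \
  --outputs json csv
```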
docs/getting-started/benchmark.md (+8 -7)
@@ -27,7 +27,7 @@ This command:
- Connects to your vLLM server running at `http://localhost:8000`
- Uses synthetic data with 256 prompt tokens and 128 output tokens per request
- Automatically determines the available model on the server
- - Runs a "sweep" benchmark (default) to find optimal performance points
+ - Runs a "sweep" profile (default) to find optimal performance points
During the benchmark, you'll see a progress display similar to this:
@@ -44,14 +44,15 @@ GuideLLM offers a wide range of configuration options to customize your benchmar
| `--target` | URL of the OpenAI-compatible server | `--target "http://localhost:8000"` |
| `--model` | Model name to benchmark (optional) | `--model "Meta-Llama-3.1-8B-Instruct"` |
| `--data` | Data configuration for benchmarking | `--data "prompt_tokens=256,output_tokens=128"` |
- | `--rate-type` | Type of benchmark to run | `--rate-type sweep` |
+ | `--profile` | Type of benchmark profile to run | `--profile sweep` |
| `--rate` | Request rate or number of benchmarks for sweep | `--rate 10` |
| `--max-seconds` | Duration for each benchmark in seconds | `--max-seconds 30` |
- | `--output-path` | Output file path and format | `--output-path results.json` |
+ | `--output-dir` | Directory path to save output files | `--output-dir results/` |
+ | `--outputs` | Output formats to generate | `--outputs json csv html` |

- ### Benchmark Types (`--rate-type`)
+ ### Benchmark Profiles (`--profile`)

- GuideLLM supports several benchmark types:
+ GuideLLM supports several benchmark profiles and strategies:

- `synchronous`: Runs requests one at a time (sequential)
- `throughput`: Tests maximum throughput by running requests in parallel
@@ -82,12 +83,12 @@ While synthetic data is convenient for quick tests, you can benchmark with real-
guidellm benchmark \
--target "http://localhost:8000" \
--data "/path/to/your/dataset.json" \
- --rate-type constant \
+ --profile constant \
--rate 5
```
You can also use datasets from HuggingFace or customize synthetic data generation with additional parameters such as standard deviation, minimum, and maximum values.
- By default, complete results are saved to `benchmarks.json` in your current directory. Use the `--output-path` parameter to specify a different location or format.
+ By default, complete results are saved to `benchmarks.json`, `benchmarks.csv`, and `benchmarks.html` in your current directory. Use the `--output-dir` parameter to specify a different location and `--outputs` to control which formats are generated.
Learn more about dataset options in the [Datasets documentation](../guides/datasets.md).
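To make the new output flags concrete, a sketch combining them with the constant-rate example above (the directory and format list shown are illustrative; any of json, csv, or html can be requested):

```bash
# Constant-rate run that writes only the JSON and HTML reports under results/.
guidellm benchmark \
  --target "http://localhost:8000" \
  --data "prompt_tokens=256,output_tokens=128" \
  --profile constant \
  --rate 5 \
  --output-dir results/ \
  --outputs json html
```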
Replace `feature-branch` with the name of the branch you want to install.
@@ -88,4 +88,4 @@ This should display the installed version of GuideLLM.
## Troubleshooting
- If you encounter any issues during installation, ensure that your Python and pip versions meet the prerequisites. For further assistance, please refer to the [GitHub Issues](https://github.com/neuralmagic/guidellm/issues) page or consult the [Documentation](https://github.com/neuralmagic/guidellm/tree/main/docs).
+ If you encounter any issues during installation, ensure that your Python and pip versions meet the prerequisites. For further assistance, please refer to the [GitHub Issues](https://github.com/vllm-project/guidellm/issues) page or consult the [Documentation](https://github.com/vllm-project/guidellm/tree/main/docs).
docs/guides/backends.md (+1 -1)
@@ -42,4 +42,4 @@ For more information on starting a TGI server, see the [TGI Documentation](https
## Expanding Backend Support
- GuideLLM is an open platform, and we encourage contributions to extend its backend support. Whether it's adding new server implementations, integrating with Python-based backends, or enhancing existing capabilities, your contributions are welcome. For more details on how to contribute, see the [CONTRIBUTING.md](https://github.com/neuralmagic/guidellm/blob/main/CONTRIBUTING.md) file.
+ GuideLLM is an open platform, and we encourage contributions to extend its backend support. Whether it's adding new server implementations, integrating with Python-based backends, or enhancing existing capabilities, your contributions are welcome. For more details on how to contribute, see the [CONTRIBUTING.md](https://github.com/vllm-project/guidellm/blob/main/CONTRIBUTING.md) file.
@@ -31,7 +31,7 @@ To disable the progress outputs to the console, use the `disable-progress` flag
```bash
guidellm benchmark \
--target "http://localhost:8000" \
- --rate-type sweep \
+ --profile sweep \
--max-seconds 30 \
--data "prompt_tokens=256,output_tokens=128" \
--disable-progress
@@ -42,57 +42,45 @@ To disable console output, use the `--disable-console-outputs` flag when running
```bash
guidellm benchmark \
--target "http://localhost:8000" \
- --rate-type sweep \
+ --profile sweep \
--max-seconds 30 \
--data "prompt_tokens=256,output_tokens=128" \
--disable-console-outputs
```

- ### Enabling Extra Information
-
- GuideLLM includes the option to display extra information during the benchmark runs to monitor the overheads and performance of the system. This can be enabled by using the `--display-scheduler-stats` flag when running the `guidellm benchmark` command. For example:
-
- ```bash
- guidellm benchmark \
- --target "http://localhost:8000" \
- --rate-type sweep \
- --max-seconds 30 \
- --data "prompt_tokens=256,output_tokens=128" \
- --display-scheduler-stats
- ```
-
- The above command will display an additional row for each benchmark within the progress output, showing the scheduler overheads and other relevant information.
-
## File-Based Outputs
GuideLLM supports saving benchmark results to files in various formats, including JSON, YAML, and CSV. These files can be used for further analysis, reporting, or reloading into Python for detailed exploration.
### Supported File Formats
1. **JSON**: Contains all benchmark results, including full statistics and request data. This format is ideal for reloading into Python for in-depth analysis.
- 2. **YAML**: Similar to JSON, YAML files include all benchmark results and are human-readable.
- 3. **CSV**: Provides a summary of the benchmark data, focusing on key metrics and statistics. Note that CSV does not include detailed request-level data.
+ 2. **CSV**: Provides a summary of the benchmark data, focusing on key metrics and statistics. Note that CSV does not include detailed request-level data.
+ 3. **HTML**: Interactive HTML report with tables and visualizations of benchmark results.
+ 4. **Console**: Terminal output displayed during execution (can be disabled).
### Configuring File Outputs
- - **Output Path**: Use the `--output-path` argument to specify the file path or directory for saving the results. If a directory is provided, the results will be saved as `benchmarks.json` by default. The file type is determined by the file extension (e.g., `.json`, `.yaml`, `.csv`).
- - **Sampling**: To limit the size of the output files, you can configure sampling options for the dataset using the `--output-sampling` argument.
+ - **Output Directory**: Use the `--output-dir` argument to specify the directory for saving the results. By default, files are saved in the current directory.
+ - **Output Formats**: Use the `--outputs` argument to specify which formats to generate. By default, JSON, CSV, and HTML are generated.
+ - **Sampling**: To limit the size of the output files and the number of detailed request samples included, you can configure sampling options using the `--sample-requests` argument.

- Example command to save results in YAML format:
+ Example command to save results in specific formats:
```bash
guidellm benchmark \
--target "http://localhost:8000" \
- --rate-type sweep \
+ --profile sweep \
--max-seconds 30 \
--data "prompt_tokens=256,output_tokens=128" \
- --output-path "results/benchmarks.csv" \
- --output-sampling 20
+ --output-dir "results/" \
+ --outputs json csv \
+ --sample-requests 20
```
### Reloading Results
- JSON and YAML files can be reloaded into Python for further analysis using the `GenerativeBenchmarksReport` class. Below is a sample code snippet for reloading results:
+ JSON files can be reloaded into Python for further analysis using the `GenerativeBenchmarksReport` class. Below is a sample code snippet for reloading results:

```python
from guidellm.benchmark import GenerativeBenchmarksReport
@@ -106,4 +94,4 @@ for benchmark in benchmarks:
print(benchmark.id_)
```
- For more details on the `GenerativeBenchmarksReport` class and its methods, refer to the [source code](https://github.com/neuralmagic/guidellm/blob/main/src/guidellm/benchmark/output.py).
+ For more details on the `GenerativeBenchmarksReport` class and its methods, refer to the [source code](https://github.com/vllm-project/guidellm/blob/main/src/guidellm/benchmark/schemas/generative/reports.py).
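Since the hunk above elides the middle of the reloading snippet, here is a sketch of a complete reload loop; the `load_file` call and the `benchmarks` attribute are assumptions carried over from earlier GuideLLM documentation and should be verified against the linked source, as this PR moves the class to a new module:

```python
from guidellm.benchmark import GenerativeBenchmarksReport

# Assumed loader method; confirm the exact name against the linked source file.
report = GenerativeBenchmarksReport.load_file("benchmarks.json")

# Assumed attribute holding the individual benchmark results.
benchmarks = report.benchmarks
for benchmark in benchmarks:
    print(benchmark.id_)
```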
docs/index.md (+7 -6)
@@ -8,17 +8,18 @@
</p>
<h3 align="center">
- Scale Efficiently: Evaluate and Optimize Your LLM Deployments for Real-World Inference
+ SLO-Aware Benchmarking and Evaluation Platform for Optimizing Real-World LLM Inference
</h3>
- **GuideLLM** is a platform for evaluating and optimizing the deployment of large language models (LLMs). By simulating real-world inference workloads, GuideLLM enables users to assess the performance, resource requirements, and cost implications of deploying LLMs on various hardware configurations. This approach ensures efficient, scalable, and cost-effective LLM inference serving while maintaining high service quality.
+ **GuideLLM** is a platform for evaluating how language models perform under real workloads and configurations. It simulates end-to-end interactions with OpenAI-compatible and vLLM-native servers, generates workload patterns that reflect production usage, and produces detailed reports that help teams understand system behavior, resource needs, and operational limits. GuideLLM supports real and synthetic datasets, multimodal inputs, and flexible execution profiles, giving engineering and ML teams a consistent framework for assessing model behavior, tuning deployments, and planning capacity as their systems evolve.
## Key Features
- - **Performance Evaluation:** Analyze LLM inference under different load scenarios to ensure your system meets your service level objectives (SLOs).
- - **Resource Optimization:** Determine the most suitable hardware configurations for running your models effectively.
- - **Cost Estimation:** Understand the financial impact of different deployment strategies and make informed decisions to minimize costs.
- - **Scalability Testing:** Simulate scaling to handle large numbers of concurrent users without performance degradation.
+ - **Captures complete latency and token-level statistics for SLO-driven evaluation:** Including full distributions for TTFT, ITL, and end-to-end behavior.
+ - **Generates realistic, configurable traffic patterns:** Across synchronous, concurrent, and rate-based modes, including reproducible sweeps to identify safe operating ranges.
+ - **Supports both real and synthetic multimodal datasets:** Enabling controlled experiments and production-style evaluations in one framework with support for text, image, audio, and video inputs.
+ - **Produces standardized, exportable reports:** For dashboards, analysis, and regression tracking, ensuring consistency across teams and workflows.
+ - **Delivers high-throughput, extensible benchmarking:** With multiprocessing, threading, async execution, and a flexible CLI/API for customization or quickstarts.