---
weight: -4
---

# Analyze Results

After [running a benchmark](benchmark.md), GuideLLM provides comprehensive results that help you understand your LLM deployment's performance. This guide explains how to interpret both console output and file-based results.

## Understanding Console Output

Upon benchmark completion, GuideLLM automatically displays results in the console, divided into three main sections:

### 1. Benchmarks Metadata

This section provides a high-level summary of the benchmark run, including:

- **Server configuration**: Target URL, model name, and backend details
- **Data configuration**: Data source, token counts, and dataset properties
- **Profile arguments**: Rate type, maximum duration, request limits, etc.
- **Extras**: Any additional metadata provided via the `--output-extras` argument

Example:

```
Benchmarks Metadata
------------------
Args: {"backend_type": "openai", "target": "http://localhost:8000", "model": "Meta-Llama-3.1-8B-Instruct-quantized", ...}
Worker: {"type_": "generative", "backend_type": "openai", "backend_args": {"timeout": 120.0, ...}, ...}
Request Loader: {"type_": "generative", "data_args": {"prompt_tokens": 256, "output_tokens": 128, ...}, ...}
Extras: {}
```

### 2. Benchmarks Info

This section summarizes the key information about each benchmark run, presented as a table with columns:

- **Type**: The benchmark type (e.g., synchronous, constant, or poisson)
- **Start/End Time**: When the benchmark started and ended
- **Duration**: Total duration of the benchmark in seconds
- **Requests**: Count of successful, incomplete, and errored requests
- **Token Stats**: Average token counts and totals for prompts and outputs

This section helps you understand what was executed and provides a quick overview of the results.

### 3. Benchmarks Stats

This is the most critical section for performance analysis. It displays detailed statistics for each metric:

- **Throughput Metrics**:

  - Requests per second (RPS)
  - Request concurrency
  - Output tokens per second
  - Total tokens per second

- **Latency Metrics**:

  - Request latency (mean, median, p99)
  - Time to first token (TTFT) (mean, median, p99)
  - Inter-token latency (ITL) (mean, median, p99)
  - Time per output token (mean, median, p99)

The p99 (99th percentile) values are particularly important for SLO analysis: they represent the value that 99% of requests stay at or below, so only the slowest 1% of requests exceed them.
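
To make these summary statistics concrete, here is a minimal, self-contained sketch (plain Python, independent of GuideLLM) that computes the mean, median, and p99 of a set of per-request latencies; the latency values are made-up placeholders:

```python
import statistics

# Hypothetical per-request latencies in seconds (illustrative values only).
latencies = [0.82, 0.91, 0.88, 1.02, 0.95, 0.87, 2.40, 0.90, 0.93, 0.89]

mean_latency = statistics.mean(latencies)
median_latency = statistics.median(latencies)

# statistics.quantiles with n=100 returns the 1st through 99th percentile
# cut points; index 98 is the 99th percentile (p99).
p99_latency = statistics.quantiles(latencies, n=100)[98]

print(f"mean={mean_latency:.2f}s median={median_latency:.2f}s p99={p99_latency:.2f}s")
```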

## Analyzing the Results File

For deeper analysis, GuideLLM saves detailed results to multiple files by default in your current directory:

- `benchmarks.json`: Complete benchmark data in JSON format
- `benchmarks.csv`: Summary of key metrics in CSV format
- `benchmarks.html`: Interactive HTML report with visualizations

### File Formats

GuideLLM supports multiple output formats that can be customized:

- **JSON**: Complete benchmark data in JSON format with full request samples
- **YAML**: Complete benchmark data in YAML format with full request samples
- **CSV**: Summary of key metrics in CSV format suitable for spreadsheets
- **HTML**: Interactive HTML report with tables and visualizations
- **Console**: Terminal output displayed during execution

To specify which formats to generate, use the `--outputs` argument:

```bash
guidellm benchmark --target "http://localhost:8000" --outputs json csv
```

The `--outputs` argument also accepts full file names, letting you customize the name and location of each output file:

```bash
guidellm benchmark --target "http://localhost:8000" --outputs results/benchmarks.json results/summary.csv
```

To change the output directory, use the `--output-dir` argument:

```bash
guidellm benchmark --target "http://localhost:8000" --output-dir results/
```

### Programmatic Analysis

For custom analysis, you can reload the results into Python:

```python
from guidellm.benchmark import GenerativeBenchmarksReport

# Load results from file
report = GenerativeBenchmarksReport.load_file("benchmarks.json")

# Access individual benchmarks
for benchmark in report.benchmarks:
    # Print basic info
    print(f"Benchmark: {benchmark.id_}")
    print(f"Type: {benchmark.type_}")

    # Access metrics
    print(f"Avg RPS: {benchmark.metrics.requests_per_second.successful.mean}")
    print(f"p99 latency: {benchmark.metrics.request_latency.successful.percentiles.p99}")
    print(f"TTFT (p99): {benchmark.metrics.time_to_first_token_ms.successful.percentiles.p99}")
```
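
Building on the same attributes, a small sketch like the one below could flatten each benchmark into a row for spreadsheet or pandas analysis. It only relies on the fields shown in the example above and assumes they are present on every benchmark:

```python
import csv

from guidellm.benchmark import GenerativeBenchmarksReport

report = GenerativeBenchmarksReport.load_file("benchmarks.json")

# Collect one summary row per benchmark using the metric paths shown above.
rows = []
for benchmark in report.benchmarks:
    metrics = benchmark.metrics
    rows.append(
        {
            "id": benchmark.id_,
            "type": benchmark.type_,
            "rps_mean": metrics.requests_per_second.successful.mean,
            "request_latency_p99": metrics.request_latency.successful.percentiles.p99,
            "ttft_p99_ms": metrics.time_to_first_token_ms.successful.percentiles.p99,
        }
    )

# Write a flat summary that can be opened in a spreadsheet or loaded with pandas.
if rows:
    with open("benchmarks_summary.csv", "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```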

## Key Performance Indicators

When analyzing your results, focus on these key indicators:

### 1. Throughput and Capacity

- **Maximum RPS**: What's the highest request rate your server can handle?
- **Concurrency**: How many concurrent requests can your server process?
- **Token Throughput**: How many tokens per second can your server generate?

### 2. Latency and Responsiveness

- **Time to First Token (TTFT)**: How quickly does the model start generating output?
- **Inter-Token Latency (ITL)**: How smoothly does the model generate subsequent tokens?
- **Total Request Latency**: How long do complete requests take end-to-end? (A quick check of these latency metrics against SLO targets is sketched below.)
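
The following sketch reuses the report object and metric paths from the Programmatic Analysis example to flag benchmarks that miss latency targets. The threshold values are assumptions to replace with your own SLOs, and request latency is assumed to be reported in seconds (TTFT is in milliseconds, per the field name):

```python
from guidellm.benchmark import GenerativeBenchmarksReport

# Example SLO targets -- replace with your own service objectives.
TTFT_P99_MS_TARGET = 200.0
REQUEST_LATENCY_P99_S_TARGET = 5.0

report = GenerativeBenchmarksReport.load_file("benchmarks.json")

for benchmark in report.benchmarks:
    ttft_p99 = benchmark.metrics.time_to_first_token_ms.successful.percentiles.p99
    latency_p99 = benchmark.metrics.request_latency.successful.percentiles.p99
    meets_slo = ttft_p99 <= TTFT_P99_MS_TARGET and latency_p99 <= REQUEST_LATENCY_P99_S_TARGET
    print(
        f"{benchmark.type_}: TTFT p99={ttft_p99:.1f} ms, "
        f"request latency p99={latency_p99:.2f} s, meets SLO={meets_slo}"
    )
```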

### 3. Reliability and Error Rates

- **Success Rate**: What percentage of requests complete successfully? (See the sketch after this list.)
- **Error Distribution**: What types of errors occur and at what rates?
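
As a simple illustration, the success and error rates can be computed directly from the request counts reported in the Benchmarks Info table; the counts below are made-up placeholders:

```python
# Placeholder counts -- take the real values from the Benchmarks Info table.
successful, incomplete, errored = 980, 5, 15

total = successful + incomplete + errored
success_rate = successful / total
error_rate = errored / total

print(f"success rate: {success_rate:.1%}, error rate: {error_rate:.1%}")
```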

## Additional Analysis Techniques

### Comparing Different Models or Hardware

Run benchmarks with different models or hardware configurations, then compare:

```bash
guidellm benchmark --target "http://server1:8000" --output-dir model1/
guidellm benchmark --target "http://server2:8000" --output-dir model2/
```
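
With the two result directories above, a sketch like the following could load both reports and print key metrics side by side. It assumes each run produced the default `benchmarks.json` file in its output directory and reuses the metric paths from the Programmatic Analysis example:

```python
from guidellm.benchmark import GenerativeBenchmarksReport

runs = [("model1", "model1/benchmarks.json"), ("model2", "model2/benchmarks.json")]

for label, path in runs:
    report = GenerativeBenchmarksReport.load_file(path)
    for benchmark in report.benchmarks:
        rps = benchmark.metrics.requests_per_second.successful.mean
        ttft_p99 = benchmark.metrics.time_to_first_token_ms.successful.percentiles.p99
        print(f"{label} [{benchmark.type_}]: RPS={rps:.2f}, TTFT p99={ttft_p99:.1f} ms")
```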

### Cost Optimization

Calculate cost-effectiveness by analyzing the following (a short worked example appears after the list):

- Tokens per second per dollar of hardware cost
- Maximum throughput for different hardware configurations
- Optimal batch size vs. latency tradeoffs
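
For example, a back-of-the-envelope tokens-per-dollar calculation might look like the sketch below. The throughput figure would come from your benchmark results, while the hourly hardware cost is a placeholder to replace with your own pricing:

```python
# Placeholder inputs -- substitute your own measurements and pricing.
output_tokens_per_second = 1_500.0  # from the Benchmarks Stats section
hardware_cost_per_hour = 4.00       # e.g., hourly GPU instance price in dollars

tokens_per_dollar = output_tokens_per_second * 3600 / hardware_cost_per_hour
cost_per_million_tokens = 1_000_000 / tokens_per_dollar

print(f"tokens per dollar: {tokens_per_dollar:,.0f}")
print(f"cost per 1M output tokens: ${cost_per_million_tokens:.2f}")
```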

### Determining Scaling Requirements

Use your benchmark results to plan the following (a rough sizing sketch appears after the list):

- How many servers you need to handle your expected load
- When to automatically scale up or down based on demand
- What hardware provides the best price/performance for your workload
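
As a rough sizing sketch, the number of replicas can be estimated from the measured per-server capacity and the expected peak load; all of the inputs here are placeholders to replace with your own numbers:

```python
import math

# Placeholder inputs -- substitute your own measurements and traffic estimates.
max_rps_per_server = 12.0  # highest RPS a single server sustained within SLO
expected_peak_rps = 85.0   # anticipated peak production load
headroom = 1.25            # safety margin for spikes and rolling restarts

servers_needed = math.ceil(expected_peak_rps * headroom / max_rps_per_server)
print(f"servers needed: {servers_needed}")
```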