---
weight: -4
---

# Analyze Results

After [running a benchmark](benchmark.md), GuideLLM provides comprehensive results that help you understand your LLM deployment's performance. This guide explains how to interpret both console output and file-based results.

## Understanding Console Output

Upon benchmark completion, GuideLLM automatically displays results in the console, divided into three main sections:

### 1. Benchmarks Metadata

This section provides a high-level summary of the benchmark run, including:

- **Server configuration**: Target URL, model name, and backend details
- **Data configuration**: Data source, token counts, and dataset properties
- **Profile arguments**: Rate type, maximum duration, request limits, etc.
- **Extras**: Any additional metadata provided via the `--output-extras` argument

Example:

```
Benchmarks Metadata
------------------
Args: {"backend_type": "openai", "target": "http://localhost:8000", "model": "Meta-Llama-3.1-8B-Instruct-quantized", ...}
Worker: {"type_": "generative", "backend_type": "openai", "backend_args": {"timeout": 120.0, ...}, ...}
Request Loader: {"type_": "generative", "data_args": {"prompt_tokens": 256, "output_tokens": 128, ...}, ...}
Extras: {}
```

### 2. Benchmarks Info

This section summarizes the key information about each benchmark run, presented as a table with columns:

- **Type**: The benchmark type (e.g., synchronous, constant, poisson, etc.)
- **Start/End Time**: When the benchmark started and ended
- **Duration**: Total duration of the benchmark in seconds
- **Requests**: Count of successful, incomplete, and errored requests
- **Token Stats**: Average token counts and totals for prompts and outputs

This section helps you understand what was executed and provides a quick overview of the results.

### 3. Benchmarks Stats

This is the most critical section for performance analysis. It displays detailed statistics for each metric:

- **Throughput Metrics**:

  - Requests per second (RPS)
  - Request concurrency
  - Output tokens per second
  - Total tokens per second

- **Latency Metrics**:

  - Request latency (mean, median, p99)
  - Time to first token (TTFT) (mean, median, p99)
  - Inter-token latency (ITL) (mean, median, p99)
  - Time per output token (mean, median, p99)

The p99 (99th percentile) values are particularly important for SLO analysis, as they represent the worst-case performance for 99% of requests.

## Analyzing the Results File

For deeper analysis, GuideLLM saves detailed results to a file (default: `benchmarks.json`). This file contains all metrics with more comprehensive statistics and individual request data.

### File Formats

GuideLLM supports multiple output formats:

- **JSON**: Complete benchmark data in JSON format (default)
- **YAML**: Complete benchmark data in human-readable YAML format
- **CSV**: Summary of key metrics in CSV format

To specify the format, use the `--output-path` argument with the appropriate extension:

```bash
guidellm benchmark --target "http://localhost:8000" --output-path results.yaml
```

### Programmatic Analysis

For custom analysis, you can reload the results into Python:

```python
from guidellm.benchmark import GenerativeBenchmarksReport

# Load results from file
report = GenerativeBenchmarksReport.load_file("benchmarks.json")

# Access individual benchmarks
for benchmark in report.benchmarks:
    # Print basic info
    print(f"Benchmark: {benchmark.id_}")
    print(f"Type: {benchmark.type_}")

    # Access metrics
    print(f"Avg RPS: {benchmark.metrics.requests_per_second.successful.mean}")
    print(f"p99 latency: {benchmark.metrics.request_latency.successful.percentiles.p99}")
    print(f"TTFT (p99): {benchmark.metrics.time_to_first_token_ms.successful.percentiles.p99}")
```
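
Building on this, the loaded metrics can be checked against service-level objectives. The sketch below compares p99 TTFT and request latency against example thresholds; the threshold values are illustrative, and it assumes `request_latency` is reported in seconds while `time_to_first_token_ms` is in milliseconds, so verify the units in your own results file.

```python
from guidellm.benchmark import GenerativeBenchmarksReport

# Illustrative SLO targets -- replace with your own requirements
TTFT_P99_SLO_MS = 250.0   # time to first token, in milliseconds
LATENCY_P99_SLO_S = 5.0   # end-to-end request latency, assumed to be in seconds

report = GenerativeBenchmarksReport.load_file("benchmarks.json")

for benchmark in report.benchmarks:
    ttft_p99 = benchmark.metrics.time_to_first_token_ms.successful.percentiles.p99
    latency_p99 = benchmark.metrics.request_latency.successful.percentiles.p99

    ttft_status = "OK" if ttft_p99 <= TTFT_P99_SLO_MS else "VIOLATION"
    latency_status = "OK" if latency_p99 <= LATENCY_P99_SLO_S else "VIOLATION"

    print(f"{benchmark.type_}: TTFT p99 {ttft_p99:.1f} ms [{ttft_status}], "
          f"request latency p99 {latency_p99:.2f} [{latency_status}]")
```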

## Key Performance Indicators

When analyzing your results, focus on these key indicators:

### 1. Throughput and Capacity

- **Maximum RPS**: What's the highest request rate your server can handle?
- **Concurrency**: How many concurrent requests can your server process?
- **Token Throughput**: How many tokens per second can your server generate?

### 2. Latency and Responsiveness

- **Time to First Token (TTFT)**: How quickly does the model start generating output?
- **Inter-Token Latency (ITL)**: How smoothly does the model generate subsequent tokens?
- **Total Request Latency**: How long do complete requests take end-to-end?
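
As a rough rule of thumb, the end-to-end latency of a streamed response is approximately TTFT plus ITL multiplied by the number of output tokens after the first. For example, with a 200 ms TTFT, a 20 ms ITL, and 128 output tokens, a request takes about 0.2 s + 127 × 0.02 s ≈ 2.7 s, which helps connect the three metrics above when reading the stats table.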

### 3. Reliability and Error Rates

- **Success Rate**: What percentage of requests complete successfully?
- **Error Distribution**: What types of errors occur and at what rates?
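
To compute these rates programmatically rather than reading them off the console table, the report loaded earlier can be reused. The sketch below assumes each benchmark exposes per-status request counts; the `request_totals` attribute and its field names are hypothetical, so check the structure of your `benchmarks.json` for the exact schema.

```python
from guidellm.benchmark import GenerativeBenchmarksReport

report = GenerativeBenchmarksReport.load_file("benchmarks.json")

for benchmark in report.benchmarks:
    # Hypothetical attribute names for the successful/incomplete/errored breakdown;
    # verify them against the fields present in your results file.
    totals = benchmark.request_totals
    completed = totals.successful
    failed = totals.incomplete + totals.errored
    total = completed + failed

    success_rate = completed / total if total else 0.0
    print(f"{benchmark.type_}: {success_rate:.1%} successful "
          f"({failed} incomplete or errored out of {total})")
```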

## Additional Analysis Techniques

### Comparing Different Models or Hardware

Run benchmarks with different models or hardware configurations, then compare:

```bash
guidellm benchmark --target "http://server1:8000" --output-path model1.json
guidellm benchmark --target "http://server2:8000" --output-path model2.json
```
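
To compare the two runs side by side, both result files can be loaded back into Python with the same `GenerativeBenchmarksReport` API shown earlier; the metric attributes mirror the programmatic analysis example above, and the file names match the two commands.

```python
from guidellm.benchmark import GenerativeBenchmarksReport

# Load the two result files produced by the commands above
reports = {
    "model1": GenerativeBenchmarksReport.load_file("model1.json"),
    "model2": GenerativeBenchmarksReport.load_file("model2.json"),
}

for name, report in reports.items():
    for benchmark in report.benchmarks:
        rps = benchmark.metrics.requests_per_second.successful.mean
        ttft_p99 = benchmark.metrics.time_to_first_token_ms.successful.percentiles.p99
        print(f"{name} / {benchmark.type_}: {rps:.2f} RPS, TTFT p99 {ttft_p99:.1f} ms")
```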

### Cost Optimization

Calculate cost-effectiveness by analyzing:

- Tokens per second per dollar of hardware cost
- Maximum throughput for different hardware configurations
- Optimal batch size vs. latency tradeoffs
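
As a simple illustration of the first point, tokens per second per dollar can be derived from a benchmark's measured throughput and an hourly hardware price. In the sketch below, the hourly costs are made-up placeholders, and the `output_tokens_per_second` attribute is assumed to mirror the "Output tokens per second" console metric, so confirm the name against your results file.

```python
from guidellm.benchmark import GenerativeBenchmarksReport

# Placeholder hourly hardware prices in USD -- not real quotes
HOURLY_COST = {"model1.json": 2.50, "model2.json": 4.00}

for path, cost_per_hour in HOURLY_COST.items():
    report = GenerativeBenchmarksReport.load_file(path)
    for benchmark in report.benchmarks:
        # Attribute name assumed to mirror the "Output tokens per second" console metric
        tokens_per_second = benchmark.metrics.output_tokens_per_second.successful.mean
        tokens_per_dollar = tokens_per_second * 3600 / cost_per_hour
        print(f"{path} / {benchmark.type_}: {tokens_per_second:.0f} output tok/s, "
              f"{tokens_per_dollar:,.0f} tokens per dollar")
```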

### Determining Scaling Requirements

Use your benchmark results to plan:

- How many servers you need to handle your expected load
- When to automatically scale up or down based on demand
- What hardware provides the best price/performance for your workload
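
A back-of-the-envelope capacity estimate follows directly from these numbers: divide the peak load you expect by the per-server rate your benchmarks show is sustainable at acceptable latency, and add headroom. The inputs below (a 120 RPS forecast peak, 15 RPS per server at acceptable p99 latency, 30% headroom) are illustrative values, not measurements.

```python
import math

# Illustrative inputs -- replace with your own traffic forecast and benchmark results
expected_peak_rps = 120.0          # forecast peak request rate
sustainable_rps_per_server = 15.0  # measured RPS per server at acceptable p99 latency
headroom = 0.30                    # extra capacity for spikes and failover

servers_needed = math.ceil(expected_peak_rps * (1 + headroom) / sustainable_rps_per_server)
print(f"Estimated servers needed: {servers_needed}")  # 11 for these example numbers
```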