---
weight: -4
---

# Analyze Results

After [running a benchmark](benchmark.md), GuideLLM provides comprehensive results that help you understand your LLM deployment's performance. This guide explains how to interpret both console output and file-based results.

## Understanding Console Output

Upon benchmark completion, GuideLLM automatically displays results in the console, divided into three main sections:

### 1. Benchmarks Metadata

This section provides a high-level summary of the benchmark run, including:

- **Server configuration**: Target URL, model name, and backend details
- **Data configuration**: Data source, token counts, and dataset properties
- **Profile arguments**: Rate type, maximum duration, request limits, etc.
- **Extras**: Any additional metadata provided via the `--output-extras` argument

Example:

```
Benchmarks Metadata
------------------
Args: {"backend_type": "openai", "target": "http://localhost:8000", "model": "Meta-Llama-3.1-8B-Instruct-quantized", ...}
Worker: {"type_": "generative", "backend_type": "openai", "backend_args": {"timeout": 120.0, ...}, ...}
Request Loader: {"type_": "generative", "data_args": {"prompt_tokens": 256, "output_tokens": 128, ...}, ...}
Extras: {}
```

### 2. Benchmarks Info

This section summarizes the key information about each benchmark run, presented as a table with columns:

- **Type**: The benchmark type (e.g., synchronous, constant, poisson, etc.)
- **Start/End Time**: When the benchmark started and ended
- **Duration**: Total duration of the benchmark in seconds
- **Requests**: Count of successful, incomplete, and errored requests
- **Token Stats**: Average token counts and totals for prompts and outputs

This section helps you understand what was executed and provides a quick overview of the results.

### 3. Benchmarks Stats

This is the most critical section for performance analysis. It displays detailed statistics for each metric:

- **Throughput Metrics**:

  - Requests per second (RPS)
  - Request concurrency
  - Output tokens per second
  - Total tokens per second

- **Latency Metrics**:

  - Request latency (mean, median, p99)
  - Time to first token (TTFT) (mean, median, p99)
  - Inter-token latency (ITL) (mean, median, p99)
  - Time per output token (mean, median, p99)

The p99 (99th percentile) values are particularly important for SLO analysis, as they represent the worst-case performance for 99% of requests.

## Analyzing the Results File

For deeper analysis, GuideLLM saves detailed results to a file (default: `benchmarks.json`). This file contains all metrics with more comprehensive statistics and individual request data.

### File Formats

GuideLLM supports multiple output formats:

- **JSON**: Complete benchmark data in JSON format (default)
- **YAML**: Complete benchmark data in human-readable YAML format
- **CSV**: Summary of key metrics in CSV format

To specify the format, use the `--output-path` argument with the appropriate extension:

```bash
guidellm benchmark --target "http://localhost:8000" --output-path results.yaml
```

### Programmatic Analysis

For custom analysis, you can reload the results into Python:

```python
from guidellm.benchmark import GenerativeBenchmarksReport

# Load results from file
report = GenerativeBenchmarksReport.load_file("benchmarks.json")

# Access individual benchmarks
for benchmark in report.benchmarks:
    # Print basic info
    print(f"Benchmark: {benchmark.id_}")
    print(f"Type: {benchmark.type_}")

    # Access metrics
    print(f"Avg RPS: {benchmark.metrics.requests_per_second.successful.mean}")
    print(f"p99 latency: {benchmark.metrics.request_latency.successful.percentiles.p99}")
    print(f"TTFT (p99): {benchmark.metrics.time_to_first_token_ms.successful.percentiles.p99}")
```
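
Building on this, the loaded metrics can be checked against service-level objectives. The sketch below compares p99 TTFT and request latency against example thresholds; the threshold values are illustrative, and it assumes `request_latency` is reported in seconds while `time_to_first_token_ms` is in milliseconds, so verify the units in your own results file.

```python
from guidellm.benchmark import GenerativeBenchmarksReport

# Illustrative SLO targets -- replace with your own requirements
TTFT_P99_SLO_MS = 250.0   # time to first token, in milliseconds
LATENCY_P99_SLO_S = 5.0   # end-to-end request latency, assumed to be in seconds

report = GenerativeBenchmarksReport.load_file("benchmarks.json")

for benchmark in report.benchmarks:
    ttft_p99 = benchmark.metrics.time_to_first_token_ms.successful.percentiles.p99
    latency_p99 = benchmark.metrics.request_latency.successful.percentiles.p99

    ttft_status = "OK" if ttft_p99 <= TTFT_P99_SLO_MS else "VIOLATION"
    latency_status = "OK" if latency_p99 <= LATENCY_P99_SLO_S else "VIOLATION"

    print(f"{benchmark.type_}: TTFT p99 {ttft_p99:.1f} ms [{ttft_status}], "
          f"request latency p99 {latency_p99:.2f} [{latency_status}]")
```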

## Key Performance Indicators

When analyzing your results, focus on these key indicators:

### 1. Throughput and Capacity

- **Maximum RPS**: What's the highest request rate your server can handle?
- **Concurrency**: How many concurrent requests can your server process?
- **Token Throughput**: How many tokens per second can your server generate?

### 2. Latency and Responsiveness

- **Time to First Token (TTFT)**: How quickly does the model start generating output?
- **Inter-Token Latency (ITL)**: How smoothly does the model generate subsequent tokens?
- **Total Request Latency**: How long do complete requests take end-to-end?
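
As a rough rule of thumb, the end-to-end latency of a streamed response is approximately TTFT plus ITL multiplied by the number of output tokens after the first. For example, with a 200 ms TTFT, a 20 ms ITL, and 128 output tokens, a request takes about 0.2 s + 127 × 0.02 s ≈ 2.7 s, which helps connect the three metrics above when reading the stats table.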

### 3. Reliability and Error Rates

- **Success Rate**: What percentage of requests complete successfully?
- **Error Distribution**: What types of errors occur and at what rates?
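
To compute these rates programmatically rather than reading them off the console table, the report loaded earlier can be reused. The sketch below assumes each benchmark exposes per-status request counts; the `request_totals` attribute and its field names are hypothetical, so check the structure of your `benchmarks.json` for the exact schema.

```python
from guidellm.benchmark import GenerativeBenchmarksReport

report = GenerativeBenchmarksReport.load_file("benchmarks.json")

for benchmark in report.benchmarks:
    # Hypothetical attribute names for the successful/incomplete/errored breakdown;
    # verify them against the fields present in your results file.
    totals = benchmark.request_totals
    completed = totals.successful
    failed = totals.incomplete + totals.errored
    total = completed + failed

    success_rate = completed / total if total else 0.0
    print(f"{benchmark.type_}: {success_rate:.1%} successful "
          f"({failed} incomplete or errored out of {total})")
```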

## Additional Analysis Techniques

### Comparing Different Models or Hardware

Run benchmarks with different models or hardware configurations, then compare:

```bash
guidellm benchmark --target "http://server1:8000" --output-path model1.json
guidellm benchmark --target "http://server2:8000" --output-path model2.json
```
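
To compare the two runs side by side, both result files can be loaded back into Python with the same `GenerativeBenchmarksReport` API shown earlier; the metric attributes mirror the programmatic analysis example above, and the file names match the two commands.

```python
from guidellm.benchmark import GenerativeBenchmarksReport

# Load the two result files produced by the commands above
reports = {
    "model1": GenerativeBenchmarksReport.load_file("model1.json"),
    "model2": GenerativeBenchmarksReport.load_file("model2.json"),
}

for name, report in reports.items():
    for benchmark in report.benchmarks:
        rps = benchmark.metrics.requests_per_second.successful.mean
        ttft_p99 = benchmark.metrics.time_to_first_token_ms.successful.percentiles.p99
        print(f"{name} / {benchmark.type_}: {rps:.2f} RPS, TTFT p99 {ttft_p99:.1f} ms")
```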

### Cost Optimization

Calculate cost-effectiveness by analyzing:

- Tokens per second per dollar of hardware cost
- Maximum throughput for different hardware configurations
- Optimal batch size vs. latency tradeoffs
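
As a simple illustration of the first point, tokens per second per dollar can be derived from a benchmark's measured throughput and an hourly hardware price. In the sketch below, the hourly costs are made-up placeholders, and the `output_tokens_per_second` attribute is assumed to mirror the "Output tokens per second" console metric, so confirm the name against your results file.

```python
from guidellm.benchmark import GenerativeBenchmarksReport

# Placeholder hourly hardware prices in USD -- not real quotes
HOURLY_COST = {"model1.json": 2.50, "model2.json": 4.00}

for path, cost_per_hour in HOURLY_COST.items():
    report = GenerativeBenchmarksReport.load_file(path)
    for benchmark in report.benchmarks:
        # Attribute name assumed to mirror the "Output tokens per second" console metric
        tokens_per_second = benchmark.metrics.output_tokens_per_second.successful.mean
        tokens_per_dollar = tokens_per_second * 3600 / cost_per_hour
        print(f"{path} / {benchmark.type_}: {tokens_per_second:.0f} output tok/s, "
              f"{tokens_per_dollar:,.0f} tokens per dollar")
```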

### Determining Scaling Requirements

Use your benchmark results to plan:

- How many servers you need to handle your expected load
- When to automatically scale up or down based on demand
- What hardware provides the best price/performance for your workload
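
A back-of-the-envelope capacity estimate follows directly from these numbers: divide the peak load you expect by the per-server rate your benchmarks show is sustainable at acceptable latency, and add headroom. The inputs below (a 120 RPS forecast peak, 15 RPS per server at acceptable p99 latency, 30% headroom) are illustrative values, not measurements.

```python
import math

# Illustrative inputs -- replace with your own traffic forecast and benchmark results
expected_peak_rps = 120.0          # forecast peak request rate
sustainable_rps_per_server = 15.0  # measured RPS per server at acceptable p99 latency
headroom = 0.30                    # extra capacity for spikes and failover

servers_needed = math.ceil(expected_peak_rps * (1 + headroom) / sustainable_rps_per_server)
print(f"Estimated servers needed: {servers_needed}")  # 11 for these example numbers
```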