---
weight: -4
---

# Analyze Results

After [running a benchmark](benchmark.md), GuideLLM provides comprehensive results that help you understand your LLM deployment's performance. This guide explains how to interpret both console output and file-based results.

## Understanding Console Output

Upon benchmark completion, GuideLLM automatically displays results in the console, divided into three main sections:

### 1. Benchmarks Metadata

This section provides a high-level summary of the benchmark run, including:

- **Server configuration**: Target URL, model name, and backend details
- **Data configuration**: Data source, token counts, and dataset properties
- **Profile arguments**: Rate type, maximum duration, request limits, etc.
- **Extras**: Any additional metadata provided via the `--output-extras` argument

Example:

```
Benchmarks Metadata
------------------
Args: {"backend_type": "openai", "target": "http://localhost:8000", "model": "Meta-Llama-3.1-8B-Instruct-quantized", ...}
Worker: {"type_": "generative", "backend_type": "openai", "backend_args": {"timeout": 120.0, ...}, ...}
Request Loader: {"type_": "generative", "data_args": {"prompt_tokens": 256, "output_tokens": 128, ...}, ...}
Extras: {}
```

### 2. Benchmarks Info

This section summarizes the key information about each benchmark run, presented as a table with columns:

- **Type**: The benchmark type (e.g., synchronous, constant, or poisson)
- **Start/End Time**: When the benchmark started and ended
- **Duration**: Total duration of the benchmark in seconds
- **Requests**: Count of successful, incomplete, and errored requests
- **Token Stats**: Average token counts and totals for prompts and outputs

This section helps you understand what was executed and provides a quick overview of the results.

### 3. Benchmarks Stats

This is the most critical section for performance analysis. It displays detailed statistics for each metric:

- **Throughput Metrics**:

  - Requests per second (RPS)
  - Request concurrency
  - Output tokens per second
  - Total tokens per second

- **Latency Metrics**:

  - Request latency (mean, median, p99)
  - Time to first token (TTFT) (mean, median, p99)
  - Inter-token latency (ITL) (mean, median, p99)
  - Time per output token (mean, median, p99)

The p99 (99th percentile) values are particularly important for SLO analysis: they represent the value that 99% of requests stay at or below, so only the slowest 1% of requests exceed them.
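
To make these summary statistics concrete, here is a minimal, self-contained sketch (plain Python, independent of GuideLLM) that computes the mean, median, and p99 of a set of per-request latencies; the latency values are made-up placeholders:

```python
import statistics

# Hypothetical per-request latencies in seconds (illustrative values only).
latencies = [0.82, 0.91, 0.88, 1.02, 0.95, 0.87, 2.40, 0.90, 0.93, 0.89]

mean_latency = statistics.mean(latencies)
median_latency = statistics.median(latencies)

# statistics.quantiles with n=100 returns the 1st through 99th percentile
# cut points; index 98 is the 99th percentile (p99).
p99_latency = statistics.quantiles(latencies, n=100)[98]

print(f"mean={mean_latency:.2f}s median={median_latency:.2f}s p99={p99_latency:.2f}s")
```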

## Analyzing the Results File

For deeper analysis, GuideLLM saves detailed results to multiple files by default in your current directory:

- `benchmarks.json`: Complete benchmark data in JSON format
- `benchmarks.csv`: Summary of key metrics in CSV format
- `benchmarks.html`: Interactive HTML report with visualizations

### File Formats

GuideLLM supports multiple output formats that can be customized:

- **JSON**: Complete benchmark data in JSON format with full request samples
- **YAML**: Complete benchmark data in YAML format with full request samples
- **CSV**: Summary of key metrics in CSV format suitable for spreadsheets
- **HTML**: Interactive HTML report with tables and visualizations
- **Console**: Terminal output displayed during execution

To specify which formats to generate, use the `--outputs` argument:

```bash
guidellm benchmark --target "http://localhost:8000" --outputs json csv
```

The `--outputs` argument also accepts full file names, letting you customize the name and location of each output file:

```bash
guidellm benchmark --target "http://localhost:8000" --outputs results/benchmarks.json results/summary.csv
```

To change the output directory, use the `--output-dir` argument:

```bash
guidellm benchmark --target "http://localhost:8000" --output-dir results/
```

### Programmatic Analysis

For custom analysis, you can reload the results into Python:

```python
from guidellm.benchmark import GenerativeBenchmarksReport

# Load results from file
report = GenerativeBenchmarksReport.load_file("benchmarks.json")

# Access individual benchmarks
for benchmark in report.benchmarks:
    # Print basic info
    print(f"Benchmark: {benchmark.id_}")
    print(f"Type: {benchmark.type_}")

    # Access metrics
    print(f"Avg RPS: {benchmark.metrics.requests_per_second.successful.mean}")
    print(f"p99 latency: {benchmark.metrics.request_latency.successful.percentiles.p99}")
    print(f"TTFT (p99): {benchmark.metrics.time_to_first_token_ms.successful.percentiles.p99}")
```
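
Building on the same attributes, a small sketch like the one below could flatten each benchmark into a row for spreadsheet or pandas analysis. It only relies on the fields shown in the example above and assumes they are present on every benchmark:

```python
import csv

from guidellm.benchmark import GenerativeBenchmarksReport

report = GenerativeBenchmarksReport.load_file("benchmarks.json")

# Collect one summary row per benchmark using the metric paths shown above.
rows = []
for benchmark in report.benchmarks:
    metrics = benchmark.metrics
    rows.append(
        {
            "id": benchmark.id_,
            "type": benchmark.type_,
            "rps_mean": metrics.requests_per_second.successful.mean,
            "request_latency_p99": metrics.request_latency.successful.percentiles.p99,
            "ttft_p99_ms": metrics.time_to_first_token_ms.successful.percentiles.p99,
        }
    )

# Write a flat summary that can be opened in a spreadsheet or loaded with pandas.
if rows:
    with open("benchmarks_summary.csv", "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```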

## Key Performance Indicators

When analyzing your results, focus on these key indicators:

### 1. Throughput and Capacity

- **Maximum RPS**: What's the highest request rate your server can handle?
- **Concurrency**: How many concurrent requests can your server process?
- **Token Throughput**: How many tokens per second can your server generate?

### 2. Latency and Responsiveness

- **Time to First Token (TTFT)**: How quickly does the model start generating output?
- **Inter-Token Latency (ITL)**: How smoothly does the model generate subsequent tokens?
- **Total Request Latency**: How long do complete requests take end-to-end? (A quick check of these latency metrics against SLO targets is sketched below.)
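
The following sketch reuses the report object and metric paths from the Programmatic Analysis example to flag benchmarks that miss latency targets. The threshold values are assumptions to replace with your own SLOs, and request latency is assumed to be reported in seconds (TTFT is in milliseconds, per the field name):

```python
from guidellm.benchmark import GenerativeBenchmarksReport

# Example SLO targets -- replace with your own service objectives.
TTFT_P99_MS_TARGET = 200.0
REQUEST_LATENCY_P99_S_TARGET = 5.0

report = GenerativeBenchmarksReport.load_file("benchmarks.json")

for benchmark in report.benchmarks:
    ttft_p99 = benchmark.metrics.time_to_first_token_ms.successful.percentiles.p99
    latency_p99 = benchmark.metrics.request_latency.successful.percentiles.p99
    meets_slo = ttft_p99 <= TTFT_P99_MS_TARGET and latency_p99 <= REQUEST_LATENCY_P99_S_TARGET
    print(
        f"{benchmark.type_}: TTFT p99={ttft_p99:.1f} ms, "
        f"request latency p99={latency_p99:.2f} s, meets SLO={meets_slo}"
    )
```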

### 3. Reliability and Error Rates

- **Success Rate**: What percentage of requests complete successfully? (See the sketch after this list.)
- **Error Distribution**: What types of errors occur and at what rates?
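
As a simple illustration, the success and error rates can be computed directly from the request counts reported in the Benchmarks Info table; the counts below are made-up placeholders:

```python
# Placeholder counts -- take the real values from the Benchmarks Info table.
successful, incomplete, errored = 980, 5, 15

total = successful + incomplete + errored
success_rate = successful / total
error_rate = errored / total

print(f"success rate: {success_rate:.1%}, error rate: {error_rate:.1%}")
```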

## Additional Analysis Techniques

### Comparing Different Models or Hardware

Run benchmarks with different models or hardware configurations, then compare:

```bash
guidellm benchmark --target "http://server1:8000" --output-dir model1/
guidellm benchmark --target "http://server2:8000" --output-dir model2/
```
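
With the two result directories above, a sketch like the following could load both reports and print key metrics side by side. It assumes each run produced the default `benchmarks.json` file in its output directory and reuses the metric paths from the Programmatic Analysis example:

```python
from guidellm.benchmark import GenerativeBenchmarksReport

runs = [("model1", "model1/benchmarks.json"), ("model2", "model2/benchmarks.json")]

for label, path in runs:
    report = GenerativeBenchmarksReport.load_file(path)
    for benchmark in report.benchmarks:
        rps = benchmark.metrics.requests_per_second.successful.mean
        ttft_p99 = benchmark.metrics.time_to_first_token_ms.successful.percentiles.p99
        print(f"{label} [{benchmark.type_}]: RPS={rps:.2f}, TTFT p99={ttft_p99:.1f} ms")
```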

### Cost Optimization

Calculate cost-effectiveness by analyzing the following (a short worked example appears after the list):

- Tokens per second per dollar of hardware cost
- Maximum throughput for different hardware configurations
- Optimal batch size vs. latency tradeoffs
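
For example, a back-of-the-envelope tokens-per-dollar calculation might look like the sketch below. The throughput figure would come from your benchmark results, while the hourly hardware cost is a placeholder to replace with your own pricing:

```python
# Placeholder inputs -- substitute your own measurements and pricing.
output_tokens_per_second = 1_500.0  # from the Benchmarks Stats section
hardware_cost_per_hour = 4.00       # e.g., hourly GPU instance price in dollars

tokens_per_dollar = output_tokens_per_second * 3600 / hardware_cost_per_hour
cost_per_million_tokens = 1_000_000 / tokens_per_dollar

print(f"tokens per dollar: {tokens_per_dollar:,.0f}")
print(f"cost per 1M output tokens: ${cost_per_million_tokens:.2f}")
```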

### Determining Scaling Requirements

Use your benchmark results to plan the following (a rough sizing sketch appears after the list):

- How many servers you need to handle your expected load
- When to automatically scale up or down based on demand
- What hardware provides the best price/performance for your workload
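
As a rough sizing sketch, the number of replicas can be estimated from the measured per-server capacity and the expected peak load; all of the inputs here are placeholders to replace with your own numbers:

```python
import math

# Placeholder inputs -- substitute your own measurements and traffic estimates.
max_rps_per_server = 12.0  # highest RPS a single server sustained within SLO
expected_peak_rps = 85.0   # anticipated peak production load
headroom = 1.25            # safety margin for spikes and rolling restarts

servers_needed = math.ceil(expected_peak_rps * headroom / max_rps_per_server)
print(f"servers needed: {servers_needed}")
```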