
Commit c9cc407

feat: add experiment level throughput (#9)
* feat: add experiment level throughput

  - Add monitoring dashboard and power efficiency plots to assets
  - Update analysis guide with new visualization documentation
  - Clarify throughput metrics definition in reports and schemas
  - Add system throughput and batch efficiency properties to BenchmarkResult
  - Enhance report generation with detailed throughput analysis section
  - Improve documentation comments in analysis code

  Signed-off-by: cmontemuino <[email protected]>

* refactor: redesign the batch efficiency ratio metric

  Signed-off-by: cmontemuino <[email protected]>

* chore: use proper formatting for percentages

  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* fix: throughput_scaling_efficiency metric

  Signed-off-by: cmontemuino <[email protected]>

* refactor: improve batch efficiency calculation

  - Remove the flawed `batch_efficiency_ratio` and related properties from the
    `BenchmarkResult` schema; these methods used circular logic and produced
    misleading results.
  - Implement `BatchEfficiencyAnalyzer` to perform the batch efficiency ratio
    calculation.

  Signed-off-by: cmontemuino <[email protected]>

---------

Signed-off-by: cmontemuino <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
1 parent 25b4b3d commit c9cc407

File tree

15 files changed: +436 −10 lines

Binary asset changes (image previews not shown): -67 KB, 1.05 KB, 616 Bytes, 403 KB, 202 KB.

docs/user-guide/analysis-guide.md

Lines changed: 12 additions & 4 deletions
@@ -26,18 +26,23 @@ After running the analysis pipeline, you'll get comprehensive performance insights
 analysis/sample-output/
 ├── plots/                                # Visual performance analysis
 │   ├── batch_size_scaling.png            # Batch size vs performance
-│   ├── batch_size_scaling_by_memory.png
-│   ├── memory_efficiency.png             # Memory utilization effects
+│   ├── batch_size_scaling_by_memory.png  # Same as batch_size_scaling.png, but with a split per memory utilization
 │   ├── latency_analysis.png              # Latency distribution analysis
+│   ├── memory_efficiency.png             # Memory utilization effects
+│   ├── monitoring_dashboard.png          # Dashboard with power consumption + GPU temp distribution + CPU-GPU power relationship
+│   ├── power_efficiency_analysis.png     # Power consumption analysis + power stability vs. efficiency
 │   └── throughput_comparison.png         # Throughput comparisons
 ├── reports/                              # Comprehensive analysis reports
 │   ├── analysis_summary.json             # Machine-readable summary
 │   └── benchmark_analysis_report.md      # Human-readable report
 └── tables/                               # Statistical summaries (CSV)
     ├── batch_size_analysis.csv
+    ├── gpu_allocation_summary.csv
     ├── memory_utilization_analysis.csv
     ├── model_performance_summary.csv
-    └── raw_results.csv
+    ├── monitoring_summary.csv
+    ├── raw_results.csv
+    └── thermal_analysis.csv
 ```

 ### Visual Analysis Guide

@@ -180,10 +185,13 @@ analysis/sample-output/

 #### 5. Throughput Analysis

-![Througput Analysis](../assets/img/sample-analysis/throughput_comparison.png)
+![Throughput Analysis](../assets/img/sample-analysis/throughput_comparison.png)

 **What it shows**: Throughput performance across different configuration parameters, helping identify optimal settings for maximum system utilization

+> ℹ️ **Note**: **Throughput** is defined as the **average latency** expressed as a rate (1 / avg_latency),
+> representing **how frequently a single request completes**.
+
 **How to interpret**:

 - **Left plot (Throughput by Model and Batch Size)**: **Grouped bars** for each model showing different batch sizes
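
The note added above defines throughput as the inverse of average latency. A minimal sketch of that arithmetic (the latency value is hypothetical, chosen only for illustration):

```python
# Hypothetical value, for illustration only.
avg_latency = 0.25  # seconds per request

# Per-request completion rate: how frequently a single request completes.
throughput = 1.0 / avg_latency

print(f"{throughput:.2f} requests/second")  # 4.00 requests/second
```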

src/amd_bench/core/analysis.py

Lines changed: 68 additions & 4 deletions
@@ -1115,7 +1115,6 @@ def _extract_vllm_latency_metrics(data: Dict[str, Any]) -> BenchmarkMetrics:

     - **Per-Request Completion Rate**: 1 / avg_latency (requests/second/experiment)
       - Measures how frequently individual requests complete
-      - Industry standard for latency benchmarks
       - Lower values for larger batch sizes due to queueing delays

     - **Batch-Level Throughput**: batch_size / avg_latency (theoretical max)

@@ -1203,8 +1202,9 @@ def get_percentile(p: int) -> float:
         """Extract percentile with flexible key handling."""
         return float(percentiles.get(str(p), percentiles.get(p, 0.0)))

-    # Calculate per-request completion rate (industry standard for latency benchmarks)
-    per_request_completion_rate = 1.0 / avg_latency
+    # Calculate request completion rate (requests per second for single-request processing)
+    # This represents the inverse of latency: how frequently one request completes
+    throughput = 1.0 / avg_latency

     return BenchmarkMetrics(
         # Core latency metrics

@@ -1217,7 +1217,7 @@ def get_percentile(p: int) -> float:
         p99_latency=get_percentile(99),
         # Per-request completion rate (requests/second per experiment)
         # Note: This is NOT system-level throughput for batch processing
-        throughput=per_request_completion_rate,
+        throughput=throughput,
         # Token-level metrics (not available in latency-only benchmarks)
         tokens_per_second=0.0,
         # Experimental metadata

@@ -1250,3 +1250,67 @@ def _generate_experiment_id(params: Dict[str, str]) -> str:
         params.get("timestamp", "unknown"),
     ]
     return "_".join(str(p).replace("/", "-") for p in key_params)
+
+
+class BatchEfficiencyAnalyzer:
+    """Analyze batch efficiency across multiple batch size configurations."""
+
+    def __init__(self, results: List[BenchmarkResult]):
+        self.results = results
+        self.by_batch_size = self._group_by_batch_size()
+
+    def _group_by_batch_size(self) -> Dict[int, List[BenchmarkResult]]:
+        """Group results by batch size for comparison."""
+        from collections import defaultdict
+
+        groups: Dict[int, List[BenchmarkResult]] = defaultdict(list)
+        for result in self.results:
+            groups[result.config.batch_size].append(result)
+        return dict(groups)
+
+    def calculate_scaling_efficiency(self, baseline_batch_size: int = 1) -> Dict[int, float]:
+        """
+        Calculate how efficiently each batch size scales compared to baseline.
+
+        Returns efficiency ratios where:
+        - 1.0 = same efficiency as baseline
+        - >1.0 = better than baseline
+        - <1.0 = worse than baseline
+        """
+        if baseline_batch_size not in self.by_batch_size:
+            raise ValueError(f"No data for baseline batch size {baseline_batch_size}")
+
+        baseline_results = self.by_batch_size[baseline_batch_size]
+        baseline_throughput = sum(r.metrics.throughput for r in baseline_results) / len(
+            baseline_results
+        )
+
+        efficiencies = {}
+        for batch_size, results in self.by_batch_size.items():
+            avg_system_throughput = sum(r.system_throughput for r in results) / len(results)
+            theoretical_throughput = batch_size * baseline_throughput
+            efficiencies[batch_size] = avg_system_throughput / theoretical_throughput
+
+        return efficiencies
+
+    def get_scaling_grades(self, baseline_batch_size: int = 1) -> Dict[int, str]:
+        """Generate a human-readable performance grade for each batch size."""
+
+        efficiency_ratios = self.calculate_scaling_efficiency(baseline_batch_size)
+        grades = {}
+
+        for batch_size, ratio in efficiency_ratios.items():
+            if ratio >= 1.1:
+                grades[batch_size] = "A+ (Excellent)"
+            elif ratio >= 1.0:
+                grades[batch_size] = "A (Very Good)"
+            elif ratio >= 0.9:
+                grades[batch_size] = "B (Good)"
+            elif ratio >= 0.8:
+                grades[batch_size] = "C (Fair)"
+            elif ratio >= 0.7:
+                grades[batch_size] = "D (Poor)"
+            else:
+                grades[batch_size] = "F (Very Poor)"
+
+        return grades
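
A hedged usage sketch for the new `BatchEfficiencyAnalyzer`. The stand-in result objects below mimic only the attributes the analyzer reads (`config.batch_size`, `metrics.throughput`, and `system_throughput`); the real `BenchmarkResult` is a richer Pydantic model, and the numbers are invented for illustration:

```python
from dataclasses import dataclass

from amd_bench.core.analysis import BatchEfficiencyAnalyzer

# Stand-ins mimicking only the fields BatchEfficiencyAnalyzer touches.
@dataclass
class _Config:
    batch_size: int

@dataclass
class _Metrics:
    throughput: float  # per-request completion rate, 1 / avg_latency

@dataclass
class _Result:
    config: _Config
    metrics: _Metrics

    @property
    def system_throughput(self) -> float:
        # Same definition as BenchmarkResult.system_throughput in this commit.
        return self.config.batch_size * self.metrics.throughput

# Hypothetical runs: batch size 8 raises per-request latency, so its
# completion rate drops from 0.50 to 0.45 req/s.
results = [_Result(_Config(1), _Metrics(0.50)), _Result(_Config(8), _Metrics(0.45))]

analyzer = BatchEfficiencyAnalyzer(results)
print(analyzer.calculate_scaling_efficiency())  # {1: 1.0, 8: ~0.9}
print(analyzer.get_scaling_grades())            # {1: 'A (Very Good)', 8: 'B (Good)'}
```

For batch size 8, the average system throughput is 8 × 0.45 = 3.6 req/s against a theoretical 8 × 0.50 = 4.0 req/s, giving the 0.9 efficiency ratio above.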

src/amd_bench/core/reporters.py

Lines changed: 50 additions & 1 deletion
@@ -113,6 +113,10 @@ def _create_markdown_report(
         f.write("\n## Model Performance Overview\n\n")
         self._write_model_performance_section(f)

+        # Throughput Analysis
+        f.write("\n## Throughput Analysis\n\n")
+        self._write_throughput_analysis_section(f)
+
         # Configuration Analysis
         f.write("\n## Configuration Analysis\n\n")
         self._write_configuration_analysis(f)

@@ -143,9 +147,14 @@ def _write_executive_summary(self, file: TextIO) -> None:
             f"This analysis covers **{len(models)} models** across **{len(self.results)} experiments**.\n\n"
         )
         file.write(f"- **Average Latency**: {avg_latency:.4f} seconds\n")
-        file.write(f"- **Average Throughput**: {avg_throughput:.2f} requests/second\n")
+        file.write(f"- **Throughput**: {avg_throughput:.2f} requests/second\n")
         file.write(f"- **Models Tested**: {', '.join(sorted(models))}\n")

+        file.write(
+            """> ℹ️ **Note**: **Throughput** is defined as the **average latency** expressed as a rate
+(1 / avg_latency), representing **how frequently a single request completes**.\n"""
+        )
+
     def _write_model_performance_section(self, file: TextIO) -> None:
         """Write model performance section to markdown report."""
         if not self.results:

@@ -178,6 +187,46 @@ def _write_model_performance_section(self, file: TextIO) -> None:

         file.write("\n")

+    def _write_throughput_analysis_section(self, file: TextIO) -> None:
+        """Write enhanced throughput analysis with proper metric distinctions."""
+
+        file.write("**Important**: This analysis reports two different throughput metrics:\n\n")
+        file.write(
+            "- **Per-Request Completion Rate**: How frequently individual requests complete\n"
+        )
+        file.write(
+            "- **System Throughput**: Total system processing capacity (batch_size × completion_rate)\n\n"
+        )
+
+        file.write(
+            "| Batch Size | Avg Latency (s) | Completion Rate (req/s) | System Throughput (req/s) | Input Length | Output Length | Mem Util (%) |\n"
+        )
+        file.write(
+            "|------------|-----------------|-------------------------|---------------------------|--------------|---------------|--------------|\n"
+        )
+
+        # Sort by batch_size first, then by latency within each batch size
+        sorted_results = sorted(
+            self.results, key=lambda r: (r.config.batch_size, r.metrics.avg_latency)
+        )
+
+        for result in sorted_results:
+            system_throughput = result.config.batch_size * result.metrics.throughput
+            file.write(
+                f"| {result.config.batch_size} | "
+                f"{result.metrics.avg_latency:.3f} | {result.metrics.throughput:.3f} | "
+                f"{system_throughput:.3f} | {result.config.input_length} | {result.config.output_length} | {result.config.memory_util * 100:.1f} |\n"
+            )
+
+        file.write("\n**Key Insights:**\n")
+        file.write("- Larger batch sizes reduce per-request completion rates due to queueing\n")
+        file.write(
+            "- System throughput may still increase with batch size despite higher latency\n"
+        )
+        file.write(
+            "- Choose batch size based on your use case: latency-sensitive vs. throughput-optimized\n\n"
+        )
+
     def _write_configuration_analysis(self, file: TextIO) -> None:
         """Write configuration analysis section."""
         if not self.results:
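
To see why the report distinguishes the two metrics, here is a small sketch mirroring the per-row arithmetic of `_write_throughput_analysis_section` (the latencies are hypothetical, not measured data): the completion rate falls as batch size grows, while system throughput can still rise.

```python
# Hypothetical (batch_size, avg_latency-in-seconds) pairs, for illustration only.
runs = [(1, 0.500), (4, 0.650), (8, 0.900)]

for batch_size, avg_latency in runs:
    completion_rate = 1.0 / avg_latency               # per-request completion rate
    system_throughput = batch_size * completion_rate  # batch_size x completion_rate
    print(f"| {batch_size} | {avg_latency:.3f} | {completion_rate:.3f} | {system_throughput:.3f} |")

# | 1 | 0.500 | 2.000 | 2.000 |
# | 4 | 0.650 | 1.538 | 6.154 |
# | 8 | 0.900 | 1.111 | 8.889 |
```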

src/amd_bench/schemas/benchmark.py

Lines changed: 19 additions & 0 deletions
@@ -115,6 +115,25 @@ def efficiency_score(self) -> float:
             return self.metrics.throughput / self.metrics.avg_latency
         return 0.0

+    @property
+    def system_throughput(self) -> float:
+        """Calculate system-level throughput accounting for batch processing.
+
+        This property calculates the actual system throughput by considering
+        the batch size used in the experiment, providing a more accurate measure
+        of system processing capacity for batch workloads.
+
+        Returns:
+            float: System throughput in requests/second
+                (batch_size × `self.metrics.throughput`)
+
+        Example:
+            For an experiment with batch_size=8 and avg_latency=2.0s:
+            - per_request_completion_rate = 1/2.0 = 0.5 req/s
+            - system_throughput = 8 * 0.5 = 4.0 req/s
+        """
+        return self.config.batch_size * self.metrics.throughput
+

 class ExperimentFiles(BaseModel):
     """File paths for a complete experiment."""

tests/integration/schema/__init__.py

Whitespace-only changes.
