Add benchmark baseline option (#165)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
```diff
@@ -29,6 +29,7 @@ def display(self, tr, groups, progress_reporter=report_progress):
         bench["name"] = self.name_format(bench)

     worst = {}
+    baseline = {}
     best = {}
     solo = len(benchmarks) == 1
     for line, prop in progress_reporter(("min", "max", "mean", "median", "iqr", "stddev", "ops"),
@@ -38,11 +39,23 @@ def display(self, tr, groups, progress_reporter=report_progress):
                 benchmarks, tr, "{line} ({pos}/{total})", line=line))
             best[prop] = max(bench[prop] for _, bench in progress_reporter(
                 benchmarks, tr, "{line} ({pos}/{total})", line=line))
+            try:
+                baseline[prop] = max(bench[prop] for _, bench in progress_reporter(
+                    benchmarks, tr, "{line} ({pos}/{total})", line=line)
+                    if bench.get("baseline", True))
+            except ValueError:
+                baseline[prop] = None
         else:
             worst[prop] = max(bench[prop] for _, bench in progress_reporter(
                 benchmarks, tr, "{line} ({pos}/{total})", line=line))
             best[prop] = min(bench[prop] for _, bench in progress_reporter(
                 benchmarks, tr, "{line} ({pos}/{total})", line=line))
+            try:
+                baseline[prop] = min(bench[prop] for _, bench in progress_reporter(
+                    benchmarks, tr, "{line} ({pos}/{total})", line=line)
+                    if bench.get("baseline", True))
+            except ValueError:
+                baseline[prop] = None
         for line, prop in progress_reporter(("outliers", "rounds", "iterations"), tr, "{line}: {value}", line=line):
             worst[prop] = max(benchmark[prop] for _, benchmark in progress_reporter(
                 benchmarks, tr, "{line} ({pos}/{total})", line=line))
```

**Owner** (on the `except ValueError:` line): Can we avoid the try/except somehow? What actually can raise that error?

**Author:** If there is no benchmark in the group marked as a potential baseline, the generator passed to `max()`/`min()` is empty, which raises `ValueError`. Should I convert this to an explicit check instead?
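The question in the thread above hinges on a Python detail: `min()`/`max()` raise `ValueError` when handed an empty iterable and no `default`. A minimal standalone sketch of the fallback pattern used in the diff (function name and data are hypothetical, not the PR's actual code):

```python
def pick_baseline(benchmarks, prop, select=min):
    """Return the best value among benchmarks flagged as potential
    baselines, or None when every benchmark opted out.

    Each benchmark is a dict; it opts out of being a baseline by setting
    "baseline": False (the default is to opt in).
    """
    try:
        # A generator that yields nothing makes min()/max() raise ValueError.
        return select(b[prop] for b in benchmarks if b.get("baseline", True))
    except ValueError:
        return None


benches = [
    {"name": "fast", "mean": 0.5, "baseline": False},  # excluded from baselines
    {"name": "slow", "mean": 2.0},                     # opts in by default
]
print(pick_baseline(benches, "mean"))                             # 2.0
print(pick_baseline([{"mean": 1.0, "baseline": False}], "mean"))  # None
```

An equivalent `ValueError`-free form would be `min((...), default=None)`, which may be what the reviewer has in mind.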
|
|
```diff
@@ -106,7 +119,7 @@ def display(self, tr, groups, progress_reporter=report_progress):
                     ALIGNED_NUMBER_FMT.format(
                         bench[prop] * adjustment,
                         widths[prop],
-                        compute_baseline_scale(best[prop], bench[prop], rpadding),
+                        compute_baseline_scale(baseline[prop], bench[prop], rpadding),
                         rpadding
                     ),
                     green=not solo and bench[prop] == best.get(prop),
```
|
|
```diff
@@ -118,7 +131,7 @@ def display(self, tr, groups, progress_reporter=report_progress):
                     ALIGNED_NUMBER_FMT.format(
                         bench[prop] * ops_adjustment,
                         widths[prop],
-                        compute_baseline_scale(best[prop], bench[prop], rpadding),
+                        compute_baseline_scale(baseline[prop], bench[prop], rpadding),
                         rpadding
                     ),
                     green=not solo and bench[prop] == best.get(prop),
```
|
|
```diff
@@ -147,7 +160,7 @@ def display(self, tr, groups, progress_reporter=report_progress):


 def compute_baseline_scale(baseline, value, width):
-    if not width:
+    if not width or baseline is None:
         return ""
     if value == baseline:
         return " (1.0)".ljust(width)
```
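The guarded function can be exercised on its own. This standalone copy keeps the two lines shown in the diff; the final branch for unequal values is an assumption for illustration, since the diff does not show the rest of the function:

```python
def compute_baseline_scale(baseline, value, width):
    # Guard added by the PR: with no baseline in the group there is
    # nothing to scale against, so emit an empty (zero-width) column.
    if not width or baseline is None:
        return ""
    if value == baseline:
        return " (1.0)".ljust(width)
    # Assumed tail (not shown in the diff): render a plain ratio.
    return " ({0:.2f})".format(value / baseline).ljust(width)


print(repr(compute_baseline_scale(None, 2.0, 10)))  # '' -- no baseline, no column
print(repr(compute_baseline_scale(1.0, 1.0, 10)))   # ' (1.0)    '
print(repr(compute_baseline_scale(2.0, 4.0, 10)))   # ' (2.00)   '
```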
**Owner:** Not sure how, but we should have some way to validate that there is only one baseline per group. It doesn't make sense to have two baselines, right? Let's not let users wonder why stuff doesn't work as expected (the annoying silent failure).
**Author:** The current implementation marks all results as possible baselines by default and only excludes the ones marked as `baseline=False`. If there is more than one `baseline=True` benchmark available, it will choose the one with the lowest value/highest score. This integrates cleanly with the existing behaviour and means I don't have to pin the selected baseline value for all time (performance may differ between systems, etc.). The included docs actually mention this. As you mentioned, there cannot be two baselines when the output is rendered, but there can be more than one potential baseline score.

Unless you have strong feelings on this, I'd like to keep it this way for extra flexibility. The wording could be improved, however: maybe something along the lines of `potential_baseline`, but shorter?
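The selection the author describes can be shown with hypothetical data: every result is a potential baseline unless it sets `"baseline": False`, and among several opted-in results the best value (the lowest, for time-like properties) wins:

```python
# Hypothetical benchmark group (names and numbers invented for illustration).
group = [
    {"name": "impl_a", "min": 1.2},                     # opts in by default
    {"name": "impl_b", "min": 0.9},                     # opts in by default
    {"name": "impl_c", "min": 0.1, "baseline": False},  # explicitly excluded
]

# Collect the opted-in candidates, then pick the best (lowest time).
candidates = [b["min"] for b in group if b.get("baseline", True)]
baseline = min(candidates) if candidates else None
print(baseline)  # 0.9 -- impl_c's faster 0.1 is ignored as a baseline
```

With all three benchmarks marked `"baseline": False`, `candidates` is empty and `baseline` falls back to `None`, which is exactly the case the `ValueError` handling in the diff covers.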