Conversation
Anyway, it would be cool if somebody could tell me why the test fails, because looking at the details gives me a 404.
```python
    baseline[prop] = max(bench[prop] for _, bench in progress_reporter(
        benchmarks, tr, "{line} ({pos}/{total})", line=line)
        if bench.get("baseline", True))
except ValueError:
```
Can we avoid the try/except somehow? What can actually raise that error?
If there is no benchmark in the group marked as baseline, this will end up calling `max(())`, which raises `ValueError: max() arg is an empty sequence`.
Should I convert this to an `if`? It would require evaluating the list of values up-front and then checking its `len(...)`. Or would you prefer me to add a comment about this?
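For illustration, here is a hedged sketch of the alternatives being discussed. The function names and the benchmark-dict shape are hypothetical, not the actual pytest-benchmark internals; a third option using `max()`'s `default` keyword (available since Python 3.4) is included because it sidesteps both the try/except and the up-front list:

```python
# Illustrative sketch of the alternatives discussed; names and structure
# are hypothetical, not the actual pytest-benchmark internals.

def pick_baseline_try(benchmarks, prop):
    # Option 1 (current): rely on max(()) raising
    # "ValueError: max() arg is an empty sequence".
    try:
        return max(b[prop] for b in benchmarks if b.get("baseline", True))
    except ValueError:
        return None

def pick_baseline_if(benchmarks, prop):
    # Option 2: materialize the candidates up-front, then check the length.
    candidates = [b[prop] for b in benchmarks if b.get("baseline", True)]
    if not candidates:
        return None
    return max(candidates)

def pick_baseline_default(benchmarks, prop):
    # Option 3: max() accepts a `default` for empty iterables (Python 3.4+),
    # avoiding both the exception handler and the intermediate list.
    return max((b[prop] for b in benchmarks if b.get("baseline", True)),
               default=None)
```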
```diff
 if name not in (
     "max_time", "min_rounds", "min_time", "timer", "group", "disable_gc", "warmup",
-    "warmup_iterations", "calibration_precision", "cprofile"):
+    "warmup_iterations", "calibration_precision", "cprofile", "baseline"):
```
Not sure how, but we should have some way to validate that there is only one baseline per group. It doesn't make sense to have two baselines, right? Let's not let users wonder why stuff doesn't work as expected (the annoying silent failure).
The current implementation marks all results as possible baselines by default and only excludes the ones marked as `baseline=False`. If there is more than one `baseline=True` benchmark available, it will choose the one with the lowest value/highest score. This integrates perfectly with the existing behaviour and means that I don't have to pick one baseline value for all time (as performance may differ between systems, etc.). The included docs actually mention this. As you mentioned, there cannot be two baselines when the output is rendered, but there can be more than one potential baseline score.
Unless you have strong feelings about this, I'd like to keep it this way for extra flexibility. The wording could be improved, however: maybe something along the lines of `potential_baseline`, but shorter?
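To make the selection rule concrete, here is a small sketch of the behaviour described above: every result is a potential baseline unless it opts out with `baseline=False`, and among several candidates the best score wins. The function name, dict keys, and the use of mean time as the score are illustrative assumptions, not the actual implementation:

```python
# Illustrative sketch of the selection rule described above; not the
# actual pytest-benchmark code. Each benchmark dict carries a mean time
# and an optional "baseline" flag that defaults to True.

def select_baseline(benchmarks):
    # Everything is a potential baseline unless explicitly opted out.
    candidates = [b for b in benchmarks if b.get("baseline", True)]
    if not candidates:
        return None
    # Among several candidates, pick the best score; for timings this
    # means the lowest mean (the real code may use max() for other props).
    return min(candidates, key=lambda b: b["mean"])
```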
Glad to hear back from you! Did you take a look at the CI yet?

@ionelmc: Ping

This would be a nice feature.

Just had a need for this exact use case; I would really like to be able to pin a specific benchmark as 1.0x and make everything else relative to that.
Add `baseline` option that determines (according to the included docs):