Add --compare-between mode#302

Open
akx wants to merge 3 commits into ionelmc:master from akx:compare-between

akx commented Feb 26, 2026

This PR adds a --compare-between mode, effectively a pivot table between 2..N result files.
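To illustrate what "pivot table between result files" could mean, here is a minimal sketch, not the PR's implementation. It assumes each result file has been loaded into a dict with a "benchmarks" list whose items carry "name" and "stats"; the function name `pivot_stat` and the sample data are hypothetical.

```python
from collections import defaultdict


def pivot_stat(sources, stat="min"):
    """Pivot {source_name: result_dict} into {test_name: {source_name: value}}.

    Assumption: every result_dict has a "benchmarks" list whose items
    carry a "name" string and a "stats" mapping.
    """
    table = defaultdict(dict)
    for source_name, data in sources.items():
        for bench in data["benchmarks"]:
            table[bench["name"]][source_name] = bench["stats"][stat]
    return dict(table)


# Hypothetical, hand-written inputs (not real measurements).
sources = {
    "run_a": {"benchmarks": [{"name": "test_fast", "stats": {"min": 1.0}}]},
    "run_b": {"benchmarks": [{"name": "test_fast", "stats": {"min": 2.0}}]},
}
pivoted = pivot_stat(sources)
```

Each test becomes one row and each source one column, which is the shape the comparison table needs.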

I needed this for django/asgiref#551 and cleaned it up for general use :)

Example output with color:

[Screenshot: bench]

codecov bot commented Feb 26, 2026

Codecov Report

❌ Patch coverage is 89.83051% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.22%. Comparing base (b11b71b) to head (ee802f9).

Files with missing lines         Patch %   Lines
src/pytest_benchmark/table.py    90.09%    5 Missing and 5 partials ⚠️
src/pytest_benchmark/cli.py      77.77%    1 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##           master     #302    +/-   ##
========================================
  Coverage   90.21%   90.22%            
========================================
  Files          28       28            
  Lines        2934     3038   +104     
  Branches      318      339    +21     
========================================
+ Hits         2647     2741    +94     
- Misses        212      218     +6     
- Partials       75       79     +4     
Flag Coverage Δ
py310-pytest84-nodist-cover (macos/arm64) 86.01% <89.83%> (-0.05%) ⬇️
py310-pytest84-nodist-cover (ubuntu/x64) 86.50% <89.83%> (+0.13%) ⬆️
py310-pytest84-nodist-cover (windows/x64) 85.97% <89.83%> (+0.15%) ⬆️
py310-pytest90-nodist-cover (macos/arm64) 86.17% <89.83%> (+0.14%) ⬆️
py310-pytest90-nodist-cover (ubuntu/x64) 86.66% <89.83%> (+0.13%) ⬆️
py310-pytest90-nodist-cover (windows/x64) 85.94% <89.83%> (+0.15%) ⬆️
py311-pytest84-nodist-cover (macos/arm64) 86.20% <89.83%> (+0.14%) ⬆️
py311-pytest84-nodist-cover (ubuntu/x64) 86.50% <89.83%> (+0.13%) ⬆️
py311-pytest84-nodist-cover (windows/x64) 85.78% <89.83%> (-0.05%) ⬇️
py311-pytest90-nodist-cover (macos/arm64) 86.17% <89.83%> (+0.35%) ⬆️
py311-pytest90-nodist-cover (ubuntu/x64) 86.47% <89.83%> (+0.13%) ⬆️
py311-pytest90-nodist-cover (windows/x64) 85.94% <89.83%> (+0.36%) ⬆️
py312-pytest84-nodist-cover (macos/arm64) 86.20% <89.83%> (+0.14%) ⬆️
py312-pytest84-nodist-cover (ubuntu/x64) 86.50% <89.83%> (+0.10%) ⬆️
py312-pytest84-nodist-cover (windows/x64) 85.78% <89.83%> (+0.16%) ⬆️
py312-pytest90-nodist-cover (macos/arm64) 86.17% <89.83%> (+0.14%) ⬆️
py312-pytest90-nodist-cover (ubuntu/x64) 86.66% <89.83%> (+0.33%) ⬆️
py312-pytest90-nodist-cover (windows/x64) 85.94% <89.83%> (+0.36%) ⬆️
py313-pytest84-nodist-cover (macos/arm64) 86.20% <89.83%> (+0.14%) ⬆️
py313-pytest84-nodist-cover (ubuntu/x64) 86.50% <89.83%> (+0.13%) ⬆️
py313-pytest84-nodist-cover (windows/x64) 85.78% <89.83%> (-0.05%) ⬇️
py313-pytest90-nodist-cover (macos/arm64) 86.17% <89.83%> (+0.14%) ⬆️
py313-pytest90-nodist-cover (ubuntu/x64) 86.50% <89.83%> (+0.17%) ⬆️
py313-pytest90-nodist-cover (windows/x64) 85.74% <89.83%> (+0.16%) ⬆️
py314-pytest84-nodist-cover (macos/arm64) 89.43% <89.83%> (+0.03%) ⬆️
py314-pytest84-nodist-cover (ubuntu/x64) 89.69% <89.83%> (-0.22%) ⬇️
py314-pytest84-nodist-cover (windows/x64) 88.96% <89.83%> (-0.16%) ⬇️
py314-pytest90-nodist-cover (macos/arm64) 89.43% <89.83%> (+0.03%) ⬆️
py314-pytest90-nodist-cover (ubuntu/x64) 89.69% <89.83%> (-0.19%) ⬇️
py314-pytest90-nodist-cover (windows/x64) 89.16% <89.83%> (+0.04%) ⬆️
pypy311-pytest84-nodist-cover (macos/arm64) 85.38% <89.83%> (+0.17%) ⬆️
pypy311-pytest84-nodist-cover (ubuntu/x64) 85.71% <89.83%> (+0.16%) ⬆️
pypy311-pytest84-nodist-cover (windows/x64) 84.95% <89.83%> (-0.02%) ⬇️
pypy311-pytest90-nodist-cover (macos/arm64) 85.38% <89.83%> (+0.17%) ⬆️
pypy311-pytest90-nodist-cover (ubuntu/x64) 85.68% <89.83%> (+0.13%) ⬆️
pypy311-pytest90-nodist-cover (windows/x64) 85.15% <89.83%> (+0.18%) ⬆️


ionelmc (Owner) commented Feb 26, 2026

Hmmm nice, looks like you did some refactorings. I'll try to find time this week to review.

akx (Author) commented Feb 26, 2026

> looks like you did some refactorings

Very tiny ones, separated into the first commit for ease of review. ef81697

add_display_options(compare_command.add_argument, prefix='')
add_histogram_options(compare_command.add_argument, prefix='')
compare_command.add_argument(
    '--compare-between',
ionelmc (Owner):

This should use the prefix.
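One possible reading of this comment, sketched under assumptions: the new flag would be registered through a helper that takes an `addoption` callable and a `prefix`, like the `add_display_options` and `add_histogram_options` calls above. The helper name `add_compare_between_option` and its help text are hypothetical, not taken from the PR.

```python
import argparse


def add_compare_between_option(addoption, prefix="benchmark-"):
    # Hypothetical helper: the flag name is built from the prefix, so the
    # same function can serve a prefixed and an unprefixed caller.
    addoption(
        f"--{prefix}compare-between",
        action="store_true",
        default=False,
        help="Pivot the given result files side by side.",
    )


# Unprefixed registration, as a standalone compare command might do it.
parser = argparse.ArgumentParser()
add_compare_between_option(parser.add_argument, prefix="")
args = parser.parse_args(["--compare-between"])

# Prefixed registration, as a plugin-style option set might do it.
prefixed_parser = argparse.ArgumentParser()
add_compare_between_option(prefixed_parser.add_argument)
prefixed_args = prefixed_parser.parse_args(["--benchmark-compare-between"])
```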

ionelmc (Owner) commented Mar 17, 2026

Sorry for the delays, I finally got to try this and I have an example to discuss. I have this stuff locally (bunch of crappy stats for 2 platforms):

> pytest-benchmark list

/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.7-64bit/0001_9aa5319fa75dd392863d4a22f3468e2e91f2c75b_20190107_005508_uncommited-changes.json
/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.8-64bit/0001_bf76dd3f3ef1923ff1bb94e49165db95cf9d57d5_20210417_141615_uncommited-changes.json
/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.7-64bit/0002_9aa5319fa75dd392863d4a22f3468e2e91f2c75b_20190107_005552_uncommited-changes.json
/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.8-64bit/0002_bf76dd3f3ef1923ff1bb94e49165db95cf9d57d5_20210417_141718_uncommited-changes.json
/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.7-64bit/0003_9aa5319fa75dd392863d4a22f3468e2e91f2c75b_20190107_005844_uncommited-changes.json
/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.8-64bit/0003_bf76dd3f3ef1923ff1bb94e49165db95cf9d57d5_20210417_141813_uncommited-changes.json
/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.7-64bit/0004_9aa5319fa75dd392863d4a22f3468e2e91f2c75b_20190107_010137_uncommited-changes.json
/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.8-64bit/0004_bf76dd3f3ef1923ff1bb94e49165db95cf9d57d5_20210417_143038_uncommited-changes.json
/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.7-64bit/0005_4330f5597d413b9c0d0e54928bac300679822cac_20190107_010839_uncommited-changes.json

If I run pytest-benchmark compare --compare-between --columns=min I get this (without --columns= it's even worse):

--------------------------------------------------------------------------------------------------------------------------------------------- benchmark: 9 tests, 9 sources ---------------------------------------------------------------------------------------------------------------------------------------------
Name (time in ns)        0001_9aa5319 Min  0001_bf76dd3 Min  0002_9aa5319 Min  0002_bf76dd3 Min  0003_9aa5319 Min  0003_bf76dd3 Min  0004_9aa5319 Min  0004_bf76dd3 Min  0005_4330f55 Min  Chg:0001_b/Min  Chg:0002_9/Min  Chg:0002_b/Min  Chg:0003_9/Min  Chg:0003_b/Min  Chg:0004_9/Min  Chg:0004_b/Min  Chg:0005_4/Min
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_xfast                        59.6046           34.2931           61.9888           34.3178           61.9888           34.3048           61.9888           34.3178           59.6046          -42.5%           +4.0%          -42.4%           +4.0%          -42.4%           +4.0%          -42.4%           +0.0%
test_fast                     17,200.0000        6,499.9999       15,400.0008        9,600.0003       14,900.0007        9,400.0002       14,799.9999        9,299.9999       15,799.9966          -62.2%          -10.5%          -44.2%          -13.4%          -45.3%          -14.0%          -45.9%           -8.1%
test_parametrized[2]          17,900.0017       69,300.0002       17,700.0002       51,900.0000       42,400.0027       24,800.0001       21,399.9992       28,399.9998       19,399.9986         +287.2%           -1.1%         +189.9%         +136.9%          +38.5%          +19.6%          +58.7%           +8.4%
test_parametrized[4]          22,799.9990       74,600.0001       64,200.0014       41,500.0000       22,699.9982       25,400.0001       25,299.9998       53,499.9999       19,700.0008         +227.2%         +181.6%          +82.0%           -0.4%          +11.4%          +11.0%         +134.6%          -13.6%
test_parametrized[3]          23,699.9986       74,300.0001       16,900.0014       37,400.0001       23,700.0022       42,199.9998       20,399.9989       21,000.0003       18,599.9997         +213.5%          -28.7%          +57.8%           +0.0%          +78.1%          -13.9%          -11.4%          -21.5%
test_parametrized[0]          29,499.9991       73,700.0000       19,899.9987       53,400.0001       35,799.9998       39,999.9999       22,001.0006       77,100.0000       23,999.9972         +149.8%          -32.5%          +81.0%          +21.4%          +35.6%          -25.4%         +161.4%          -18.6%
test_parametrized[1]          49,499.9986       43,099.9999       18,899.9984       47,999.9999       51,700.0008       32,599.9999       22,699.9982       53,499.9999       20,299.9981          -12.9%          -61.8%           -3.0%           +4.4%          -34.1%          -54.1%           +8.1%          -59.0%
test_slow                  1,060,704.9990    1,031,199.9999    1,061,405.0007    1,066,400.0001    1,049,705.0025    1,064,200.0002    1,039,404.0000    1,066,299.9998    1,036,099.9986           -2.8%           +0.1%           +0.5%           -1.0%           +0.3%           -2.0%           +0.5%           -2.3%
test_slower               10,039,549.0008   10,069,902.0004   10,072,847.9993   10,073,599.9999   10,072,043.9987   10,068,999.9999   10,072,840.0030   10,073,999.9998   10,067,498.9990           +0.3%           +0.3%           +0.3%           +0.3%           +0.3%           +0.3%           +0.3%           +0.3%
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

So the way I understand it, 0001_9aa5319 is the reference and every other run is diffed against it (correct me if I am wrong).
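As a quick check of that reading, the Chg values in the test_xfast row can be reproduced with a plain relative-change formula against the first run. This is a sketch inferred from the numbers in the table, not from the PR's code; `percent_change` is a hypothetical name.

```python
def percent_change(reference, value):
    """Relative change of `value` against `reference`, in percent."""
    return (value - reference) / reference * 100.0


# Min values copied from the test_xfast row above (first three sources).
mins = {
    "0001_9aa5319": 59.6046,
    "0001_bf76dd3": 34.2931,
    "0002_9aa5319": 61.9888,
}

# The first source acts as the reference; every other one is compared to it.
reference_name, *other_names = mins
formatted = {
    name: f"{percent_change(mins[reference_name], mins[name]):+.1f}%"
    for name in other_names
}
```

The results agree with the Chg:0001_b/Min and Chg:0002_9/Min cells of that row (-42.5% and +4.0%).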

I think it would be better if each difference appeared right after the column it is computed from.

Currently it's: reference, run1, run2, run3, diff1, diff2, diff3. I think reference, run1, diff1, run2, diff2, run3, diff3 would be better, and you'd be able to make the column headers more compact (eg: just show "difference" or "diff" instead of repeating and trying to fit the run name from the previous columns).
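The proposed ordering could be sketched roughly like this, assuming the first run is the reference and each later run is followed by a compact diff column. The function name and the "Diff" label are placeholders, not names from the PR.

```python
def interleaved_columns(run_names, stat="Min"):
    """Build headers as: reference, run1, diff, run2, diff, ..."""
    reference, *others = run_names
    columns = [f"{reference} {stat}"]
    for name in others:
        columns.append(f"{name} {stat}")
        # The diff header can stay short because the run name is already
        # in the column immediately to its left.
        columns.append("Diff")
    return columns


columns = interleaved_columns(["reference", "run1", "run2", "run3"])
```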

Also, about columns, I would like to propose this idea: make --between mutually exclusive with --columns, because it doesn't make sense to use --between with all the columns unless you're comparing just 2 runs. Actually I am not sure, but the defaults are producing too many columns. It should somehow default to either comparing 1 stat or comparing all the default stats on just 2 runs. Maybe go with these:

  • --between (no value) does some magic: if 2 runs are in the input, compare all the default columns; if more than 2 runs, then only compare a default stat like min or mean (I guess we have to decide which is the most useful one to show by default - what do you usually use?)
  • --between=min,max,etc shows those columns (either overrides what --columns sets or errors out by being exclusive option)
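The two bullets above could be prototyped with argparse's optional-value form (`nargs="?"` with `const`). This is only a sketch of the proposal: `DEFAULT_COLUMNS` is an illustrative subset, `FALLBACK_STAT` is an arbitrary pick since the thread leaves that choice open, and `columns_for_between` is a hypothetical name.

```python
import argparse

DEFAULT_COLUMNS = ["min", "max", "mean", "stddev"]  # illustrative subset only
FALLBACK_STAT = "min"  # assumption: the single stat shown for more than 2 runs


def columns_for_between(between, run_count):
    """Resolve which columns to show.

    `between` is None when the flag is absent, "" when the flag is given
    without a value, or a comma-separated list such as "min,max".
    """
    if between is None:
        return None  # flag not used; the caller keeps its normal behavior
    if between:
        return [col.strip() for col in between.split(",") if col.strip()]
    if run_count <= 2:
        return list(DEFAULT_COLUMNS)
    return [FALLBACK_STAT]


parser = argparse.ArgumentParser()
# nargs="?" lets the flag appear with or without a value; const is the value
# stored when it appears bare.
parser.add_argument("--between", nargs="?", const="", default=None)

bare = parser.parse_args(["--between"]).between
explicit = parser.parse_args(["--between=min,max"]).between
```

Whether an explicit `--between=...` should override `--columns` or be rejected alongside it is left undecided here, as in the comment above.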

ionelmc (Owner) commented Mar 17, 2026

So let me know what you think. To sum it up, my ideas were to compact/remove redundant info from the column names (like Chg:runname, which just repeats runname from the previous column) and to give better defaults that prevent users from outputting something with 100 columns by default.
