Add `--compare-between` mode by akx · Pull Request #302 · ionelmc/pytest-benchmark

akx · 2026-02-26T11:23:44Z

This PR adds a --compare-between mode, effectively a pivot table between 2..N result files.

I needed this for django/asgiref#551 and cleaned it up for general use :)

Example output with color:

codecov · 2026-02-26T15:59:08Z

Codecov Report

❌ Patch coverage is 89.83051% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.22%. Comparing base (b11b71b) to head (ee802f9).

Files with missing lines	Patch %	Lines
src/pytest_benchmark/table.py	90.09%	5 Missing and 5 partials ⚠️
src/pytest_benchmark/cli.py	77.77%	1 Missing and 1 partial ⚠️

Additional details and impacted files

@@           Coverage Diff            @@
##           master     #302    +/-   ##
========================================
  Coverage   90.21%   90.22%            
========================================
  Files          28       28            
  Lines        2934     3038   +104     
  Branches      318      339    +21     
========================================
+ Hits         2647     2741    +94     
- Misses        212      218     +6     
- Partials       75       79     +4

Flag	Coverage Δ
py310-pytest84-nodist-cover (macos/arm64)	`86.01% <89.83%> (-0.05%)`	⬇️
py310-pytest84-nodist-cover (ubuntu/x64)	`86.50% <89.83%> (+0.13%)`	⬆️
py310-pytest84-nodist-cover (windows/x64)	`85.97% <89.83%> (+0.15%)`	⬆️
py310-pytest90-nodist-cover (macos/arm64)	`86.17% <89.83%> (+0.14%)`	⬆️
py310-pytest90-nodist-cover (ubuntu/x64)	`86.66% <89.83%> (+0.13%)`	⬆️
py310-pytest90-nodist-cover (windows/x64)	`85.94% <89.83%> (+0.15%)`	⬆️
py311-pytest84-nodist-cover (macos/arm64)	`86.20% <89.83%> (+0.14%)`	⬆️
py311-pytest84-nodist-cover (ubuntu/x64)	`86.50% <89.83%> (+0.13%)`	⬆️
py311-pytest84-nodist-cover (windows/x64)	`85.78% <89.83%> (-0.05%)`	⬇️
py311-pytest90-nodist-cover (macos/arm64)	`86.17% <89.83%> (+0.35%)`	⬆️
py311-pytest90-nodist-cover (ubuntu/x64)	`86.47% <89.83%> (+0.13%)`	⬆️
py311-pytest90-nodist-cover (windows/x64)	`85.94% <89.83%> (+0.36%)`	⬆️
py312-pytest84-nodist-cover (macos/arm64)	`86.20% <89.83%> (+0.14%)`	⬆️
py312-pytest84-nodist-cover (ubuntu/x64)	`86.50% <89.83%> (+0.10%)`	⬆️
py312-pytest84-nodist-cover (windows/x64)	`85.78% <89.83%> (+0.16%)`	⬆️
py312-pytest90-nodist-cover (macos/arm64)	`86.17% <89.83%> (+0.14%)`	⬆️
py312-pytest90-nodist-cover (ubuntu/x64)	`86.66% <89.83%> (+0.33%)`	⬆️
py312-pytest90-nodist-cover (windows/x64)	`85.94% <89.83%> (+0.36%)`	⬆️
py313-pytest84-nodist-cover (macos/arm64)	`86.20% <89.83%> (+0.14%)`	⬆️
py313-pytest84-nodist-cover (ubuntu/x64)	`86.50% <89.83%> (+0.13%)`	⬆️
py313-pytest84-nodist-cover (windows/x64)	`85.78% <89.83%> (-0.05%)`	⬇️
py313-pytest90-nodist-cover (macos/arm64)	`86.17% <89.83%> (+0.14%)`	⬆️
py313-pytest90-nodist-cover (ubuntu/x64)	`86.50% <89.83%> (+0.17%)`	⬆️
py313-pytest90-nodist-cover (windows/x64)	`85.74% <89.83%> (+0.16%)`	⬆️
py314-pytest84-nodist-cover (macos/arm64)	`89.43% <89.83%> (+0.03%)`	⬆️
py314-pytest84-nodist-cover (ubuntu/x64)	`89.69% <89.83%> (-0.22%)`	⬇️
py314-pytest84-nodist-cover (windows/x64)	`88.96% <89.83%> (-0.16%)`	⬇️
py314-pytest90-nodist-cover (macos/arm64)	`89.43% <89.83%> (+0.03%)`	⬆️
py314-pytest90-nodist-cover (ubuntu/x64)	`89.69% <89.83%> (-0.19%)`	⬇️
py314-pytest90-nodist-cover (windows/x64)	`89.16% <89.83%> (+0.04%)`	⬆️
pypy311-pytest84-nodist-cover (macos/arm64)	`85.38% <89.83%> (+0.17%)`	⬆️
pypy311-pytest84-nodist-cover (ubuntu/x64)	`85.71% <89.83%> (+0.16%)`	⬆️
pypy311-pytest84-nodist-cover (windows/x64)	`84.95% <89.83%> (-0.02%)`	⬇️
pypy311-pytest90-nodist-cover (macos/arm64)	`85.38% <89.83%> (+0.17%)`	⬆️
pypy311-pytest90-nodist-cover (ubuntu/x64)	`85.68% <89.83%> (+0.13%)`	⬆️
pypy311-pytest90-nodist-cover (windows/x64)	`85.15% <89.83%> (+0.18%)`	⬆️

ionelmc · 2026-02-26T17:20:01Z

Hmmm nice, looks like you did some refactorings. I'll try to find time this week to review.

akx · 2026-02-26T18:13:27Z

looks like you did some refactorings

Very tiny ones, separated into the first commit for ease of review. ef81697

ionelmc · 2026-03-17T16:13:51Z

src/pytest_benchmark/cli.py

    add_display_options(compare_command.add_argument, prefix='')
    add_histogram_options(compare_command.add_argument, prefix='')
+    compare_command.add_argument(
+        '--compare-between',


This should use the prefix.
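
A minimal sketch of what using the prefix could look like, assuming the option helpers build flag names as `--{prefix}name`. The helper name `add_compare_between_option`, the `store_true` action, and the help text are hypothetical and not taken from the PR.

```python
import argparse


def add_compare_between_option(addoption, prefix="benchmark-"):
    # Hypothetical helper: the flag name is derived from the prefix
    # instead of hard-coding '--compare-between'.
    addoption(
        f"--{prefix}compare-between",
        action="store_true",
        default=False,
        help="Pivot the selected runs into one table (illustrative text).",
    )


# With an empty prefix, as in the cli.py snippet above, the flag stays short.
cli_parser = argparse.ArgumentParser(prog="compare")
add_compare_between_option(cli_parser.add_argument, prefix="")
cli_args = cli_parser.parse_args(["--compare-between"])

# With the (assumed) default prefix, the same helper yields a namespaced flag.
plugin_parser = argparse.ArgumentParser(prog="pytest")
add_compare_between_option(plugin_parser.add_argument)
plugin_args = plugin_parser.parse_args(["--benchmark-compare-between"])
```

Deriving the name from the prefix keeps the option consistent between the standalone CLI and any context that registers options with a non-empty prefix.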

ionelmc · 2026-03-17T16:20:27Z

Sorry for the delays, I finally got to try this and I have an example to discuss. I have this stuff locally (bunch of crappy stats for 2 platforms):

> pytest-benchmark list

/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.7-64bit/0001_9aa5319fa75dd392863d4a22f3468e2e91f2c75b_20190107_005508_uncommited-changes.json
/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.8-64bit/0001_bf76dd3f3ef1923ff1bb94e49165db95cf9d57d5_20210417_141615_uncommited-changes.json
/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.7-64bit/0002_9aa5319fa75dd392863d4a22f3468e2e91f2c75b_20190107_005552_uncommited-changes.json
/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.8-64bit/0002_bf76dd3f3ef1923ff1bb94e49165db95cf9d57d5_20210417_141718_uncommited-changes.json
/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.7-64bit/0003_9aa5319fa75dd392863d4a22f3468e2e91f2c75b_20190107_005844_uncommited-changes.json
/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.8-64bit/0003_bf76dd3f3ef1923ff1bb94e49165db95cf9d57d5_20210417_141813_uncommited-changes.json
/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.7-64bit/0004_9aa5319fa75dd392863d4a22f3468e2e91f2c75b_20190107_010137_uncommited-changes.json
/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.8-64bit/0004_bf76dd3f3ef1923ff1bb94e49165db95cf9d57d5_20210417_143038_uncommited-changes.json
/home/ionel/open-source/pytest-benchmark/.benchmarks/Linux-CPython-3.7-64bit/0005_4330f5597d413b9c0d0e54928bac300679822cac_20190107_010839_uncommited-changes.json

If I run pytest-benchmark compare --compare-between --columns=min I get this (without --columns= it's even worse):

--------------------------------------------------------------------------------------------------------------------------------------------- benchmark: 9 tests, 9 sources ---------------------------------------------------------------------------------------------------------------------------------------------
Name (time in ns)        0001_9aa5319 Min  0001_bf76dd3 Min  0002_9aa5319 Min  0002_bf76dd3 Min  0003_9aa5319 Min  0003_bf76dd3 Min  0004_9aa5319 Min  0004_bf76dd3 Min  0005_4330f55 Min  Chg:0001_b/Min  Chg:0002_9/Min  Chg:0002_b/Min  Chg:0003_9/Min  Chg:0003_b/Min  Chg:0004_9/Min  Chg:0004_b/Min  Chg:0005_4/Min
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_xfast                        59.6046           34.2931           61.9888           34.3178           61.9888           34.3048           61.9888           34.3178           59.6046          -42.5%           +4.0%          -42.4%           +4.0%          -42.4%           +4.0%          -42.4%           +0.0%
test_fast                     17,200.0000        6,499.9999       15,400.0008        9,600.0003       14,900.0007        9,400.0002       14,799.9999        9,299.9999       15,799.9966          -62.2%          -10.5%          -44.2%          -13.4%          -45.3%          -14.0%          -45.9%           -8.1%
test_parametrized[2]          17,900.0017       69,300.0002       17,700.0002       51,900.0000       42,400.0027       24,800.0001       21,399.9992       28,399.9998       19,399.9986         +287.2%           -1.1%         +189.9%         +136.9%          +38.5%          +19.6%          +58.7%           +8.4%
test_parametrized[4]          22,799.9990       74,600.0001       64,200.0014       41,500.0000       22,699.9982       25,400.0001       25,299.9998       53,499.9999       19,700.0008         +227.2%         +181.6%          +82.0%           -0.4%          +11.4%          +11.0%         +134.6%          -13.6%
test_parametrized[3]          23,699.9986       74,300.0001       16,900.0014       37,400.0001       23,700.0022       42,199.9998       20,399.9989       21,000.0003       18,599.9997         +213.5%          -28.7%          +57.8%           +0.0%          +78.1%          -13.9%          -11.4%          -21.5%
test_parametrized[0]          29,499.9991       73,700.0000       19,899.9987       53,400.0001       35,799.9998       39,999.9999       22,001.0006       77,100.0000       23,999.9972         +149.8%          -32.5%          +81.0%          +21.4%          +35.6%          -25.4%         +161.4%          -18.6%
test_parametrized[1]          49,499.9986       43,099.9999       18,899.9984       47,999.9999       51,700.0008       32,599.9999       22,699.9982       53,499.9999       20,299.9981          -12.9%          -61.8%           -3.0%           +4.4%          -34.1%          -54.1%           +8.1%          -59.0%
test_slow                  1,060,704.9990    1,031,199.9999    1,061,405.0007    1,066,400.0001    1,049,705.0025    1,064,200.0002    1,039,404.0000    1,066,299.9998    1,036,099.9986           -2.8%           +0.1%           +0.5%           -1.0%           +0.3%           -2.0%           +0.5%           -2.3%
test_slower               10,039,549.0008   10,069,902.0004   10,072,847.9993   10,073,599.9999   10,072,043.9987   10,068,999.9999   10,072,840.0030   10,073,999.9998   10,067,498.9990           +0.3%           +0.3%           +0.3%           +0.3%           +0.3%           +0.3%           +0.3%           +0.3%
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

So the way I understand it, 0001_9aa5319 is the reference and every other run is diffed against it (correct me if I am wrong).
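
To make that reading concrete, here is a small sketch of a pivot where the first run is the reference and every other run gets a percentage change against it. The function names and the input layout are assumptions for illustration, not the PR's implementation; the numbers are the `test_xfast` minimums from the table above.

```python
def percent_change(reference, value):
    """Relative change of value against reference, in percent."""
    return (value - reference) / reference * 100.0


def pivot_with_changes(runs, stat="min"):
    """Pivot runs into one row per benchmark.

    runs maps run label -> {benchmark name -> {stat name -> value}};
    insertion order decides which run is the reference (the first one).
    """
    labels = list(runs)
    reference_label = labels[0]
    table = {}
    for name, stats in runs[reference_label].items():
        reference = stats[stat]
        values = [runs[label][name][stat] for label in labels]
        # One change per non-reference run, in the same order as the runs.
        changes = [percent_change(reference, v) for v in values[1:]]
        table[name] = {"values": values, "changes": changes}
    return table


runs = {
    "0001_9aa5319": {"test_xfast": {"min": 59.6046}},
    "0001_bf76dd3": {"test_xfast": {"min": 34.2931}},
    "0002_9aa5319": {"test_xfast": {"min": 61.9888}},
}
table = pivot_with_changes(runs)
formatted = [f"{change:+.1f}%" for change in table["test_xfast"]["changes"]]
print(formatted)  # ['-42.5%', '+4.0%']
```

The two results match the first two `Chg:` cells of the `test_xfast` row, which is consistent with the first listed run being the reference.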

I think it would be better if each difference appeared right after the column it is computed from.

Currently it's: reference, run1, run2, run3, diff1, diff2, diff3. I think reference, run1, diff1, run2, diff2, run3, diff3 would be better, and you'd be able to make the column headers more compact (eg: just show "difference" or "diff" instead of repeating and trying to fit the run name from the previous columns).
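
The proposed ordering can be sketched as follows. This is only an illustration of the header layout; `interleaved_columns` and the "Diff" label are hypothetical names, not part of pytest-benchmark.

```python
def interleaved_columns(run_labels, stat="Min"):
    """Return headers ordered as: reference, run1, diff1, run2, diff2, ..."""
    reference, *others = run_labels
    headers = [f"{reference} {stat}"]
    for label in others:
        headers.append(f"{label} {stat}")
        # The diff sits right after its run, so it needs no run name.
        headers.append("Diff")
    return headers


headers = interleaved_columns(["0001_9aa5319", "0001_bf76dd3", "0002_9aa5319"])
print(headers)
# ['0001_9aa5319 Min', '0001_bf76dd3 Min', 'Diff', '0002_9aa5319 Min', 'Diff']
```

Because each diff column is adjacent to the run it describes, the header can stay short instead of squeezing in a truncated run name.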

Also, about columns, I would like to propose this idea: make --between mutually exclusive with --columns, because it doesn't make sense to use --between with all the columns unless you're comparing just 2 runs. Actually I am not sure, but the defaults are producing too many columns. It should somehow default to either comparing 1 stat or comparing all the default stats on just 2 runs. Maybe go with these:

--between (no value) does some magic: if 2 runs are in the input, compare all the default columns; if more than 2 runs, only compare a default stat like min or mean (I guess we have to decide which is the most useful one to show by default; what do you usually use?)
--between=min,max,etc shows those columns (either overrides what --columns sets or errors out by being an exclusive option)
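
A rough sketch of the defaulting rule described above, under stated assumptions: the function name, the way a bare flag is represented (an empty string), and the choice of `min` as the fallback stat are all hypothetical, since the thread leaves the default stat undecided.

```python
def resolve_between_columns(between, run_count, default_columns, fallback_stat="min"):
    """Pick the columns to compare.

    between is None when the flag is absent, "" when it is given without
    a value, or a comma-separated string such as "min,max".
    """
    if between is None:
        return None  # comparison mode not requested
    if between:
        # Explicit list: show exactly those columns.
        return [col.strip() for col in between.split(",") if col.strip()]
    if run_count <= 2:
        # Two runs fit on screen, so compare every default column.
        return list(default_columns)
    # Many runs: fall back to a single stat to keep the table narrow.
    return [fallback_stat]


defaults = ["min", "max", "mean"]  # illustrative subset, not the real defaults
two_runs = resolve_between_columns("", 2, defaults)
nine_runs = resolve_between_columns("", 9, defaults)
explicit = resolve_between_columns("min, max", 9, defaults)
```

With nine runs, as in the example above, the bare flag would yield a single stat column per run rather than the full default set.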

ionelmc · 2026-03-17T16:31:44Z

So lemme know what you think. To sum it up, my ideas were to compact/remove redundant info from the column names (like Chg:runname, which just repeats runname from the previous column) and to give better defaults, so users don't output something with 100 columns by default.

akx added 3 commits February 26, 2026 12:45

Refactor table.py for reuse

ef81697

Add --compare-between mode

089705d

Add self to AUTHORS.rst

ee802f9

akx wants to merge 3 commits into ionelmc:master from akx:compare-between
