
Conversation

@jenstroeger (Owner) commented Nov 1, 2025

Based on conversations with @behnazh and the “How to keep Unit tests and Integration tests separate in pytest” blurb on StackOverflow, this change adds three new Makefile goals:

  • make test-unit: run the unit tests, just like we have so far;
  • make test-integration: run integration tests; and
  • make test-performance: run performance tests:
    ---------------------------------------------- benchmark: 1 tests ----------------------------------------------
    Name (time in us)        Min     Max    Mean  StdDev  Median     IQR  Outliers  OPS (Kops/s)  Rounds  Iterations
    ----------------------------------------------------------------------------------------------------------------
    test_something        3.0105  4.7433  3.0774  0.1769  3.0461  0.0246      3;10      324.9469     100          10
    ----------------------------------------------------------------------------------------------------------------
    
    Legend:
      Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
      OPS: Operations Per Second, computed as 1 / Mean
    

The existing goal make test runs all tests from all three categories.
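
Roughly, the wiring looks like this — a minimal sketch only, assuming the tests are tagged with unit, integration and performance markers along the lines of the StackOverflow approach; the exact recipes in the Makefile may differ:

```makefile
# Sketch of the new goals; marker names and pytest options are illustrative.
.PHONY: test-unit test-integration test-performance test

test-unit:
	pytest -m unit tests/

test-integration:
	pytest -m integration tests/

test-performance:
	pytest -m performance tests/

# Run all three categories.
test: test-unit test-integration test-performance
```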

Points to consider:

  • Rename “performance” to “benchmark” because the pytest plugin is called pytest-benchmark?
  • Look into and add performance regression checks (see also Consider tracking performance across package releases, #563).

@danirivas commented

> Based on conversations with @behnazh and the “How to keep Unit tests and Integration tests separate in pytest” blurb on StackOverflow, this change adds three new Makefile goals:
>
> * `make test-unit`: run the [unit tests](https://en.wikipedia.org/wiki/Unit_testing), just like we have so far;
>
> * `make test-integration`: run [integration tests](https://en.wikipedia.org/wiki/Integration_testing); and
>
> * `make test-performance`: run [performance tests](https://en.wikipedia.org/wiki/Software_performance_testing):
>   (benchmark output as shown above)
>
> The existing goal `make test` runs all tests from all three categories.

I'm 100% on board with splitting the tests into these three categories. Even if you always want to run all of them, you can at least easily control the order in which they run.

Unit tests are the bare minimum, so they are always a precondition. Integration tests may run slower, but they only run after the unit tests have made sure you didn't break anything. And there's no point in running performance tests if you don't yet know whether your changes are even correct. So having these three categories, in this order, makes perfect sense.

That said, why split them by category into different directories if they are then marked with the same categories? Not against it, just wondering.
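
For reference, "marked with the same categories" means each test carries a pytest marker matching its directory, roughly like this (marker name and test body are illustrative, not taken from this PR):

```python
import pytest

@pytest.mark.integration
def test_database_roundtrip():
    # Selected by `pytest -m integration`, deselected by `pytest -m "not integration"`.
    ...
```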

> Points to consider:

> * [ ]  Rename “performance” to “benchmark” because the pytest plugin is called [pytest-benchmark](https://github.com/ionelmc/pytest-benchmark)?

I like performance if they'll be used for checking regressions in, well, performance. We use a benchmark to assess performance 🤓

But if there's no good or bad (i.e. no regression checks) then benchmark seems accurate.
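
For context, the `test_something` row in the table above comes from pytest-benchmark's `benchmark` fixture; a minimal example might look like this (the marker is the one assumed in this PR, the measured call is made up):

```python
import pytest

@pytest.mark.performance
def test_something(benchmark):
    # pytest-benchmark runs the callable repeatedly and reports
    # Min/Max/Mean/StdDev/Median/IQR/OPS as in the table above.
    result = benchmark(sorted, [3, 1, 2])
    assert result == [1, 2, 3]
```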

> * [ ]  Look into and add performance regression checks (see also [Consider tracking performance across package releases. #563](https://github.com/jenstroeger/python-package-template/issues/563)).

The main complexity I've always faced when trying to add this to my projects is that if the benchmark is too small or too unrealistic, I fear that the latency might fall within the error margin and be subject to too much variability.

I know this is a template for a package, but unless I'm testing compute-bound functions, the benchmark should probably consider integration tests in a realistic scenario (i.e. warmed-up DBs with enough data and query plans already executed). I'm always wary of automating this part because, when it comes to performance, I'd love to gather as many data points as possible (and plot them!) before drawing any conclusions.
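
As a possible starting point for automated regression checks, pytest-benchmark can save runs and fail when a comparison against a saved baseline exceeds a threshold; a hypothetical goal (name and threshold are arbitrary, not part of this PR) might look like:

```makefile
# Hypothetical goal: compare against the most recent saved run and
# fail if the mean time regresses by more than 10%.
test-performance-regression:
	pytest -m performance tests/ --benchmark-autosave --benchmark-compare --benchmark-compare-fail=mean:10%
```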

Another idea I've been entertaining (and this might be off-topic for the scope of this PR and repo) is counting the number of queries that a function or endpoint executes. That number alone might not tell the full story, but in case of an unexpected latency spike it can help explain why. For example, a function that loops over something and accidentally issues a thousand small queries instead of one big join or IN clause, or that loops over an ORM object with lazy relationships and ends up with one query per row.

@jenstroeger (Owner, Author) commented

> Unit tests are the bare minimum, so they are always a precondition. Integration tests may run slower, but they only run after the unit tests have made sure you didn't break anything.

I’m curious: for test-driven development, would you write primarily unit tests or would you consider writing some integration tests as well?

> And there's no point in running performance tests if you don't yet know whether your changes are even correct.

Agreed, good point!

> That said, why split them by category into different directories if they are then marked with the same categories? Not against it, just wondering.

If I didn’t, and I wanted to find e.g. the performance tests, then I might have to rummage through a bunch of files to find them. I thought it’s just a helpful way of organizing the test files; and because I can reuse the same filenames, I thought that’d help with orienting myself in the test code 🤔

Now if pytest would allow me to mark folders that’d be useful… maybe?
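
A conftest.py hook can get close to “marking folders” by attaching a marker based on each collected test’s path; a sketch, assuming the tests/unit, tests/integration and tests/performance layout discussed here (POSIX-style separators assumed):

```python
# conftest.py — sketch only; directory names are assumptions from this discussion.
import pytest

def pytest_collection_modifyitems(config, items):
    for item in items:
        path = str(item.fspath)
        # Derive a category marker from the directory a test lives in.
        if "/unit/" in path:
            item.add_marker(pytest.mark.unit)
        elif "/integration/" in path:
            item.add_marker(pytest.mark.integration)
        elif "/performance/" in path:
            item.add_marker(pytest.mark.performance)
```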

> I like performance if they'll be used for checking regressions in, well, performance. We use a benchmark to assess performance 🤓

Fair point, agreed! And like we discussed in issue #563 having regression tracking would be great!

> The main complexity I've always faced […]

I very much agree, and I’ve looked at these performance/regression tests as close siblings of the unit tests, not so much of the overall integration tests. And personally, I’d be more interested in the performance of this or that critical/relevant function… not all of them, anyway.

> Another idea I've been entertaining (and this might be off-topic for the scope of this PR and repo) is […]

This sounds like something a (sampling) profiler can do, or some targeted instrumentation?
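
A sketch of what such targeted instrumentation could look like with SQLAlchemy’s event hooks (the helper name, engine/session wiring, and threshold are made up for illustration):

```python
import contextlib
from sqlalchemy import event

@contextlib.contextmanager
def count_queries(engine):
    """Count the statements executed on `engine` inside the `with` block."""
    counter = {"queries": 0}

    def _before_cursor_execute(conn, cursor, statement, parameters, context, executemany):
        counter["queries"] += 1

    event.listen(engine, "before_cursor_execute", _before_cursor_execute)
    try:
        yield counter
    finally:
        event.remove(engine, "before_cursor_execute", _before_cursor_execute)

# Usage in a test (names are illustrative):
#
#     with count_queries(engine) as counter:
#         load_dashboard(session)
#     assert counter["queries"] <= 5
```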
