
Batch test execution #324

Merged

jmafoster1 merged 13 commits into main from 323-batch-test-execution on Jun 2, 2025

Conversation

jmafoster1
Contributor

For the EA case study, trying to execute 87,000 test cases fills up the RAM. The run_tests_in_batches method was meant to fix this, but didn't. I really want to be able to run all 87K tests at once, so I dug a bit deeper into this. The problem is that, when we execute a test case, the estimator.model attribute becomes non-None. It is this that fills up the RAM, but we don't use it here, so I just reset it to None after the test is executed. I'm not sure how I feel about this as a solution going forward, so any better suggestions would be much appreciated. I think test.estimator.model is used elsewhere, though (perhaps as part of test adequacy?), but we should be able to do everything we need to do before resetting it to None. On the other hand, it just feels a bit inelegant somehow.
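
To make the idea concrete, here is a minimal sketch. The internals of run_tests_in_batches and the execute_test helper are hypothetical stand-ins, not the framework's actual API:

```python
# Minimal sketch, not the framework's actual code: execute tests in
# batches and drop each fitted model so it can be garbage-collected.
from itertools import islice

def run_tests_in_batches(test_cases, execute_test, batch_size=100):
    results = []
    iterator = iter(test_cases)
    while batch := list(islice(iterator, batch_size)):
        for test_case in batch:
            results.append(execute_test(test_case))
            # Executing a test leaves the fitted model on the estimator;
            # we don't use it here, so drop the reference to free the RAM.
            test_case.estimator.model = None
    return results
```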

@jmafoster1 jmafoster1 requested a review from f-allian May 16, 2025 14:39

github-actions bot commented May 16, 2025

🦙 MegaLinter status: ✅ SUCCESS

| Descriptor | Linter | Files | Errors | Elapsed time |
| --- | --- | --- | --- | --- |
| ✅ PYTHON | black | 31 | 0 | 0.96s |
| ✅ PYTHON | pylint | 31 | 0 | 5.81s |

See detailed report in MegaLinter reports



codecov bot commented May 16, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.69%. Comparing base (40faa83) to head (fcab481).
Report is 15 commits behind head on main.

Additional details and impacted files


@@            Coverage Diff             @@
##             main     #324      +/-   ##
==========================================
+ Coverage   93.84%   95.69%   +1.84%     
==========================================
  Files          27       27              
  Lines        1593     1602       +9     
==========================================
+ Hits         1495     1533      +38     
+ Misses         98       69      -29     
| Files with missing lines | Coverage Δ |
| --- | --- |
| causal_testing/__main__.py | 96.96% <100.00%> (+13.63%) ⬆️ |
| ...esting/estimation/abstract_regression_estimator.py | 95.83% <100.00%> (-0.17%) ⬇️ |
| ...ausal_testing/estimation/cubic_spline_estimator.py | 96.55% <100.00%> (ø) |
| ..._testing/estimation/linear_regression_estimator.py | 100.00% <100.00%> (ø) |
| ...esting/estimation/logistic_regression_estimator.py | 100.00% <100.00%> (ø) |
| causal_testing/main.py | 95.89% <100.00%> (+12.48%) ⬆️ |

... and 1 file with indirect coverage changes



Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 15513be...fcab481.


@f-allian
Contributor

@jmafoster1 I agree, definitely not the most elegant solution.

If it is because of the test_case.estimator.model attribute, it looks like it's coming from the abstract RegressionEstimator class where self.model is defined and later used for the prediction stages.
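
For reference, a rough sketch of that pattern; this is an assumed shape, not the repository's exact code:

```python
# Sketch of the described pattern: the abstract estimator keeps the
# fitted statsmodels result on self.model for later prediction, which
# is what accumulates in RAM across many tests.
import statsmodels.formula.api as smf

class RegressionEstimator:
    def __init__(self, formula, data):
        self.formula = formula
        self.data = data
        self.model = None  # set when fitted, reused at prediction time

    def fit(self):
        self.model = smf.ols(self.formula, data=self.data).fit()
        return self.model
```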

Can you explain a few things for me first? You say:

> The problem is that, when we execute a test case, the estimator.model attribute becomes non-None. It is this that fills up the RAM, but we don't use it here, so I just reset it to None after the test is executed

  1. Isn't it that the test case can be executed because the estimator.model attribute is non-None?
  2. Why isn't the estimator.model used here? It looks like it's needed to run the regression algorithm on the input data.
  3. Using this hacky method of setting estimator.model to None, how much time does it actually save you compared to not doing it?
  4. This isn't too important but I'm curious - what batch sizes have you tested?

@jmafoster1
Contributor Author

1 and 2. Yes and no. Of course we need the model to evaluate the test case, but we don't need it to be assigned to test_case.estimator to do that. The basic process at the moment is (sketched at the end of this comment):

  1. Train the model.

  2. Use this to evaluate the test.

  3. Assign it to the estimator instance so it can be used later if needed (we don't need this here).

3. It's not really about saving time: I physically can't run all 87,000 tests in one hit without this. What I'd been doing before was breaking the causal tests into two batches, running them separately, and concatenating the results. That has a comparable runtime, but it's annoying: I have to run three things instead of one, babysit the process a bit more, and discover by trial and error how many batches I need. With this, I can just press go and come back when it's done, so it's a lot easier.

4. I set the default batch size to 100, which is completely arbitrary. That seems to use negligible extra RAM and runs acceptably fast, so I left it at that.
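
Here's a rough sketch of that flow with step 3 skipped; the method names (train_model, evaluate) are illustrative:

```python
# Illustrative only: evaluate a test case with a locally scoped model,
# so nothing is retained on the estimator between tests.
def evaluate_test_case(test_case):
    model = test_case.estimator.train_model()  # 1. train the model
    result = test_case.evaluate(model)         # 2. use it to evaluate the test
    # 3. skipped: the model is not assigned back onto the estimator, so
    #    an 87,000-test run doesn't accumulate fitted models in RAM.
    return result
```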

@jmafoster1
Contributor Author

@f-allian I've removed the estimator.model property. It was only being used in tests, so I've either removed redundant assertions or modified them accordingly. There was only one that really needed the statsmodels model explicitly, which was for Richard's validation stuff, so I've made the train_model method public so that it can just be accessed directly.
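
Hypothetical usage after this change, assuming an already-constructed estimator:

```python
# The one test that needs the raw statsmodels fit can now call the
# public train_model() directly instead of reading a cached attribute.
model = estimator.train_model()  # returns the fitted statsmodels model
print(model.summary())           # e.g. inspect the fit for validation
```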

@jmafoster1 jmafoster1 marked this pull request as ready for review May 20, 2025 13:38
Contributor

@f-allian f-allian left a comment

@jmafoster1 Excellent, nice one Michael!

@f-allian
Contributor

f-allian commented Jun 2, 2025

@jmafoster1 Is this ready to be merged?

@jmafoster1 jmafoster1 merged commit 7da3aef into main Jun 2, 2025
13 checks passed
@jmafoster1
Contributor Author

Yeah I think so. I must have missed your review or something

@jmafoster1 jmafoster1 deleted the 323-batch-test-execution branch June 2, 2025 08:51
@jmafoster1
Contributor Author

Closes #323
