Conversation

@lionelkusch
Collaborator

I tried to improve the reproducibility of the examples by setting seeds in the examples and by better managing the random generators in the code base.

However, there is still some randomisation in `PermutationFeatureImportance` and in `Model_X_Knockoff`.
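A minimal sketch of the seeding pattern this refers to (hypothetical function names, not the actual hidimstat API): accept a seed/`random_state` argument and derive all randomness from it instead of relying on the global NumPy state.

```python
import numpy as np
from sklearn.utils import check_random_state


def permutation_scores(X, y, n_permutations=100, random_state=None):
    """Toy routine whose output is fully determined by `random_state`."""
    rng = check_random_state(random_state)  # accepts None, an int, or a RandomState
    scores = []
    for _ in range(n_permutations):
        y_perm = rng.permutation(y)          # every draw comes from `rng`
        scores.append(np.corrcoef(X[:, 0], y_perm)[0, 1])
    return np.asarray(scores)


rng = np.random.default_rng(0)
X, y = rng.normal(size=(50, 3)), rng.normal(size=50)
assert np.allclose(permutation_scores(X, y, random_state=0),
                   permutation_scores(X, y, random_state=0))
```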

@codecov

codecov bot commented Aug 22, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.07%. Comparing base (6eb643c) to head (c8734d1).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #344      +/-   ##
==========================================
- Coverage   98.08%   98.07%   -0.01%     
==========================================
  Files          22       22              
  Lines        1147     1144       -3     
==========================================
- Hits         1125     1122       -3     
  Misses         22       22              


Collaborator

@bthirion bthirion left a comment

The PR LGTM overall.

Collaborator

@jpaillard jpaillard left a comment

I can't reproduce the reproducibility issue you describe for CFI and knockoffs. Can you point more precisely to the problematic example/test?

@lionelkusch
Collaborator Author

> I can't reproduce the reproducibility issue you describe for CFI and knockoffs. Can you point more precisely to the problematic example/test?

The problem is with PFI and CFI. For PFI: if you run the example `plot_pitfalls_permutation_importance.py` in parallel (via the file `tools/examples/debugger_script/try_reproducibility.py`), the second figure is always a bit different. For knockoffs, if you run the example `plot_knockoff_aggregation.py` in parallel, the figure for the empirical FDR is always a bit different.

The difference is quite minor, but there are always some variations.
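A minimal sketch of the kind of check involved (a hypothetical reconstruction, not the actual `try_reproducibility.py` script): run the same example function several times in parallel and verify that the outputs coincide.

```python
import numpy as np
from joblib import Parallel, delayed


def run_example(seed):
    # Stand-in for executing one example end to end with a fixed seed.
    rng = np.random.default_rng(seed)
    return rng.normal(size=10)


results = Parallel(n_jobs=4)(delayed(run_example)(seed=0) for _ in range(4))
# If the example is fully seeded, every parallel run returns the same array.
assert all(np.allclose(results[0], r) for r in results[1:])
```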

Collaborator

@jpaillard jpaillard left a comment

Thanks for the details. I could reproduce the issue.

It seems that the problem comes from using nested parallel loops: parallel calls of `try_reproducibility.run_joblib`, which has inner parallelization of `plot_knockoff_aggregation.single_run`.

The inner processes might unpredictably inherit some state of the parent.

To fix it, you can use `Parallel(n_jobs=<nb_of_jobs>, require='sharedmem')` or simply set `n_jobs=1` in `try_reproducibility`.
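A hedged sketch of the nested-parallelism situation and of the first suggested fix (function names are placeholders, not the real scripts): an outer `Parallel` loop launches workers that themselves call `Parallel`.

```python
import numpy as np
from joblib import Parallel, delayed


def single_run(seed):
    # Stand-in for one seeded run of the example.
    rng = np.random.default_rng(seed)
    return rng.normal()


def run_joblib(n_inner=4):
    # Inner parallel loop, analogous to plot_knockoff_aggregation's single_run calls.
    return Parallel(n_jobs=n_inner)(delayed(single_run)(s) for s in range(n_inner))


# Outer loop: require shared-memory workers (or set n_jobs=1) so the inner
# parallel calls do not inherit process state unpredictably.
outer = Parallel(n_jobs=2, require="sharedmem")(
    delayed(run_joblib)() for _ in range(2)
)
assert np.allclose(outer[0], outer[1])
```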

@lionelkusch
Collaborator Author

I fixed the issue for `plot_knockoff_aggregation`.
I had forgotten to set a random seed in `LassoCV`.
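A hedged illustration of that kind of fix (a sketch, not the exact change made in the example): `LassoCV` exposes a `random_state` parameter, which matters when `selection="random"`, and the CV splitter can also be made deterministic.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = X[:, 0] + 0.1 * rng.normal(size=100)

lasso = LassoCV(
    cv=KFold(n_splits=5),   # deterministic splitter
    selection="random",     # coordinate updates in random order ...
    random_state=0,         # ... but seeded, hence reproducible
)
coef_a = lasso.fit(X, y).coef_
coef_b = LassoCV(cv=KFold(n_splits=5), selection="random", random_state=0).fit(X, y).coef_
assert np.allclose(coef_a, coef_b)
```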

@lionelkusch
Collaborator Author

For `plot_pitfalls_permutation_importance`, I found that seaborn requires a seed for the barplot; this was the cause of the difference.
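A short sketch of that point: `seaborn.barplot` bootstraps its error bars, so its output varies between runs unless a `seed` is passed.

```python
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature": rng.integers(0, 5, size=200),
    "importance": rng.normal(size=200),
})

# Without `seed=...` the bootstrapped confidence intervals differ between runs.
ax = sns.barplot(data=df, x="feature", y="importance", seed=0)
```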

@lionelkusch
Collaborator Author

There is still a bit of uncontrolled randomness in `plot_knockoff_aggregation`, but it will be easier to debug after the reformatting with the new API.

@lionelkusch
Collaborator Author

I updated the seed management because I had forgotten that it is better to use a range of values for the seeds than to draw them from a random generator: the generator can produce the same number twice.
I merged the branch with main.
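A small sketch of the point about seed management: seeds drawn from a generator can collide, whereas a range of consecutive integers guarantees distinct seeds for every repetition.

```python
import numpy as np

rng = np.random.default_rng(0)
n_runs = 10_000

drawn_seeds = rng.integers(0, 100_000, size=n_runs)  # may contain duplicates
range_seeds = np.arange(n_runs)                      # always distinct

print("duplicates when drawing:", n_runs - np.unique(drawn_seeds).size)
print("duplicates with a range:", n_runs - np.unique(range_seeds).size)  # 0
```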

Collaborator

@jpaillard jpaillard left a comment

Thank you.
I have one last comment.

Collaborator

@bthirion bthirion left a comment

I have some very minor comments.
There is no need to make `KFold` random in the examples (better to use `ShuffleSplit` if we want a random splitter, but that is then the user's choice); see the sketch below.
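An illustration of that suggestion (a sketch, not the exact change): `KFold` without shuffling is deterministic and needs no seed, while `ShuffleSplit` is the natural choice when a random splitter is really wanted, and then takes a `random_state`.

```python
import numpy as np
from sklearn.model_selection import KFold, ShuffleSplit

X = np.arange(20).reshape(10, 2)

kf = KFold(n_splits=5)                                        # deterministic, no seed needed
ss = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)  # random but seeded

print([test for _, test in kf.split(X)])
print([test for _, test in ss.split(X)])
```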

@lionelkusch
Collaborator Author

Last review before merging.

Collaborator

@jpaillard jpaillard left a comment

To be consistent with the changes suggested in #360, we should avoid showcasing the use of `RandomState` in the examples and replace it with `np.random.default_rng`.
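A minimal sketch of the suggested replacement, using the modern NumPy `Generator` API instead of the legacy `RandomState`:

```python
import numpy as np

# Legacy style (to avoid in the examples):
rs = np.random.RandomState(42)
legacy_sample = rs.randn(5)

# Preferred style:
rng = np.random.default_rng(42)
sample = rng.standard_normal(5)
```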

Collaborator

@jpaillard jpaillard left a comment

Looks good, thx

@lionelkusch lionelkusch merged commit c276002 into mind-inria:main Sep 23, 2025
30 of 31 checks passed
@lionelkusch lionelkusch deleted the PR_remove_randomize branch September 23, 2025 17:10