Conversation


@lionelkusch lionelkusch commented Aug 28, 2025

Add the method for FDR control and the associated tests for these two methods.
Fix a bug in selection.


codecov bot commented Aug 29, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.43%. Comparing base (7d642a4) to head (f10bf06).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #361      +/-   ##
==========================================
+ Coverage   97.87%   99.43%   +1.56%     
==========================================
  Files          22       22              
  Lines        1223     1247      +24     
==========================================
+ Hits         1197     1240      +43     
+ Misses         26        7      -19     


@bthirion bthirion left a comment

FDR control should be based on p-values or e-values only.
LGTM otherwise.
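
For reference, FDR control from p-values usually means a step-up procedure such as Benjamini-Hochberg; a minimal generic sketch (plain NumPy, not the hidimstat API) is:

```python
import numpy as np

def benjamini_hochberg(pvalues, fdr=0.05):
    """Boolean mask of selected features, controlling the FDR at level `fdr`."""
    pvalues = np.asarray(pvalues)
    m = pvalues.size
    order = np.argsort(pvalues)
    sorted_p = pvalues[order]
    # step-up: largest k such that p_(k) <= (k / m) * fdr
    passing = sorted_p <= (np.arange(1, m + 1) / m) * fdr
    selected = np.zeros(m, dtype=bool)
    if passing.any():
        k = np.nonzero(passing)[0].max()
        selected[order[: k + 1]] = True
    return selected
```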

@lionelkusch lionelkusch added the API 2 Refactoring following the second version of API label Sep 9, 2025

@bthirion bthirion left a comment

oops, I had a few comments not pushed yet. HTH.

@lionelkusch lionelkusch requested a review from bthirion October 2, 2025 17:13

@bthirion bthirion left a comment

Just started a pass.

@bthirion bthirion left a comment

Thx for taking care of that.
I have a few simplification suggestions.

Selects features with importance scores above the specified threshold.
threshold_pvalue : float, optional, default=None
Selects features with p-values below the specified threshold.
threshold_max : float, default=None

Collaborator

I'm not sure whether this argument really makes sense.
I think I would have a single threshold argument for this function.

Collaborator Author

It's because sometimes we want a maximum threshold and sometimes a minimum.
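
As a rough illustration of that intent (a hypothetical helper, not the function under review), a selection with optional upper and lower bounds could look like:

```python
import numpy as np

def select_by_threshold(scores, threshold_max=None, threshold_min=None):
    """Hypothetical sketch: keep features whose score lies strictly between
    the optional bounds; passing None disables that bound."""
    scores = np.asarray(scores)
    mask = np.ones(scores.shape, dtype=bool)
    if threshold_min is not None:
        mask &= scores > threshold_min
    if threshold_max is not None:
        mask &= scores < threshold_max
    return mask
```

For example, `select_by_threshold(pvalues, threshold_max=0.05)` keeps significant features, while `select_by_threshold(pvalues, threshold_min=0.05)` selects the features to discard.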

Collaborator Author

see issue #481

Selects features based on a specified percentile of p-values.
threshold_max : float, default=0.05
Selects features with p-values below the specified maximum threshold (0 to 1).
threshold_min : float, default=None

Collaborator

similarly, I don't see any use case for threshold_min here.

Collaborator Author

The first idea is to propose a generic way of doing selection.
My first use case for it is to select the features to discard.

Collaborator Author

see issue #481

Selects features with p-values below the specified maximum threshold (0 to 1).
threshold_min : float, default=None
Selects features with p-values above the specified minimum threshold (0 to 1).
alternative_hypothesis : bool, default=False

Collaborator

I don't see the use case for alternative hypothesis.

Collaborator Author

This was present in EnCluDL; I added the option to keep the same possibilities.

Collaborator Author

see issue #481

reshaping_function: callable or None, default=None
Optional reshaping function for FDR control methods.
If None, defaults to sum of reciprocals for 'bhy'.
alternative_hypothesis: bool or None, default=False

Collaborator

Same thing here, I don't see any reason to consider an alternative hypothesis. This is because importance tests are all one-sided tests that test whether importance is greater than 0 (= significantly different from 0, in that case).

Collaborator Author

This was present in EnCluDL; I added the option to keep the same possibilities.

Collaborator

Yes, but there are good reasons for that: EnCluDL yields a signed statistic, not dCRT.
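
A generic illustration of why the sign matters (not hidimstat code): from a signed statistic one can form either a one-sided or a two-sided p-value, whereas a non-negative importance statistic only supports the one-sided test of importance > 0.

```python
from scipy.stats import norm

z = 2.1                             # example signed statistic, assumed N(0, 1) under the null
p_one_sided = norm.sf(z)            # alternative: importance > 0
p_two_sided = 2 * norm.sf(abs(z))   # alternative: importance != 0 (needs a signed statistic)
```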

Collaborator Author

see issue #481

random_state=None,
reuse_screening_model=True,
k_best=None,
k_lowest=None,

Collaborator

k_lowest is hard to interpret: it only makes sense because we're considering p-values.

Collaborator Author

Users won't use it if they can't interpret it.

Collaborator Author

For dCRT, only the p-value is considered.
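
As an illustration (generic NumPy, not the PR's implementation), selecting the k features with the lowest p-values amounts to:

```python
import numpy as np

def select_k_lowest(pvalues, k):
    """Boolean mask of the k features with the smallest p-values."""
    pvalues = np.asarray(pvalues)
    mask = np.zeros(pvalues.size, dtype=bool)
    mask[np.argsort(pvalues)[:k]] = True
    return mask
```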

@jpaillard jpaillard left a comment

Looks almost ready.

  • I agree with the comments regarding simplifying the signature of selection functions.
  • I suggest simplifying the smoke test: one test per function with multiple asserts to explore branching seems enough to me and would cut duplicated code.
  • It would be good to add an example illustrating how to use the new functions. No need to do it here, but could you open an issue for that?

[0, 2],
ids=["default_seed", "another seed"],
)
class TestSelection:

Collaborator

The tests in this class are smoke tests and have a lot of duplicated code. I think it would be ok to gather all the smoke tests that explore the different selections in one test, or maybe 2 to separate importance_selection and p_value_selection.

@lionelkusch lionelkusch Oct 10, 2025

These are not smoke tests because they test the result directly: the values in the array, not only the shape.

It's better to have only one assertion per test. This type of test is called a unit test, and they shouldn't be gathered; to group them, I use classes.
I don't see the code duplication: each test checks one specific parameter.
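
As a toy illustration of that convention (not the actual tests in the PR):

```python
import numpy as np

class TestSelection:
    """Related unit tests grouped in a class, one assertion per test."""

    def test_threshold_max_keeps_small_pvalues(self):
        pvalues = np.array([0.01, 0.2, 0.04])
        assert np.array_equal(pvalues < 0.05, [True, False, True])

    def test_threshold_max_rejects_large_pvalues(self):
        pvalues = np.array([0.3, 0.8])
        assert not (pvalues < 0.05).any()
```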

Collaborator Author

see issue #483

Collaborator

I think it would also be good to have a "behaviour test" with simulated data.
Ideally, in high dimensions, with a method that is not computationally costly, to show that it reduces the number of false discoveries.
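
A rough sketch of what such a test could look like (hypothetical names and thresholds; it uses cheap univariate p-values and scipy's `false_discovery_control`, available from scipy 1.11, rather than one of the package's estimators):

```python
import numpy as np
from scipy import stats

def test_selection_limits_false_discoveries():
    """Simulated high-dimensional data: BH selection on univariate p-values
    should keep the false discovery proportion low."""
    rng = np.random.default_rng(0)
    n, p = 100, 500
    support = np.arange(5)                          # truly important features
    X = rng.standard_normal((n, p))
    y = 2 * X[:, support].sum(axis=1) + rng.standard_normal(n)
    # cheap univariate tests -> one p-value per feature
    pvalues = np.array([stats.pearsonr(X[:, j], y).pvalue for j in range(p)])
    adjusted = stats.false_discovery_control(pvalues)   # Benjamini-Hochberg adjustment
    selected = np.flatnonzero(adjusted < 0.1)
    fdp = np.setdiff1d(selected, support).size / max(selected.size, 1)
    assert fdp <= 0.2                                # loose bound to limit flakiness
```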

Collaborator Author

In issue #375, the "behaviour tests", also called system tests or user acceptance tests, were not defined for the moment.
I will open an issue regarding it.

Collaborator Author

see issue #484

@lionelkusch lionelkusch merged commit 894ff9d into mind-inria:main Oct 10, 2025
24 checks passed
@lionelkusch lionelkusch deleted the PR_selection branch October 10, 2025 15:05