[API 2]: Model X knockoff #367

Conversation
Originally posted by @AngelReyero in #361 (comment)
Originally posted by @jpaillard in #361 (comment)
Force-pushed 6038de1 to edeab86
jpaillard left a comment
Looks good overall. I mostly have comments on naming and organisation.
I would suggest setting statistical_test='lcd' by default, renaming lasso_statistic_with_sampling to ModelXKnockoff.lasso_coefficient_difference, and making it an internal method of the class ModelXKnockoff. IMO the method should be attached to the class and use the attribute self.estimator (see below).
In the same vein, the class should have an estimator parameter in order to expose the Lasso model instead of keeping it under the hood, especially since it uses a particularly high default, max_iter=200000.
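A minimal sketch of what the suggested API could look like. The class and method names below mirror the suggestion but are assumptions for illustration, not the actual hide-and-seek code:

```python
import numpy as np
from sklearn.linear_model import LassoCV


class ModelXKnockoff:
    # Hypothetical sketch: the Lasso is exposed via an `estimator`
    # parameter instead of being hard-coded inside the class.
    def __init__(self, estimator=None, statistical_test="lcd"):
        # the default mirrors the hard-coded estimator mentioned above
        if estimator is None:
            estimator = LassoCV(max_iter=200000)
        self.estimator = estimator
        self.statistical_test = statistical_test

    def lasso_coefficient_difference(self, X, X_tilde, y):
        # fit on the concatenated [X, X_tilde] design and compare the
        # absolute coefficient of each feature with that of its knockoff
        n_features = X.shape[1]
        self.estimator.fit(np.hstack([X, X_tilde]), y)
        coef = np.abs(np.ravel(self.estimator.coef_))
        return coef[:n_features] - coef[n_features:]
```

With this shape, users can pass any fitted-API regressor as `estimator`, and the statistic method reads it from `self.estimator` as suggested.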
n_bootstraps = 25
# number of jobs for repetition of the method
n_jobs = 2
n_jobs = 1
- I think it makes sense to illustrate in the example where users should use parallelization, and to do so with n_jobs > 1.
- If we decide not to show it, I would remove n_jobs and rely on the default.
n_jobs=2 (or even 4) is a better option IMHO.
I got some issues when running with n_jobs=2.
What kind of issues?
When running the example, I got errors in the last three commits. This was the solution to my problem.
Creating the class keeps track of some state, which uses too much memory and triggers an error.
May be related to #356.
We still have the problem of nested parallel loops in the example generation. I would still argue that showcasing parallelization in the example is valuable.
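A sketch of the pattern under discussion (function and variable names here are illustrative, not the actual example code): give the parallelism to the outer bootstrap loop, and keep any inner estimator at n_jobs=1 so the pools do not nest.

```python
from joblib import Parallel, delayed


def run_bootstrap(seed):
    # placeholder for one knockoff draw + variable selection;
    # a real run would sample knockoffs using this seed
    return seed % 3


# parallelize only the outer loop over the 25 bootstraps;
# inner estimators should stay sequential to avoid nested pools
results = Parallel(n_jobs=2)(delayed(run_bootstrap)(s) for s in range(25))
```

This is the standard joblib idiom; nesting a second `Parallel(n_jobs>1)` inside `run_bootstrap` is what causes the oversubscription problems mentioned above.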
def preconfigure_LassoCV(estimator, X, X_tilde, y, n_alphas=20):
    """
    Configure the estimator for Model-X knockoffs.
I think the docstring should focus more on L43-44: the regularization path is defined in a data-dependent way.
The paragraph in Notes should be the central part of the docstring. Is there a reference?
@bthirion Do you have some reference?
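For context, a common data-dependent construction of a Lasso regularization path looks like the following. This is my own sketch of the idea, not necessarily what `preconfigure_LassoCV` implements:

```python
import numpy as np


def data_dependent_alphas(X, y, n_alphas=20, eps=1e-2):
    # alpha_max is the smallest penalty for which the Lasso solution is
    # entirely zero; the grid spans [eps * alpha_max, alpha_max] on a
    # logarithmic scale, so every alpha is scaled to the data at hand
    n_samples = X.shape[0]
    alpha_max = np.max(np.abs(X.T @ y)) / n_samples
    return np.logspace(np.log10(eps * alpha_max), np.log10(alpha_max), n_alphas)
```

The point the docstring could stress is exactly this: the grid endpoints depend on `X` and `y`, so the path must be rebuilt for the concatenated knockoff design rather than reused.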
return estimator
def lasso_statistic_with_sampling(
Similar to D0CRT (regression_test, logistic_test), IMO lasso_statistic should be a method of the knockoff class. As its signature suggests, it is KO-specific. It would be more intuitive to place it in the same file, under the class.
From what I understand, the actual implementation of the test is the original one, but there are proposals for modifying it.
I want to leave users the possibility of modifying it.
Similar to D0CRT, we can consider extending to other test statistics, but they should still be methods of the class.
- These methods will not be used anywhere else
- Would be simplified if the class state were used
I wanted to separate the configuration of the Lasso from the knockoff, because I don't like adding the preconfigure parameter.
I moved everything into the knockoff class.
bthirion left a comment
Only minor things remaining, thanks!
n_bootstraps = 25
# number of jobs for repetition of the method
n_jobs = 2
n_jobs = 1
n_jobs=2 (or even 4) is a better option IMHO.
The lasso is present at the user API level. I use the default value from the code. I can modify it to remove all these default parameters.
Codecov Report

❌ Patch coverage is

Additional details and impacted files

@@ Coverage Diff @@
##             main     #367      +/-   ##
==========================================
- Coverage   99.31%   99.19%   -0.12%
==========================================
  Files          24       24
  Lines        1309     1364      +55
==========================================
+ Hits         1300     1353      +53
- Misses          9       11       +2

View full report in Codecov by Sentry.
Please check the logic. This fixes the error reported in #520.
# Number of variables
n_features = 150
# Correlation parameter
n = 300
It would be better if you could add the description of the parameter, as before.
done
I prefer n_samples, n_features, which are self-explaining.
examples/plot_knockoffs_wisconsin.py (outdated)

    max_iter=1000,
),
random_state=0,
n_repeats=1,
For example, it's sometimes better to declare some parameters even when the value is the same as the default.
Agreed, but in that case, we just want to show the vanilla knockoff, not the aggregation.
So the user can simply ignore the existence of this parameter.
return test_statistic

@staticmethod
def knockoff_threshold(test_score, fdr=0.1):
This is a provide method.
Why don't you add the other methods (_empirical_knockoff_pval and _empirical_knockoff_eval) into the class?
I guess you mean private? If so, I am not sure that it should be a private method. As shown in the example, having access to the knockoff threshold can be quite useful for visualization / understanding the data.
For the second comment I agree, they can all be class methods
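For reference, the knockoff+ threshold of Candès et al. (2018) that a method like this would compute. This is my own sketch and may differ in detail from the code under review:

```python
import numpy as np


def knockoff_threshold(test_score, fdr=0.1):
    # smallest t among the nonzero |W_j| such that
    # (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= fdr
    thresholds = np.sort(np.abs(test_score[test_score != 0]))
    for t in thresholds:
        ratio = (1 + np.sum(test_score <= -t)) / max(1, np.sum(test_score >= t))
        if ratio <= fdr:
            return t
    # no threshold achieves the target FDR: select nothing
    return np.inf
```

Exposing this as a public (or static) helper is what makes the visualization use case above possible: one can plot the W statistics against the returned threshold without running the full selection.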
Yes, I meant private.
What is the point of using a staticmethod here? It could be a standard class method, since it needs to access test_scores anyhow.
Same question for the next 2 static methods.
Note that it may be simply a poor understanding on my side.
k_star = 1
# The for loop over all e-values could be optimized by considering a descending list
# and stopping when the condition is not satisfied anymore.
for k, e_k in enumerate(evals_sorted, start=1):
Add a comment, because it is quite unusual in Python for the enumeration to start at 1.
Moreover, it would be better if you added a link to the equation in the paper.
done
for i in range(n_features - 1, -1, -1):
    if evals_sorted[i] >= n_features / (fdr * (i + 1)):
        selected_index = i
        break
This way of finding the maximum is more optimized than your new implementation.
I know, but it's also broken. I replaced it with a non-optimized version that is easy to read and clearly reflects Equation 5 of Wang and Ramdas (2022).
This is not a critical increase in computation; it is simply a for loop with an if condition at each step. I still added a comment mentioning that it could be optimized.
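A readable sketch of the selection rule as I understand it from Equation 5 of Wang and Ramdas (2022); the function name and wrapper are mine, not the PR's code. With e-values sorted in decreasing order, keep the largest rank k whose e-value satisfies k * e_(k) >= n_features / fdr:

```python
import numpy as np


def ebh_selection(evals, fdr=0.1):
    # e-BH procedure: sort e-values in decreasing order and select the
    # largest k such that k * e_(k) >= n_features / fdr (Eq. 5)
    evals = np.asarray(evals, dtype=float)
    n_features = len(evals)
    order = np.argsort(evals)[::-1]     # indices, largest e-value first
    evals_sorted = evals[order]
    k_star = 0
    # enumeration starts at 1 because k in Eq. 5 is a 1-based rank
    for k, e_k in enumerate(evals_sorted, start=1):
        if k * e_k >= n_features / fdr:
            k_star = k
    # indices of the k_star features with the largest e-values
    return order[:k_star]
```

Scanning all k and keeping the last passing one is exactly the non-optimized version argued for above; the descending early-stopping trick would only change the constant factor.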
for k, e_k in enumerate(evals_sorted, start=1):
    if k * e_k >= n_features / fdr:
        k_star = k
if k_star <= n_features:
Suggested change:
if k_star <= n_features:
if k_star - 1 <= n_features:
See issue #520.
By the way, it would be better to do a PR with only this modification.
I think it should be k_star otherwise you can end up with k_star = n_features + 1 which is problematic.
Actually, I think this test is not necessary; the condition is always fulfilled because the possible range of values k_star can take in the for loop is always met.
bthirion left a comment
This looks great. I only have relatively minor comments left.
Co-authored-by: bthirion <[email protected]>
To transform them to standard class methods, we would need to move the for loops from the
OK, I see. I think we need to brainstorm a little bit on this, i.e. we're likely going to break the API in the future. But let's merge the PR as is.
This PR is based on the PR: