[API 2]: CFI, PFI, LOCO #372
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main     #372      +/-   ##
==========================================
+ Coverage   98.94%   99.06%   +0.12%
==========================================
  Files          23       21       -2
  Lines        1424     1393      -31
==========================================
- Hits         1409     1380      -29
+ Misses         15       13       -2
jpaillard
left a comment
It looks good but the diff seems very large for this small change.
Is there a reason for all the other modifications?
I reorganized the parameters in the init a bit and moved the docstring to the class, because I plan to do the same in all the other classes. Looking more closely, I see that I missed adding some parts. I will add them and ask you to review afterwards. Sorry about that.
bthirion
left a comment
This PR is definitely an improvement, thx.
bthirion
left a comment
Thx for the progress. Please find a few suggestions enclosed.
Co-authored-by: bthirion <[email protected]>
src/hidimstat/_utils/utils.py
Outdated
    return partial(
        nadeau_bengio_ttest,
        popmean=0,
        test_frac=0.1 / 0.9,
One solution is to make this function a method that is only called during .importance(X_test, y_test), and to add a fitted attribute self.n_train_ that is set during the fit.
I prefer to let the user define it for the moment.
Adding a required argument test_frac at instantiation is a possible solution. But since this information comes for free in the fit/importance process, I would suggest that hidimstat takes care of it, to avoid user mistakes and to limit the number of required arguments in the initialization of the class.
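A minimal sketch of what this suggestion could look like. The class name `ImportanceSketch` and the helper `_test_frac` are hypothetical and are not hidimstat's API; the sketch also assumes that `test_frac` means the ratio n_test / n_train, which is consistent with the hardcoded value 0.1 / 0.9 in the diff above.

```python
import numpy as np


class ImportanceSketch:
    """Hypothetical stand-in for a CFI/PFI/LOCO-style estimator."""

    def fit(self, X_train, y_train):
        # Record the training-set size as a fitted attribute.
        self.n_train_ = np.asarray(X_train).shape[0]
        return self

    def _test_frac(self, X_test):
        # Assumed definition: n_test / n_train, derived from the data
        # instead of being hardcoded or passed by the user.
        if not hasattr(self, "n_train_"):
            raise ValueError("Call fit before computing the importance.")
        return np.asarray(X_test).shape[0] / self.n_train_


# 90 training samples and 10 test samples reproduce the hardcoded 0.1 / 0.9.
est = ImportanceSketch().fit(np.zeros((90, 3)), np.zeros(90))
test_frac = est._test_frac(np.zeros((10, 3)))
```

With this bookkeeping, the value would be computed at importance time and the user would not need to supply it.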
This is not supported by the other statistical tests.
We should either support the different statistical tests or choose one specific test and not leave the choice to users.
I don't see the point, for the moment, of having a different behaviour for this specific test.
It has to. See the inference section of the user guide:
When cross-validation (for instance, k-fold) is used to estimate CFI, the loss differences obtained from different folds are not independent. Consequently, performing a simple t-test on the loss differences is not valid. This issue can be addressed by a corrected t-test accounting for this dependence, such as the one proposed in Nadeau and Bengio [3].
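For readers unfamiliar with the correction, here is a rough sketch of a Nadeau-Bengio style corrected t-test. The function `corrected_ttest_sketch` is written for illustration only and is not hidimstat's `nadeau_bengio_ttest`; it assumes one loss difference per fold, `test_frac` equal to n_test / n_train, and a one-sided alternative (mean difference greater than `popmean`).

```python
import numpy as np
from scipy import stats


def corrected_ttest_sketch(diffs, test_frac, popmean=0.0):
    """One-sided corrected resampled t-test (illustrative sketch)."""
    diffs = np.asarray(diffs, dtype=float)
    k = diffs.shape[0]
    mean = diffs.mean()
    var = diffs.var(ddof=1)
    # The usual 1/k variance factor is inflated by n_test / n_train
    # to account for the overlap between training sets across folds.
    se = np.sqrt((1.0 / k + test_frac) * var)
    t_stat = (mean - popmean) / se
    p_value = stats.t.sf(t_stat, df=k - 1)
    return t_stat, p_value


# Made-up loss differences from a 5-fold split (n_test / n_train = 0.25).
diffs = [0.12, 0.08, 0.15, 0.10, 0.09]
t_corr, p_corr = corrected_ttest_sketch(diffs, test_frac=0.25)
# Setting test_frac to 0 recovers the plain one-sample t-test.
t_naive, p_naive = corrected_ttest_sketch(diffs, test_frac=0.0)
```

The corrected statistic is smaller in magnitude than the naive one, so the corrected test is more conservative.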
Co-authored-by: Joseph Paillard <[email protected]>
bthirion
left a comment
No further comment than those raised by @jpaillard
Regarding the statistical test, the current issue is that the cross-validation scheme has not yet been implemented. Currently, the statistical test is performed using a single train/test split. In that case, the loss values for the individual samples of the test set can be considered independent, and we could use the t-test instead of the NB t-test as the default. The NB t-test needs to be the default when CV is used, since losses over test sets can then no longer be considered independent. The comment regarding the NB t-test implementation remains valid, however: the test fraction shouldn't be hardcoded. I will follow up on #449.
Let me know if that's good for you. Sorry for the confusion.
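To illustrate the single-split case, a plain one-sample t-test on per-sample loss differences could look like the sketch below. The data are synthetic and the variable names are made up for this example; the one-sided alternative is an assumption, not something stated in this thread.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_test = 200

# Synthetic per-sample loss differences on a single held-out test set:
# loss with the feature perturbed/removed minus loss of the full model.
diffs = rng.normal(loc=0.2, scale=0.3, size=n_test)

# With one train/test split, the per-sample differences are treated as
# independent, so an uncorrected one-sample t-test is applied.
result = stats.ttest_1samp(diffs, popmean=0.0, alternative="greater")
print(result.statistic, result.pvalue)
```

Once losses are aggregated over several CV folds, this independence assumption no longer holds and a corrected test would be needed instead.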
@jpaillard I'll let you deal with this PR.
I took the opportunity to rename the function
bthirion
left a comment
Very minor stuff pending. Thx !
Co-authored-by: bthirion <[email protected]>
Sorry again for the confusion regarding the default test. I explained the choice a few comments above.
OK, makes sense.
I think it's OK for merging.
Update the model of CFI, PFI and LOCO for API 2.