[DOC] Section 1 of user guide/definition of concepts #408
base: main
Changes from 7 commits
b2a620d
240d499
9a71090
0eb1d1d
84aadca
c5f4c3a
e0bb238
2b9e618
f5b8afb
| Original file line number | Diff line number | Diff line change | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -3,4 +3,116 @@ | |||||||||||||||||
|
|
||||||||||||||||||
| ====================== | ||||||||||||||||||
| Definition of concepts | ||||||||||||||||||
| ====================== | ||||||||||||||||||
| ====================== | ||||||||||||||||||
|
|
||||||||||||||||||
| Variable Importance | ||||||||||||||||||
| ------------------- | ||||||||||||||||||
|
|
||||||||||||||||||
| Global Variable Importance (VI) aims to assign a measure of | ||||||||||||||||||
| relevance to each feature :math:`X^j` with respect to a target :math:`y` in the | ||||||||||||||||||
| data-generating process. In Machine Learning, it can be seen as a measure | ||||||||||||||||||
| of how much a variable contributes to the predictive power of a model. We | ||||||||||||||||||
| can then define "important" variables as those whose absence degrades | ||||||||||||||||||
| the model's performance :footcite:p:`Covert2020`. | ||||||||||||||||||
|
|
||||||||||||||||||
| So if ``VI`` is a variable importance method, ``X`` a variable matrix and ``y`` | ||||||||||||||||||
| the target variable, the importance of all the variables | ||||||||||||||||||
| can be estimated as follows: | ||||||||||||||||||
|
|
||||||||||||||||||
| .. code-block:: | ||||||||||||||||||
|
|
||||||||||||||||||
| # instantiate the object | ||||||||||||||||||
| vi = VI() | ||||||||||||||||||
| # fit the models in the method | ||||||||||||||||||
| vi.fit(X, y) | ||||||||||||||||||
| # compute the importance and the pvalues | ||||||||||||||||||
| importance = vi.importance(X, y) | ||||||||||||||||||
| # get importance for each feature | ||||||||||||||||||
| importance = vi.importances_ | ||||||||||||||||||
|
|
||||||||||||||||||
| It allows us to rank the variables from most to least important. | ||||||||||||||||||
|
|
||||||||||||||||||
| Here, ``VI`` can be a variable importance method implemented in HiDimStat, | ||||||||||||||||||
| such as :class:`hidimstat.LOCO` (other methods will support the same API | ||||||||||||||||||
|
||||||||||||||||||
| soon). | ||||||||||||||||||
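As a rough illustration of the idea that "important" variables are those whose absence degrades the model's performance, the sketch below computes a leave-one-covariate-out style score by hand with scikit-learn. It does not use the HiDimStat API: the helper name ``loco_scores``, the synthetic data, and the choice of a linear model are assumptions made only for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def loco_scores(X, y, random_state=0):
    """Hypothetical helper: increase in test loss when a column is dropped and the model refit."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=random_state
    )
    # Reference loss of the model trained on all variables
    full = LinearRegression().fit(X_train, y_train)
    full_loss = mean_squared_error(y_test, full.predict(X_test))

    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        # Refit without variable j and measure how much the loss increases
        keep = [k for k in range(X.shape[1]) if k != j]
        reduced = LinearRegression().fit(X_train[:, keep], y_train)
        reduced_loss = mean_squared_error(y_test, reduced.predict(X_test[:, keep]))
        scores[j] = reduced_loss - full_loss
    return scores


# Synthetic data (an assumption): y depends strongly on column 0,
# weakly on column 1, and not at all on column 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

scores = loco_scores(X, y)
# Rank the variables from most to least important
ranking = np.argsort(scores)[::-1]
print(scores, ranking)
```

Refitting the model without each variable is what makes this a conditional measure; a method from the library would additionally provide p-values, which this sketch does not attempt.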
|
|
||||||||||||||||||
| Variable Selection | ||||||||||||||||||
| ------------------------------- | ||||||||||||||||||
|
||||||||||||||||||
| ------------------------------- | |
| ------------------ |
| that are related to the output, even if it is caused by spurius correlation. They | |
| that are related to the output, even if it is caused by spurious correlation. They |
| that are related to the output, even if it is caused by spurius correlation. They | |
| are related with testing if :math:`X^j\perp\!\!\!\!\perp Y`. | |
| that are related to the output, even if it is caused by spurius correlation. They | |
| consist of testing whether :math:`X^j\perp\!\!\!\!\perp Y`. |
Maybe that sounds better?
This is because these methods do not directly test whether X is independent of Y: they are variable importance measures, not only selection procedures. That is why I would say they are implicitly related to this test, but they do not consist of it.
Ok makes sense!
Outdated
I think it would be useful to provide a reference for LOCI, or at least expand the abbreviation.
Yes, I would also suggest the reference but I think they are not yet available.
For LOCI, I found this reference: Ewald, Fiona Katharina, Ludwig Bothmann, Marvin N. Wright, Bernd Bischl, Giuseppe Casalicchio, and Gunnar König. "A guide to feature importance methods for scientific inference." In World Conference on Explainable Artificial Intelligence, pp. 440-464. Cham: Springer Nature Switzerland, 2024.
What I meant was a reference to the implemented class, not a bibliographic reference.
I think the biblio ref should be good enough for now
The reference for the implementation should be only in the docstring of the class. In this case, we can keep a more general bibliography.
Outdated
| Example of such methods is LOCI. | |
| An example of such a method is LOCI. |
| i.e., they contribute unique knowledge. They are related with Conditional | |
| Independence Testing, which consist in testing if | |
| :math:`X^j\perp\!\!\!\!\perp Y\mid X^{-j}`. Examples of such methods are | |
| :class:`hidimstat.LOCO` and :class:`hidimstat.CFI`. | |
| i.e., they contribute unique knowledge. They are related to Conditional | |
| Independence Testing, which consists of testing whether | |
| :math:`X^j\perp\!\!\!\!\perp Y\mid X^{-j}`. Examples of such methods are | |
| :class:`hidimstat.LOCO` and :class:`hidimstat.CFI`. |
Outdated
| ----------------------------------- | |
| ------------------------------ |
I might be very wrong, but isn't this section somewhat redundant with the Variable Selection section? Could it be merged into that section?
Yes, but I am not sure how. It is indeed important to make explicit that a strength of the library is that it also provides statistical guarantees.