Commit 155a0bb

filling the doc
1 parent fbc1e49 commit 155a0bb

1 file changed

+52
-2
lines changed

docs/src/high_dimension.rst

Lines changed: 52 additions & 2 deletions
@@ -14,13 +14,63 @@ In some cases, data represent high-dimensional measurements of some phenomenon o
1414
* powerless: As dimensionality and correlation increase, it becomes harder and harder to isolate the contribution of each variable, meaning that conditional inference is ill-posed.
1515

1616
This is illustrated in the above example, where the Desparsified Lasso struggles
17-
to identify relevant features
17+
to identify relevant features::
1818

19+
import numpy as np

n_samples = 100
20+
shape = (40, 40)
21+
n_features = shape[1] * shape[0]
22+
roi_size = 4 # size of the edge of the four predictive regions
1923

24+
# generating the data
25+
from hidimstat._utils.scenario import multivariate_simulation_spatial
26+
X_init, y, beta, epsilon = multivariate_simulation_spatial(
27+
n_samples, shape, roi_size, signal_noise_ratio=10., smooth_X=1
28+
)
29+
30+
from hidimstat.desparsified_lasso import (
31+
desparsified_lasso,
32+
desparsified_lasso_pvalue,
33+
)
34+
beta_hat, sigma_hat, precision_diagonal = desparsified_lasso(X_init, y)
35+
pval, pval_corr, one_minus_pval, one_minus_pval_corr, cb_min, cb_max = (
36+
desparsified_lasso_pvalue(X_init.shape[0], beta_hat, sigma_hat, precision_diagonal)
37+
)
38+
39+
# compute estimated support (first method)
40+
from hidimstat.statistical_tools.p_values import zscore_from_pval
41+
zscore = zscore_from_pval(pval, one_minus_pval)
42+
selected_dl = zscore > thr_nc  # thr_nc: the "no clustering" z-score threshold, assumed to be defined beforehand
43+
44+
# compute estimated support (second method)
45+
selected_dl = np.logical_or(
46+
pval_corr < fwer_target / 2, one_minus_pval_corr < fwer_target / 2  # same level on both tails
47+
)
48+
print(f'Desparsified Lasso selected {np.sum(selected_dl)} features')
49+
print(f'among {np.sum(beta_hat > 0)} ')
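The snippet above uses ``thr_nc`` and ``fwer_target`` without defining them. As a minimal sketch, assuming ``thr_nc`` is a Bonferroni-corrected two-sided z-score threshold (the value ``fwer_target = 0.1`` and the helper name ``bonferroni_zscore_threshold`` are illustrative assumptions, not taken from hidimstat), it could be computed as follows:

```python
from statistics import NormalDist


def bonferroni_zscore_threshold(fwer_target, n_tests):
    """Two-sided z-score threshold for a Bonferroni-corrected FWER level.

    Hypothetical helper: the level is split across the two tails,
    then divided by the number of tests.
    """
    per_test_level = (fwer_target / 2) / n_tests
    # Upper-tail quantile of the standard normal distribution
    return NormalDist().inv_cdf(1 - per_test_level)


n_features = 40 * 40  # matches shape = (40, 40) in the snippet above
fwer_target = 0.1     # assumed FWER level, not given in the snippet
thr_nc = bonferroni_zscore_threshold(fwer_target, n_features)
print(f"z-score threshold without clustering: {thr_nc:.2f}")
```

With 1600 features the threshold is already close to 4, which gives an idea of why so few features survive the correction when no grouping is used.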
2050

2151
.. topic:: **Full example**
2252

2353
See the following example for a full file running the analysis:
24-
:ref:`https://hidimstat.github.io/dev/generated/gallery/examples/plot_2D_simulation_example.html#`
54+
:ref:`sphx_glr_generated_gallery_examples_plot_2D_simulation_example.py`
55+
56+
57+
Feature Grouping and its shortcomings
58+
-------------------------------------
59+
60+
As discussed earlier, feature grouping is a meaningful solution to deal with such cases: it reduces the number of features to condition on, and generally also decreases the level of correlation between features (XXX see grouping section).
61+
As hinted in [Meinshausen XXX], an efficient way to deal with such configurations is to take the per-group average of the features: this leads to a *reduced design*. After inference, all the features in a given group obtain the p-value of the group representative. When the inference engine is Desparsified Lasso, the resulting method is called Clustered Desparsified Lasso, or **CluDL**.
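The two steps described here (averaging features within each group, then giving every feature the p-value of its group) can be sketched with plain NumPy. This is only an illustration: the function names, the toy labels and the placeholder p-values are assumptions, and group labels are assumed to be integers ``0 .. n_groups - 1`` with no empty group.

```python
import numpy as np


def reduce_design(X, labels):
    """Average the columns of X within each group (hypothetical helper).

    X has shape (n_samples, n_features); labels has shape (n_features,).
    Returns the reduced design of shape (n_samples, n_groups).
    """
    n_groups = labels.max() + 1
    # One-hot membership matrix of shape (n_features, n_groups)
    membership = np.eye(n_groups)[labels]
    group_sizes = membership.sum(axis=0)
    return X @ membership / group_sizes


def broadcast_to_features(group_pvalues, labels):
    """Give each feature the p-value of the group it belongs to."""
    return group_pvalues[labels]


rng = np.random.default_rng(0)
X = rng.standard_normal((5, 6))
labels = np.array([0, 0, 1, 1, 1, 2])  # toy grouping of 6 features into 3 groups
X_reduced = reduce_design(X, labels)

# Placeholder group-level p-values; in practice they would come from
# running the inference engine on X_reduced.
group_pvalues = np.array([0.01, 0.5, 0.2])
feature_pvalues = broadcast_to_features(group_pvalues, labels)
```

Inference then runs on ``X_reduced``, which has one column per group instead of one per feature.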
62+
63+
The issue is that very high-dimensional data (biological data, images, etc.) do not have any canonical grouping structure. Hence, they rely on groupings obtained from the data, typically with a clustering technique. However, the resulting clusters bring some undesirable randomness: slightly different input data would lead to different clusters. Since there is no globally optimal clustering, the wiser solution is to *average* the results across clusterings. Since it may not be a good idea to average p-values, an alternative *ensembling* or *aggregation* strategy is used instead. When the inference engine is Desparsified Lasso, the resulting method is called Ensemble of Clustered Desparsified Lasso, or **EnCluDL**.
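The text does not say which aggregation rule is used. One well-known option, shown here purely as an assumption, is quantile aggregation: for each feature, take the ``gamma``-quantile of its p-values across clusterings, divide by ``gamma``, and cap the result at 1. The function name and the toy p-values below are illustrative.

```python
import numpy as np


def quantile_aggregation(pvalues, gamma=0.5):
    """Aggregate p-values across clusterings, one value per feature.

    pvalues has shape (n_clusterings, n_features). Note that np.quantile
    interpolates linearly between order statistics by default.
    """
    scaled_quantile = np.quantile(pvalues, gamma, axis=0) / gamma
    return np.minimum(1.0, scaled_quantile)


# Toy p-values: 3 clusterings (rows) for 2 features (columns)
pvals = np.array([
    [0.001, 0.40],
    [0.004, 0.90],
    [0.002, 0.10],
])
aggregated = quantile_aggregation(pvals)
```

With ``gamma=0.5`` this amounts to twice the median p-value, so a feature must be significant in most clusterings to remain significant after aggregation.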
64+
65+
Example
66+
-------
2567

2668

69+
70+
.. topic:: **Full example**
71+
72+
See the following example for a full file running the analysis:
73+
:ref:`sphx_glr_generated_gallery_examples_plot_2D_simulation_example.py`
74+
75+
What type of control does this Ensemble of Clustered inference come with?
76+
--------------------------------------------------------------------------
