Commit 06d2f23

DOC add plot_affinity

1 parent 7075d7a commit 06d2f23
File tree

8 files changed
+668 -703 lines changed

doc/ols_and_omp.rst

Lines changed: 12 additions & 10 deletions

@@ -14,28 +14,30 @@ Here, let's briefly compare the three methods.
 
 Assume we have a feature matrix :math:`X_s \in \mathbb{R}^{N\times t}`, which contains
 :math:`t` selected features, and a target vector :math:`y \in \mathbb{R}^{N\times 1}`.
-Then the residual :math:`r` of the least-squares can be found by
+Then the residual :math:`r \in \mathbb{R}^{N\times 1}` of the least-squares can be
+found by
 
 .. math::
     r = y - X_s \beta \;\; \text{where} \;\; \beta = (X_s^\top X_s)^{-1}X_s^\top y
 
-When evaluating a new feature :math:`x_i`
+When evaluating a new candidate feature :math:`x_i \in \mathbb{R}^{N\times 1}`
 
-* for OMP, the feature which maximizes :math:`r^\top x_i` will be selected
+* for OMP, the feature which maximizes :math:`r^\top x_i` will be selected,
 * for OLS, the feature which maximizes :math:`r^\top w_i` will be selected, where
-  :math:`w_i` is the projection of :math:`x_i` on the orthogonal subspace so that it is
-  orthogonal to :math:`X_s`, i.e., :math:`X_s^\top w_i = \mathbf{0} \in \mathbb{R}^{N}`
+  :math:`w_i \in \mathbb{R}^{N\times 1}` is the projection of :math:`x_i` on the
+  orthogonal subspace so that it is orthogonal to :math:`X_s`, i.e.,
+  :math:`X_s^\top w_i = \mathbf{0} \in \mathbb{R}^{t\times 1}`,
 * for :class:`FastCan` (h-correlation algorithm), it is almost the same as OLS, but the
   difference is that in :class:`FastCan`, :math:`X_s`, :math:`y`, and :math:`x_i`
   are centered (i.e., zero mean in each column) before the selection.
 
-The small change makes the feature ranking criterion of :class:`FastCan` is equivalent
-to the sum of squared canonical correlation coefficients, which gives it the following
-advantages over OLS and OMP:
+The small difference makes the feature ranking criterion of :class:`FastCan`
+equivalent to the sum of squared canonical correlation coefficients, which gives
+it the following advantages over OLS and OMP:
 
-* Affine invariant: if features are polluted by affine transformation, i.e., scaled
+* Affine invariance: if features are polluted by an affine transformation, i.e., scaled
   and/or shifted by some constants, the selection result given by :class:`FastCan` will be
-  unchanged.
+  unchanged. See :ref:`sphx_glr_auto_examples_plot_affinity.py`.
 * Multioutput: as :class:`FastCan` uses canonical correlation for feature ranking, it
   naturally supports feature selection on datasets with multiple outputs.
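As a side note for the reader (not part of this commit), one ranking step of the OMP and OLS criteria above can be made concrete with a minimal NumPy sketch. The data and shapes below are invented for illustration, and the candidate is unit-normalized, mirroring the normalization used in fastcan's ``ols`` helper further down in this diff.

# Minimal sketch of one OMP/OLS ranking step (illustration only, not library code).
import numpy as np

rng = np.random.default_rng(0)
N, t = 50, 3
X_s = rng.standard_normal((N, t))   # t already-selected features
y = rng.standard_normal(N)          # target
x_i = rng.standard_normal(N)        # candidate feature
x_i /= np.linalg.norm(x_i)          # unit-normalized, as in fastcan's ols()

# Residual of least squares on the selected features:
# r = y - X_s @ beta, with beta = (X_s^T X_s)^{-1} X_s^T y
beta, *_ = np.linalg.lstsq(X_s, y, rcond=None)
r = y - X_s @ beta

# OMP scores the raw candidate against the residual.
omp_score = r @ x_i

# OLS scores the component of x_i orthogonal to X_s (unit-normalized),
# so that X_s.T @ w_i is numerically zero.
w_i = x_i - X_s @ np.linalg.pinv(X_s) @ x_i
w_i /= np.linalg.norm(w_i)
ols_score = r @ w_i

print(omp_score, ols_score)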

examples/plot_affinity.py

Lines changed: 105 additions & 0 deletions (new file)

@@ -0,0 +1,105 @@
"""
=================
Affine Invariance
=================

.. currentmodule:: fastcan

In this example, we will compare the robustness of the three feature
selection methods on affine transformed features.
"""

# Authors: Sikai Zhang
# SPDX-License-Identifier: MIT

# %%
# Initialize test
# ---------------
# The three feature selection methods, i.e., OMP, OLS, and :class:`FastCan`,
# will select three features from the 10 features of the `diabetes` dataset. It can
# be seen that the three methods select the same features.

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import OrthogonalMatchingPursuit

from fastcan import FastCan, ols

X, y = load_diabetes(return_X_y=True)

n_selected = 3
omp_selector = OrthogonalMatchingPursuit(n_nonzero_coefs=n_selected)
fastcan_selector = FastCan(n_features_to_select=n_selected, verbose=0)
(ids_omp,) = omp_selector.fit(X, y).coef_.nonzero()
ids_ols, _ = ols(X, y, n_selected)
ids_fastcan = fastcan_selector.fit(X, y).indices_

print("Indices of features selected by:")
print("OMP: ", np.sort(ids_omp))
print("OLS: ", np.sort(ids_ols))
print("FastCan: ", np.sort(ids_fastcan))


# %%
# Affine transformation
# ---------------------
# In this test, the 10 features of the ``diabetes`` dataset will be randomly polluted
# by an affine transformation. The three feature selection methods will select
# three features from the polluted features. The more stable the result, the better.


n_features = X.shape[1]
rng = np.random.default_rng()

ids_omp_all = []
ids_ols_all = []
ids_fastcan_all = []
for i in range(10):
    X_affine = X @ np.diag(rng.random(n_features)) + rng.random(n_features)

    (ids_omp,) = omp_selector.fit(X_affine, y).coef_.nonzero()
    ids_ols, _ = ols(X_affine, y, n_selected)
    ids_fastcan = fastcan_selector.fit(X_affine, y).indices_
    ids_omp_all += ids_omp.tolist()
    ids_ols_all += ids_ols.tolist()
    ids_fastcan_all += ids_fastcan.tolist()

# %%
# Plot results
# ------------
# It can be seen that only :class:`FastCan` gives robust results when the features
# are polluted by the affine transformation.

import matplotlib.pyplot as plt

bin_lims = np.arange(n_features + 1)
counts_omp, _ = np.histogram(ids_omp_all, bins=bin_lims)
counts_ols, _ = np.histogram(ids_ols_all, bins=bin_lims)
counts_fastcan, _ = np.histogram(ids_fastcan_all, bins=bin_lims)

fig, axs = plt.subplots(1, 3, figsize=(8, 3))

axs[0].bar(bin_lims[:-1], counts_omp)
axs[0].set_xticks(bin_lims[:-1])
axs[0].set_ylim((0, 11))
axs[0].set_title("OMP")
axs[0].set_xlabel("Feature Index")
axs[0].set_ylabel("Count of Selected Times")


axs[1].bar(bin_lims[:-1], counts_ols)
axs[1].set_xticks(bin_lims[:-1])
axs[1].set_ylim((0, 11))
axs[1].set_title("OLS")
axs[1].set_xlabel("Feature Index")

axs[2].bar(bin_lims[:-1], counts_fastcan)
axs[2].set_xticks(bin_lims[:-1])
axs[2].set_ylim((0, 11))
axs[2].set_title("FastCan")
axs[2].set_xlabel("Feature Index")

plt.tight_layout()
plt.show()

examples/plot_ols_omp.py

Lines changed: 0 additions & 16 deletions
This file was deleted.

examples/plot_redundancy.py

Lines changed: 6 additions & 4 deletions

@@ -3,6 +3,8 @@
 Feature selection performance on redundant features
 ===================================================
 
+.. currentmodule:: fastcan
+
 In this example, we will compare the performance of feature selectors on
 datasets which contain redundant features.
 Here four types of features should be distinguished:
@@ -88,7 +90,7 @@ def make_redundant(
 # ---------------------
 # This function is used to compute the number of correct features missed by selectors.
 #
-# * For independent informative features, selectors should select all of them
+# * For independent informative features, selectors should select all of them.
 # * For dependent informative features, selectors only need to select any
 #   ``n_dep_info``-combination of the set ``dep_info_ids`` + ``redundant_ids``. That
 #   means if the indices of dependent informative features are :math:`[0, 1]` and the
@@ -114,13 +116,13 @@ def get_n_missed(
 # %%
 # Prepare selectors
 # -----------------
-# We compare :class:`fastcan.FastCan` with eight selectors of :mod:`sklearn`, which
+# We compare :class:`FastCan` with eight selectors of :mod:`sklearn`, which
 # include the Select From a Model (SFM) algorithm, the Recursive Feature Elimination
 # (RFE) algorithm, the Sequential Feature Selection (SFS) algorithm, and the Select K
 # Best (SKB) algorithm.
 # The list of the selectors is given below:
 #
-# * fastcan: :class:`fastcan.FastCan` selector
+# * fastcan: :class:`FastCan` selector
 # * skb_reg: the SKB algorithm ranking features with ANOVA (analysis of variance)
 #   F-statistic and p-values
 # * skb_mir: the SKB algorithm ranking features by mutual information for regression
@@ -197,7 +199,7 @@ def get_n_missed(
 # %%
 # Plot results
 # ------------
-# :class:`fastcan.FastCan` correctly selects all informative features with zero missed
+# :class:`FastCan` correctly selects all informative features with zero missed
 # features.
 
 import matplotlib.pyplot as plt
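Purely as an illustration of the combination rule described in the hunk above (a hypothetical helper, not the repository's ``get_n_missed``), a selection covers the dependent informative group if it contains at least one valid combination; the names and example indices below are made up.

# Hypothetical sketch: does a selection cover some n_dep_info-sized
# combination of dep_info_ids + redundant_ids?
from itertools import combinations

def covers_dependent(selected_ids, dep_info_ids, redundant_ids, n_dep_info):
    pool = set(dep_info_ids) | set(redundant_ids)
    return any(set(combo) <= set(selected_ids)
               for combo in combinations(pool, n_dep_info))

# e.g. with dep_info_ids=[0, 1], redundant_ids=[5], and n_dep_info=2,
# selecting {0, 5}, {1, 5}, or {0, 1} would all count as correct.
print(covers_dependent({0, 5}, [0, 1], [5], 2))  # True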

examples/plot_speed.py

Lines changed: 4 additions & 2 deletions

@@ -3,9 +3,11 @@
 Computational speed comparison
 ==============================
 
+.. currentmodule:: fastcan
+
 In this example, we will compare the computational speed of three different feature
-selection methods: h-correlation based :class:`fastcan.FastCan`, eta-cosine based
-:class:`fastcan.FastCan`, and baseline model based on
+selection methods: h-correlation based :class:`FastCan`, eta-cosine based
+:class:`FastCan`, and a baseline model based on
 ``sklearn.cross_decomposition.CCA``.
 
 """

fastcan/__init__.py

Lines changed: 1 addition & 1 deletion

@@ -3,7 +3,7 @@
 """
 
 from ._fastcan import FastCan
-from ._utils import ssc, ols
+from ._utils import ols, ssc
 
 __all__ = [
     "FastCan",

fastcan/_utils.py

Lines changed: 3 additions & 5 deletions

@@ -81,9 +81,7 @@ def ols(X, y, t=1):
         The scores of selected features. The order of
         the scores corresponds to the feature selection process.
     """
-    X, y = check_X_y(
-        X, y, dtype=float, ensure_2d=True
-    )
+    X, y = check_X_y(X, y, dtype=float, ensure_2d=True)
     n_features = X.shape[1]
     w = X / np.linalg.norm(X, axis=0)
     v = y / np.linalg.norm(y)
@@ -100,11 +98,11 @@ def ols(X, y, t=1):
         d = np.argmax(r2)
         indices[i] = d
         scores[i] = r2[d]
-        if i == t-1:
+        if i == t - 1:
             return indices, scores
         mask[d] = True
         r2[d] = 0
         for j in range(n_features):
             if not mask[j]:
-                w[:, j] = w[:, j] - w[:, d]*(w[:, d] @ w[:, j])
+                w[:, j] = w[:, j] - w[:, d] * (w[:, d] @ w[:, j])
                 w[:, j] /= np.linalg.norm(w[:, j], axis=0)
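For orientation (again not part of the diff itself), a minimal usage sketch of the ``ols`` helper whose internals are touched above; it assumes the diabetes dataset used in the new example and the ``ols(X, y, t)`` signature returning ``(indices, scores)`` shown in this file.

# Quick usage sketch of fastcan.ols (signature as shown in the diff above).
import numpy as np
from sklearn.datasets import load_diabetes

from fastcan import ols

X, y = load_diabetes(return_X_y=True)
indices, scores = ols(X, y, t=3)   # greedily select 3 features
print("selected:", np.sort(indices))
print("scores:  ", scores)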
