
Commit 7075d7a

DOC add ols_omp init
1 parent 11c7c79 commit 7075d7a

File tree

11 files changed, +614 -329 lines


doc/ols_and_omp.rst

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
.. currentmodule:: fastcan

.. _ols_omp:

===========================
Comparison with OLS and OMP
===========================

:class:`FastCan` has a close relationship with Orthogonal Least Squares (OLS) [1]_
and Orthogonal Matching Pursuit (OMP) [2]_.
The detailed difference between OLS and OMP can be found in [3]_.
Here, let's briefly compare the three methods.

Assume we have a feature matrix :math:`X_s \in \mathbb{R}^{N\times t}`, which contains
:math:`t` selected features, and a target vector :math:`y \in \mathbb{R}^{N\times 1}`.
Then the residual :math:`r` of the least-squares fit can be found by

.. math::
    r = y - X_s \beta \;\; \text{where} \;\; \beta = (X_s^\top X_s)^{-1}X_s^\top y

When evaluating a new feature :math:`x_i` (see the sketch after this list):

* for OMP, the feature which maximizes :math:`|r^\top x_i|` will be selected
* for OLS, the feature which maximizes :math:`|r^\top w_i|` will be selected, where
  :math:`w_i` is the projection of :math:`x_i` on the orthogonal subspace, so that it
  is orthogonal to :math:`X_s`, i.e., :math:`X_s^\top w_i = \mathbf{0} \in \mathbb{R}^{t}`
* for :class:`FastCan` (h-correlation algorithm), the criterion is almost the same as
  OLS; the difference is that in :class:`FastCan`, :math:`X_s`, :math:`y`, and
  :math:`x_i` are centered (i.e., zero mean in each column) before the selection.
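
To make the contrast concrete, here is a minimal NumPy sketch of the OMP and OLS
criteria; ``X_s``, ``x_i``, and ``y`` are random placeholder arrays, and the
projection :math:`w_i` is computed with a QR decomposition for brevity::

    import numpy as np

    rng = np.random.default_rng(0)
    X_s = rng.normal(size=(100, 3))  # t = 3 already-selected features
    x_i = rng.normal(size=100)       # candidate feature
    y = rng.normal(size=100)         # target

    # Residual of the least-squares fit on the selected features
    beta, *_ = np.linalg.lstsq(X_s, y, rcond=None)
    r = y - X_s @ beta

    # OMP: score the raw (normalized) candidate against the residual
    omp_score = abs(r @ x_i) / np.linalg.norm(x_i)

    # OLS: project out the selected features first, then score
    Q, _ = np.linalg.qr(X_s)
    w_i = x_i - Q @ (Q.T @ x_i)  # now X_s.T @ w_i == 0
    ols_score = abs(r @ w_i) / np.linalg.norm(w_i)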

This small change makes the feature ranking criterion of :class:`FastCan` equivalent
to the sum of squared canonical correlation coefficients, which gives it the following
advantages over OLS and OMP:

* Affine invariance: if the features are polluted by an affine transformation, i.e.,
  scaled and/or shifted by constants, the selection result given by :class:`FastCan`
  will be unchanged (a quick check is sketched after this list).
* Multioutput support: as :class:`FastCan` uses canonical correlation for feature
  ranking, it naturally supports feature selection on datasets with multiple outputs.
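
For instance, a minimal check of the affine-invariance claim on random placeholder
data (``n_features_to_select`` and the inherited ``get_support`` are the estimator's
documented API)::

    import numpy as np
    from fastcan import FastCan

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 10))
    y = rng.normal(size=100)
    X_affine = 2.0 * X + 3.0  # scale and shift every feature

    s1 = FastCan(n_features_to_select=3).fit(X, y).get_support()
    s2 = FastCan(n_features_to_select=3).fit(X_affine, y).get_support()
    assert np.array_equal(s1, s2)  # the selection is unchanged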

.. rubric:: References

.. [1] `"Orthogonal least squares methods and their application to non-linear
   system identification" <https://doi.org/10.1080/00207178908953472>`_ Chen, S.,
   Billings, S. A., & Luo, W. International Journal of Control, 50(5),
   1873-1896 (1989).

.. [2] `"Matching pursuits with time-frequency dictionaries"
   <https://doi.org/10.1109/78.258082>`_ Mallat, S. G., & Zhang, Z.
   IEEE Transactions on Signal Processing, 41(12), 3397-3415 (1993).

.. [3] `"On the difference between Orthogonal Matching Pursuit and Orthogonal Least
   Squares" <https://eprints.soton.ac.uk/142469/1/BDOMPvsOLS07.pdf>`_ Blumensath, T.,
   & Davies, M. E. Technical report, University of Edinburgh (2007).

doc/user_guide.rst

Lines changed: 2 additions & 1 deletion
@@ -10,4 +10,5 @@ User Guide
    intuitive.rst
    unsupervised.rst
    multioutput.rst
-   redundancy.rst
+   redundancy.rst
+   ols_and_omp.rst

examples/plot_ols_omp.py

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
"""
======================================================
FastCan VS. OLS VS. OMP on Affine Transformed Features
======================================================

In this example, we will compare the robustness of the three feature
selection methods on affine transformed features.
"""

# Authors: Sikai Zhang
# SPDX-License-Identifier: MIT

# %%
# Define OLS
# ----------

examples/plot_redundancy.py

Lines changed: 1 addition & 1 deletion
@@ -207,5 +207,5 @@ def get_n_missed(
 ax.bar_label(rects, n_missed.sum(0), padding=3)
 plt.xlabel("Selector")
 plt.ylabel("No. of missed features")
-plt.title("Performance of selectors on datasets with linear redundant features")
+plt.title("Performance of selectors on datasets with linearly redundant features")
 plt.show()

fastcan/__init__.py

Lines changed: 2 additions & 1 deletion
@@ -3,9 +3,10 @@
 """

 from ._fastcan import FastCan
-from ._ssc import ssc
+from ._utils import ssc, ols

 __all__ = [
     "FastCan",
     "ssc",
+    "ols",
 ]

fastcan/_fastcan.py

Lines changed: 1 addition & 1 deletion
@@ -59,7 +59,7 @@ class FastCan(SelectorMixin, BaseEstimator):
     support_ : ndarray of shape (n_features,), dtype=bool
         The mask of selected features.

-    scores_: ndarray of shape (n_features_to_select,), dtype=float
+    scores_ : ndarray of shape (n_features_to_select,), dtype=float
         The h-correlation/eta-cosine of selected features. The order of
         the scores is corresponding to the feature selection process.

fastcan/_ssc.py

Lines changed: 0 additions & 47 deletions
This file was deleted.

fastcan/_utils.py

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
"""Sum of squared correlation coefficients and orthogonal least-squares."""

from numbers import Integral

import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.utils import check_X_y
from sklearn.utils._param_validation import Interval, validate_params


@validate_params(
    {
        "X": ["array-like"],
        "y": ["array-like"],
    },
    prefer_skip_nested_validation=True,
)
def ssc(X, y):
    """Sum of the squared canonical correlation coefficients.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Feature matrix.

    y : array-like of shape (n_samples, n_outputs)
        Target matrix.

    Returns
    -------
    ssc : float
        Sum of the squared canonical correlation coefficients.

    Examples
    --------
    >>> from fastcan import ssc
    >>> X = [[1], [-1], [0]]
    >>> y = [[0], [1], [-1]]
    >>> ssc(X, y)
    np.float64(0.25)
    """
    X, y = check_X_y(
        X, y, dtype=float, ensure_2d=True, multi_output=True, ensure_min_samples=2
    )
    n_components = min(X.shape[1], y.shape[1])
    cca = CCA(n_components=n_components)
    X_c, y_c = cca.fit_transform(X, y)
    corrcoef = np.diagonal(np.corrcoef(X_c, y_c, rowvar=False), offset=n_components)
    return sum(corrcoef**2)


@validate_params(
    {
        "X": ["array-like"],
        "y": ["array-like"],
        "t": [Interval(Integral, 1, None, closed="left")],
    },
    prefer_skip_nested_validation=True,
)
def ols(X, y, t=1):
    """Orthogonal least-squares.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Feature matrix.

    y : array-like of shape (n_samples,)
        Target vector.

    t : int, default=1
        The absolute number of features to select.

    Returns
    -------
    indices : ndarray of shape (n_features_to_select,), dtype=int
        The indices of the selected features. The order of the indices
        corresponds to the feature selection process.

    scores : ndarray of shape (n_features_to_select,), dtype=float
        The scores of the selected features. The order of the scores
        corresponds to the feature selection process.
    """
    X, y = check_X_y(X, y, dtype=float, ensure_2d=True)
    n_features = X.shape[1]
    # Normalize each feature column and the target to unit length.
    w = X / np.linalg.norm(X, axis=0)
    v = y / np.linalg.norm(y)
    mask = np.zeros(n_features, dtype=bool)
    r2 = np.zeros(n_features)
    indices = np.zeros(t, dtype=int)
    scores = np.zeros(t, dtype=float)

    for i in range(t):
        # Squared correlation between the target and each remaining feature.
        for j in range(n_features):
            if not mask[j]:
                r = w[:, j] @ v
                r2[j] = r**2
        d = np.argmax(r2)
        indices[i] = d
        scores[i] = r2[d]
        if i == t - 1:
            return indices, scores
        mask[d] = True
        r2[d] = 0
        # Orthogonalize the remaining features against the selected one
        # and renormalize them to unit length.
        for j in range(n_features):
            if not mask[j]:
                w[:, j] = w[:, j] - w[:, d] * (w[:, d] @ w[:, j])
                w[:, j] /= np.linalg.norm(w[:, j])
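
For context, a quick usage sketch of the new ols helper on hypothetical data (the
printed indices depend on how the toy target is constructed):

    import numpy as np
    from fastcan import ols

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 5))
    # Build a target that depends mostly on features 1 and 3.
    y = X[:, 1] + 0.5 * X[:, 3] + 0.01 * rng.normal(size=50)

    indices, scores = ols(X, y, t=2)
    print(indices)  # e.g. [1 3], in selection order
    print(scores)   # the corresponding squared correlations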
