.. currentmodule:: fastcan

.. _ols_omp:

===========================
Comparison with OLS and OMP
===========================

:class:`FastCan` has a close relationship with Orthogonal Least Squares (OLS) [1]_
and Orthogonal Matching Pursuit (OMP) [2]_.
A detailed comparison of OLS and OMP can be found in [3]_.
Here, we briefly compare the three methods.


Assume we have a feature matrix :math:`X_s \in \mathbb{R}^{N\times t}`, which contains
:math:`t` selected features, and a target vector :math:`y \in \mathbb{R}^{N\times 1}`.
Then the least-squares residual :math:`r \in \mathbb{R}^{N\times 1}` is given by

.. math::
    r = y - X_s \beta \;\; \text{where} \;\; \beta = (X_s^\top X_s)^{-1}X_s^\top y
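
In code, this is an ordinary least-squares fit. The sketch below is a minimal
illustration with synthetic data, using ``numpy.linalg.lstsq`` instead of forming the
normal-equation inverse explicitly; the array names and shapes are assumptions.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)
    N, t = 100, 3
    X_s = rng.normal(size=(N, t))  # t already-selected features
    y = rng.normal(size=N)         # target

    # beta = (X_s^T X_s)^{-1} X_s^T y, computed via a least-squares solve
    beta, *_ = np.linalg.lstsq(X_s, y, rcond=None)
    r = y - X_s @ beta             # least-squares residual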
|
When evaluating a new candidate feature :math:`x_i \in \mathbb{R}^{N\times 1}`:

* for OMP, the feature which maximizes :math:`r^\top x_i` will be selected,
* for OLS, the feature which maximizes :math:`r^\top w_i` will be selected, where
  :math:`w_i \in \mathbb{R}^{N\times 1}` is the projection of :math:`x_i` onto the
  subspace orthogonal to :math:`X_s`, i.e.,
  :math:`X_s^\top w_i = \mathbf{0} \in \mathbb{R}^{t\times 1}`,
* for :class:`FastCan` (h-correlation algorithm), the criterion is almost the same as
  for OLS; the difference is that in :class:`FastCan`, :math:`X_s`, :math:`y`, and
  :math:`x_i` are centered (i.e., each column has zero mean) before the selection
  (see the sketch after this list).

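The three ranking scores, as stated above, can be written out directly. The following
is a minimal NumPy sketch for a single candidate feature; it illustrates the criteria
only and is not the implementation used by :class:`FastCan`, and the function and
variable names are assumptions.

.. code-block:: python

    import numpy as np

    def omp_score(r, x_i):
        """OMP: correlation between the residual and the raw candidate."""
        return r @ x_i

    def ols_score(X_s, r, x_i):
        """OLS: correlation between the residual and the part of x_i
        that is orthogonal to the selected features X_s."""
        beta, *_ = np.linalg.lstsq(X_s, x_i, rcond=None)
        w_i = x_i - X_s @ beta  # component of x_i orthogonal to X_s
        return r @ w_i

    def h_correlation_score(X_s, y, x_i):
        """FastCan (h-correlation): same as OLS, but X_s, y, and x_i
        are centered before the residual and projection are computed."""
        X_c = X_s - X_s.mean(axis=0)
        y_c = y - y.mean()
        x_c = x_i - x_i.mean()
        beta, *_ = np.linalg.lstsq(X_c, y_c, rcond=None)
        r_c = y_c - X_c @ beta
        return ols_score(X_c, r_c, x_c)
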
This small difference makes the feature ranking criterion of :class:`FastCan`
equivalent to the sum of squared canonical correlation coefficients, which gives
it the following advantages over OLS and OMP:

* Affine invariance: if the features undergo an affine transformation, i.e., are
  scaled and/or shifted by constants, the selection result given by :class:`FastCan`
  is unchanged. See :ref:`sphx_glr_auto_examples_plot_affinity.py`.
* Multioutput: as :class:`FastCan` uses canonical correlation for feature ranking, it
  naturally supports feature selection on datasets with multiple outputs, as shown in
  the sketch below.

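Both points can be checked with a short script. The sketch below is illustrative only:
it uses synthetic data and assumes the scikit-learn-style selector API of
:class:`FastCan` (an ``n_features_to_select`` parameter, ``fit``, and ``get_support``).

.. code-block:: python

    import numpy as np
    from fastcan import FastCan

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 10))
    # A two-output target: multioutput data is passed to fit() directly.
    y = np.column_stack([X[:, 0] + X[:, 3], X[:, 5] - X[:, 7]])
    y += 0.1 * rng.normal(size=y.shape)

    selected = FastCan(n_features_to_select=3).fit(X, y).get_support(indices=True)

    # Affine transformation: scale every feature and add a constant offset.
    X_affine = X * rng.uniform(0.5, 2.0, size=10) + rng.normal(size=10)
    selected_affine = (
        FastCan(n_features_to_select=3).fit(X_affine, y).get_support(indices=True)
    )

    # Affine invariance: the selected feature indices are unchanged.
    assert np.array_equal(selected, selected_affine)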

.. rubric:: References

.. [1] `"Orthogonal least squares methods and their application to non-linear
   system identification" <https://doi.org/10.1080/00207178908953472>`_ Chen, S.,
   Billings, S. A., & Luo, W. International Journal of Control, 50(5),
   1873-1896 (1989).

.. [2] `"Matching pursuits with time-frequency dictionaries"
   <https://doi.org/10.1109/78.258082>`_ Mallat, S. G., & Zhang, Z.
   IEEE Transactions on Signal Processing, 41(12), 3397-3415 (1993).

.. [3] `"On the difference between Orthogonal Matching Pursuit and Orthogonal Least
   Squares" <https://eprints.soton.ac.uk/142469/1/BDOMPvsOLS07.pdf>`_ Blumensath, T.,
   & Davies, M. E. Technical report, University of Edinburgh (2007).