DOC add multioutput guide

MatthewSZhang · MatthewSZhang · commit c1abc204e97e · 2024-10-11T10:52:12.000+08:00
diff --git a/doc/index.rst b/doc/index.rst
@@ -23,6 +23,7 @@ API Reference
 Useful Links
 ------------
 .. toctree::
+   :maxdepth: 2
 
    User Guide <user_guide>
    Examples <auto_examples/index>
diff --git a/doc/multioutput.rst b/doc/multioutput.rst
@@ -6,4 +6,52 @@
 Multioutput feature selection
 ==============================
 
-We can use :class:`FastCan` to handle multioutput feature selection.
+We can use :class:`FastCan` to handle multioutput feature selection, which means
+target ``y`` can be a matrix. For regression, :class:`FastCan` can be used for
+MIMO (Multi-Input Multi-Output) data. For classification, it can be used for
+multilabel data. Multioutput feature selection can also be useful for multiclass
+classification, which has one output with multiple categories. A multiclass
+problem can be converted to a multilabel one by one-hot encoding the target
+``y``. The canonical correlation coefficient between the features ``X`` and the
+one-hot encoded target ``y`` is equivalent to Fisher's criterion in
+LDA (Linear Discriminant Analysis) [1]_. Applying :class:`FastCan` to the converted
+multioutput data may result in better accuracy in the subsequent classification task
+than applying it directly to the original single-label data. See Figure 5 in [2]_.
+
+Relationship on multiclass data
+-------------------------------
+Assume the feature matrix is :math:`X \in \mathbb{R}^{N\times n}`, the multiclass
+target vector is :math:`y \in \mathbb{R}^{N\times 1}`, and the one-hot encoded target
+matrix is :math:`Y \in \mathbb{R}^{N\times m}`. Then, Fisher's criterion for
+:math:`X` and :math:`y` is denoted as :math:`J` and the canonical correlation
+coefficient between :math:`X` and :math:`Y` is denoted as :math:`R`. The relationship
+between :math:`J` and :math:`R` is given by
+
+.. math::
+    J = \frac{R^2}{1-R^2}
+
+or
+
+.. math::
+    R^2 = \frac{J}{1+J}
+
+Note that there is more than one Fisher's criterion and more than one canonical
+correlation coefficient. The number of non-zero canonical correlation
+coefficients is no more than :math:`\min (n, m)`, and each canonical correlation
+coefficient corresponds one-to-one to a Fisher's criterion.
+
+.. rubric:: References
+
+.. [1] `"Orthogonal least squares based fast feature selection for
+  linear classification" <https://doi.org/10.1016/j.patcog.2021.108419>`_
+  Zhang, S., & Lang, Z. Q. Pattern Recognition, 123, 108419 (2022).
+
+.. [2] `"Canonical-correlation-based fast feature selection for structural
+  health monitoring" <https://doi.org/10.1016/j.ymssp.2024.111895>`_
+  Zhang, S., Wang, T., Worden, K., Sun, L., & Cross, E. J.
+  Mechanical Systems and Signal Processing, 223, 111895 (2025).
+
+.. rubric:: Examples
+
+* See :ref:`sphx_glr_auto_examples_plot_fisher.py` for an example of
+  the equivalence between CCA and LDA on multiclass data.
diff --git a/doc/redundancy.rst b/doc/redundancy.rst
@@ -27,7 +27,7 @@ which gives large rounding-errors when linearly redundant features appears.
 * `"Canonical-correlation-based fast feature selection for structural
   health monitoring" <https://doi.org/10.1016/j.ymssp.2024.111895>`_
   Zhang, S., Wang, T., Worden, K., Sun L., & Cross, E. J.
-  Mechanical Systems and Signal Processing, 223:111895 (2025).
+  Mechanical Systems and Signal Processing, 223, 111895 (2025).
 
 .. rubric:: Examples
 
diff --git a/doc/user_guide.rst b/doc/user_guide.rst
@@ -6,6 +6,7 @@ User Guide
 
 .. toctree::
    :numbered:
+   :maxdepth: 1
 
    intuitive.rst
    unsupervised.rst
diff --git a/examples/plot_fisher.py b/examples/plot_fisher.py
@@ -0,0 +1,71 @@
+"""
+=========================
+Fisher's criterion in LDA
+=========================
+
+.. currentmodule:: fastcan
+
+In this example, we demonstrate that the canonical correlation coefficient
+between the features ``X`` and the one-hot encoded target ``y`` is equivalent
+to Fisher's criterion in LDA (Linear Discriminant Analysis).
+"""
+
+# Authors: Sikai Zhang
+# SPDX-License-Identifier: MIT
+
+# %%
+# Prepare data
+# ------------
+# We use the ``iris`` dataset and transform this multiclass data into multilabel data
+# by one-hot encoding. Here, ``drop="first"`` is necessary; otherwise, the transformed
+# target does not have full column rank.
+
+from sklearn import datasets
+from sklearn.preprocessing import OneHotEncoder
+
+
+X, y = datasets.load_iris(return_X_y=True)
+# drop="first" is necessary, otherwise, the transformed target is not full column rank
+y_enc = OneHotEncoder(
+    drop="first",
+    sparse_output=False,
+).fit_transform(y.reshape(-1, 1))
+
+# %%
+# Compute Fisher's criterion
+# --------------------------
+# With ``solver="eigen"``, ``LinearDiscriminantAnalysis`` in ``sklearn`` computes
+# Fisher's criterion as an intermediate result. However, it does not provide an
+# interface to export it, so we reproduce it manually.
+
+import numpy as np
+from scipy import linalg
+from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
+from sklearn.covariance import empirical_covariance
+
+clf = LinearDiscriminantAnalysis(solver="eigen").fit(X, y)
+Sw = clf.covariance_  # within scatter
+St = empirical_covariance(X)  # total scatter
+Sb = St - Sw  # between scatter
+fishers_criterion, _ = linalg.eigh(Sb, Sw)
+
+fishers_criterion = np.sort(fishers_criterion)[::-1]
+n_nonzero = min(X.shape[1], clf.classes_.shape[0] - 1)
+# keep only the non-zero eigenvalues, i.e., remove those close to zero
+fishers_criterion = fishers_criterion[:n_nonzero]
+# convert Fisher's criteria to squared canonical correlation coefficients
+r2 = fishers_criterion / (1 + fishers_criterion)
+
+# %%
+# Compute SSC
+# -----------
+# Compute the sum of squared canonical correlation coefficients (SSC). The result
+# obtained by :class:`FastCan`/CCA (Canonical Correlation Analysis) is the same as
+# the one obtained from LDA.
+
+from fastcan import FastCan
+
+ssc = FastCan(4, verbose=0).fit(X, y_enc).scores_.sum()
+
+print(f"SSC from LDA: {r2.sum():.5f}")
+print(f"SSC from CCA: {ssc:.5f}")
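
The relationship between :math:`J` and :math:`R` documented in ``doc/multioutput.rst`` can be checked by hand in the single-feature case. The sketch below is not part of this commit and does not use :class:`FastCan`. It assumes one feature, for which Fisher's criterion reduces to the ratio of between-class to within-class sum of squares, and the squared canonical correlation with the one-hot encoded target reduces to the ratio of between-class to total sum of squares. The helper ``scatter_decomposition`` and the toy data are hypothetical and serve only as illustration.

```python
from statistics import fmean


def scatter_decomposition(x, labels):
    """Return (between, within) sums of squares of x grouped by labels."""
    grand_mean = fmean(x)
    groups = {}
    for value, label in zip(x, labels):
        groups.setdefault(label, []).append(value)
    # Between-class scatter: class sizes times squared class-mean offsets.
    between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups.values())
    # Within-class scatter: squared deviations from each class mean.
    within = sum((v - fmean(g)) ** 2 for g in groups.values() for v in g)
    return between, within


# Hypothetical toy data: one feature, three classes.
x = [1.0, 1.2, 0.8, 2.9, 3.1, 3.0, 5.2, 4.8, 5.0]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]

sb, sw = scatter_decomposition(x, labels)
J = sb / sw  # Fisher's criterion for a single feature
R2 = sb / (sb + sw)  # squared canonical correlation for a single feature

# The two quantities satisfy R^2 = J / (1 + J) and J = R^2 / (1 - R^2).
assert abs(R2 - J / (1 + J)) < 1e-12
assert abs(J - R2 / (1 - R2)) < 1e-9
print(f"J = {J:.5f}, R^2 = {R2:.5f}")
```

With several features, the scalar ratios become the generalized eigenvalue problem solved in ``examples/plot_fisher.py``, and the same conversion applies to each eigenvalue.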