3 changes: 3 additions & 0 deletions .github/workflows/static.yml
@@ -30,3 +30,6 @@ jobs:
      - name: Type check
        run: |
          pixi run type
+     - name: Spell check
+       run: |
+         pixi run spell
2 changes: 1 addition & 1 deletion doc/multioutput.rst
@@ -12,7 +12,7 @@ MIMO (Multi-Input Multi-Output) data. For classification, it can be used for
multilabel data. Actually, for multiclass classification, which has one output with
multiple categories, multioutput feature selection can also be useful. The multiclass
classification can be converted to multilabel classification by one-hot encoding
-target ``y``. The cannonical correaltion coefficient between the features ``X`` and the
+target ``y``. The canonical correlation coefficient between the features ``X`` and the
one-hot encoded target ``y`` has equivalent relationship with Fisher's criterion in
LDA (Linear Discriminant Analysis) [1]_. Applying :class:`FastCan` to the converted
multioutput data may result in better accuracy in the following classification task
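Note: the conversion this hunk describes fits in a few lines; here is a minimal sketch, assuming the ``FastCan(n_features_to_select, ...)`` / ``indices_`` API that appears elsewhere in this diff (the dataset and the number of features to select are illustrative, not part of the PR):

from sklearn.datasets import load_iris
from sklearn.preprocessing import OneHotEncoder

from fastcan import FastCan

X, y = load_iris(return_X_y=True)
# One-hot encode the multiclass target into a multilabel-style target
y_onehot = OneHotEncoder(sparse_output=False).fit_transform(y.reshape(-1, 1))
# Multioutput feature selection against the one-hot encoded target
selector = FastCan(2, verbose=0).fit(X, y_onehot)
print(selector.indices_)  # indices of the two selected features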
2 changes: 1 addition & 1 deletion doc/narx.rst
@@ -82,7 +82,7 @@ It should also be noted the different types of predictions in model training.
ARX and OE model
----------------

-To better understant the two types of training, it is helpful to know two linear time series model structures,
+To better understand the two types of training, it is helpful to know two linear time series model structures,
i.e., `ARX (AutoRegressive eXogenous) model <https://www.mathworks.com/help/ident/ref/arx.html>`_ and
`OE (output error) model <https://www.mathworks.com/help/ident/ref/oe.html>`_.

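Note: for readers landing on this hunk, the two structures it names are, in standard system-identification notation (a textbook summary, not text from the diff):

% ARX: the noise e(k) enters inside the feedback loop, so prediction
% uses measured past outputs (one-step-ahead, ARX-style training).
A(q)\,y(k) = B(q)\,u(k) + e(k)

% OE: the noise is added at the output; the model output is simulated
% from the input alone (multi-step-ahead, output-error training).
y(k) = \frac{B(q)}{F(q)}\,u(k) + e(k)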
2 changes: 1 addition & 1 deletion doc/ols_and_omp.rst
@@ -12,7 +12,7 @@ The detailed difference between OLS and OMP can be found in [3]_.
Here, let's briefly compare the three methods.


-Assume we have a feature matrix :math:`X_s \in \mathbb{R}^{N\times t}`, which constains
+Assume we have a feature matrix :math:`X_s \in \mathbb{R}^{N\times t}`, which contains
:math:`t` selected features, and a target vector :math:`y \in \mathbb{R}^{N\times 1}`.
Then the residual :math:`r \in \mathbb{R}^{N\times 1}` of the least-squares can be
found by
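Note: the residual this truncated hunk leads into is the ordinary least-squares residual; a minimal numpy sketch with the shapes from the passage (random data as filler):

import numpy as np

rng = np.random.default_rng(0)
N, t = 100, 3
X_s = rng.normal(size=(N, t))  # t selected features
y = rng.normal(size=(N, 1))    # target vector

# r = y - X_s @ beta, with beta the least-squares solution
beta, *_ = np.linalg.lstsq(X_s, y, rcond=None)
r = y - X_s @ beta
print(r.shape)  # (100, 1), matching r in R^{N x 1}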
4 changes: 2 additions & 2 deletions doc/pruning.rst
@@ -22,9 +22,9 @@ should be selected, as any additional samples can be represented by linear combi
Therefore, the number to select has to be set to small.

To solve this problem, we use :func:`minibatch` to loose the redundancy check of :class:`FastCan`.
-The original :class:`FastCan` checks the redunancy within :math:`X_s \in \mathbb{R}^{n\times t}`,
+The original :class:`FastCan` checks the redundancy within :math:`X_s \in \mathbb{R}^{n\times t}`,
which contains :math:`t` selected samples and n features,
-and the redunancy within :math:`Y \in \mathbb{R}^{n\times m}`, which contains :math:`m` atoms :math:`y_i`.
+and the redundancy within :math:`Y \in \mathbb{R}^{n\times m}`, which contains :math:`m` atoms :math:`y_i`.
:func:`minibatch` ranks samples with multiple correlation coefficients between :math:`X_b \in \mathbb{R}^{n\times b}` and :math:`y_i`,
where :math:`b` is batch size and :math:`b <= t`, instead of canonical correlation coefficients between :math:`X_s` and :math:`Y`,
which is used in :class:`FastCan`.
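Note: a sketch of how the pruning described above might be invoked; the ``minibatch(X, y, n_samples_to_select, batch_size)`` call, the transposition of the data, and the batch size are assumptions pieced together from this passage, not confirmed by the diff, so check fastcan's API reference:

# Hypothetical usage -- the signature and the orientation of the data
# (samples ranked, atoms y_i as targets) are assumptions.
import numpy as np
from fastcan import minibatch

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 10))  # 150 candidate samples, 10 features
Y = rng.normal(size=(150, 3))   # 3 atoms y_i

# Rank samples batch-by-batch (b <= t), loosening FastCan's redundancy check
ids = minibatch(X.T, Y.T, 100, batch_size=7)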
2 changes: 1 addition & 1 deletion examples/plot_fisher.py
@@ -5,7 +5,7 @@
.. currentmodule:: fastcan
-In this examples, we will demonstrate the cannonical correaltion coefficient
+In this example, we will demonstrate the canonical correlation coefficient
between the features ``X`` and the one-hot encoded target ``y`` has equivalent
relationship with Fisher's criterion in LDA (Linear Discriminant Analysis).
"""
10 changes: 5 additions & 5 deletions examples/plot_intuitive.py
@@ -22,10 +22,10 @@
# the predicted target by a linear regression model) and the target to describe its
# usefulness, the results are shown in the following figure. It can be seen that
# Feature 2 is the most useful and Feature 8 is the second. However, does that mean
-# that the total usefullness of Feature 2 + Feature 8 is the sum of their R-squared
+# that the total usefulness of Feature 2 + Feature 8 is the sum of their R-squared
# scores? Probably not, because there may be redundancy between Feature 2 and Feature 8.
# Actually, what we want is a kind of usefulness score which has the **superposition**
-# property, so that the usefullness of each feature can be added together without
+# property, so that the usefulness of each feature can be added together without
# redundancy.

import matplotlib.pyplot as plt
@@ -125,7 +125,7 @@ def plot_bars(ids, r2_left, r2_selected):
# Select the third feature
# ------------------------
# Again, let's compute the R-squared between Feature 2 + Feature 8 + Feature i and
-# the target, and the additonal R-squared contributed by the rest of the features is
+# the target, and the additional R-squared contributed by the rest of the features is
# shown in following figure. It can be found that after selecting Features 2 and 8, the
# rest of the features can provide a very limited contribution.

@@ -145,8 +145,8 @@ def plot_bars(ids, r2_left, r2_selected):
# at the RHS of the dashed lines. The fast computational speed is achieved by
# orthogonalization, which removes the redundancy between the features. We use the
# orthogonalization first to makes the rest of features orthogonal to the selected
-# features and then compute their additonal R-squared values. ``eta-cosine`` uses
-# the samilar idea, but has an additonal preprocessing step to compress the features
+# features and then compute their additional R-squared values. ``eta-cosine`` uses
+# a similar idea, but has an additional preprocessing step to compress the features
# :math:`X \in \mathbb{R}^{N\times n}` and the target
# :math:`Y \in \mathbb{R}^{N\times m}` to :math:`X_c \in \mathbb{R}^{(m+n)\times n}`
# and :math:`Y_c \in \mathbb{R}^{(m+n)\times m}`.
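Note: the "additional R-squared after orthogonalization" idea in this hunk fits in a few lines of numpy; a minimal conceptual sketch, not fastcan's actual (Cython) implementation:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X[:, 2] + 0.5 * X[:, 8] + 0.1 * rng.normal(size=100)
X = X - X.mean(axis=0)  # center so R-squared reads as explained variance
y = y - y.mean()

Q, _ = np.linalg.qr(X[:, [2]])  # orthonormal basis of the selected feature(s)
X_perp = X - Q @ (Q.T @ X)      # orthogonalize the rest against the selection

# Additional R-squared each feature would add on top of Feature 2
num = (X_perp.T @ y) ** 2
den = (X_perp**2).sum(axis=0) * (y @ y)
add_r2 = np.divide(num, den, out=np.zeros_like(num), where=den > 1e-12)
print(add_r2.round(3))  # Feature 2's own entry is ~0 (already selected)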
2 changes: 1 addition & 1 deletion examples/plot_pruning.py
@@ -81,7 +81,7 @@ def _fastcan_pruning(
# %%
# Compare pruning methods
# -----------------------
-# 100 samples are seleced from 150 original data with ``Random`` pruning and
+# 100 samples are selected from 150 original data with ``Random`` pruning and
# ``FastCan`` pruning. The results show that ``FastCan`` pruning gives a higher
# mean value of R-squared and a lower standard deviation.

2 changes: 1 addition & 1 deletion examples/plot_redundancy.py
@@ -9,7 +9,7 @@
datasets, which contain redundant features.
Here four types of features should be distinguished:
-* Unuseful features: the features do not contribute to the target
+* Useless features: the features do not contribute to the target
* Dependent informative features: the features contribute to the target and form
the redundant features
* Redundant features: the features are constructed by linear transformation of
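Note: to make the last two bullets concrete, redundant features in this sense can be constructed as linear transformations of the dependent informative ones; an illustrative generator, not the example's own:

import numpy as np

rng = np.random.default_rng(0)
X_info = rng.normal(size=(200, 3))  # dependent informative features
A = rng.normal(size=(3, 2))
X_redund = X_info @ A               # linear combinations: add no new information
X = np.hstack([X_info, X_redund])   # 5 features, only 3 carry information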
2 changes: 1 addition & 1 deletion examples/plot_speed.py
@@ -147,7 +147,7 @@ def baseline(X, y, t):
r_eta = FastCan(n_features_to_select, eta=True, verbose=0).fit(X, y).indices_
r_base, _ = baseline(X, y, n_features_to_select)

print("The indices of the seleted features:", end="\n")
print("The indices of the selected features:", end="\n")
print(f"h-correlation: {r_h}")
print(f"eta-cosine: {r_eta}")
print(f"Baseline: {r_base}")
2 changes: 1 addition & 1 deletion fastcan/_cancorr_fast.pyx
@@ -1,5 +1,5 @@
"""
-Fast feature selection with sum squared canoncial correlation coefficents
+Fast feature selection with sum squared canonical correlation coefficients
"""
# Authors: The fastcan developers
# SPDX-License-Identifier: MIT
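Note: for context, the criterion this docstring names, the sum of squared canonical correlation coefficients (SSC), is commonly written as (standard definition; r_i is the i-th canonical correlation coefficient and d the number of canonical pairs):

\mathrm{SSC} = \sum_{i=1}^{d} r_i^{2}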
2 changes: 1 addition & 1 deletion fastcan/_fastcan.py
@@ -135,7 +135,7 @@ def __init__(
self.verbose = verbose

def fit(self, X, y):
"""Preprare data for h-correlation or eta-cosine methods and select features.
"""Prepare data for h-correlation or eta-cosine methods and select features.
Parameters
----------
6 changes: 3 additions & 3 deletions fastcan/narx.py
@@ -275,7 +275,7 @@ def make_poly_ids(
)

const_id = np.where((ids == 0).all(axis=1))
-return np.delete(ids, const_id, 0) # remove the constant featrue
+return np.delete(ids, const_id, 0) # remove the constant feature


def _valiate_time_shift_poly_ids(
@@ -399,7 +399,7 @@ def fd2tp(feat_ids, delay_ids):
For time_shift_ids, [0, 1], [0, 2], and [1, 3] represents x0(k-1), x0(k-2),
and x1(k-3), respectively. For poly_ids, [1, 1] and [2, 3] represent the first
variable multiplying the first variable given by time_shift_ids, i.e.,
-x0(k-1)*x0(k-1), and the second variable multiplying the thrid variable, i.e.,
+x0(k-1)*x0(k-1), and the second variable multiplying the third variable, i.e.,
x0(k-1)*x1(k-3).
Parameters
@@ -475,7 +475,7 @@ def tp2fd(time_shift_ids, poly_ids):
For time_shift_ids, [0, 1], [0, 2], and [1, 3] represents x0(k-1), x0(k-2),
and x1(k-3), respectively. For poly_ids, [1, 1] and [2, 3] represent the first
variable multiplying the first variable given by time_shift_ids, i.e.,
-x0(k-1)*x0(k-1), and the second variable multiplying the thrid variable, i.e.,
+x0(k-1)*x0(k-1), and the second variable multiplying the third variable, i.e.,
x0(k-1)*x1(k-3).
Parameters
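Note: the two docstrings above fix the semantics of both encodings; a hypothetical round trip built from exactly the ids they mention (array dtypes and return order are assumptions, check the API reference):

# Hypothetical usage -- return order follows the function names
# (tp2fd: time_shift/poly -> feat/delay); not confirmed by this diff.
import numpy as np
from fastcan.narx import fd2tp, tp2fd

# [0, 1], [0, 2], [1, 3] encode x0(k-1), x0(k-2), x1(k-3)
time_shift_ids = np.array([[0, 1], [0, 2], [1, 3]])
# [1, 1] and [2, 3] encode products of those variables, per the docstrings
poly_ids = np.array([[1, 1], [2, 3]])

feat_ids, delay_ids = tp2fd(time_shift_ids, poly_ids)
ts_back, poly_back = fd2tp(feat_ids, delay_ids)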