
Commit 77fb4ea

rvasav26, cajchristian, and rosecers authored
Adding Principal Covariates Classification (PCovC) Code (#248)
* Adding PCovC code along with base class, modifying PCovR to inherit from it
* Finalizing/touching up docs
* Fixing linting
* Minor changes to examples, formatting
* Fixing docstrings to address docs build errors
* Fixing whitespace for linter
* Adding pcovc to docs
* Making PCovC accessible via API reference on docs for now
* Implementing Rosy's suggestions to code
* Adding PCovC to docs examples
* Updating CHANGELOG, changing PCovC fit() note
* Modifying examples, docstrings
* Addressing docs suggestions for inclusion of PCovC
* Adding side-by-side comparison to PCovC comparison example
* Touching up pcovc vs pca example
* Linting
* Linting (again)
* Touching up docs

---------

Co-authored-by: Christian Jorgensen <[email protected]>
Co-authored-by: Rose K. Cersonsky <[email protected]>
1 parent b93de9f commit 77fb4ea

19 files changed: +1796 −301 lines

CHANGELOG

Lines changed: 4 additions & 0 deletions
@@ -13,6 +13,10 @@ The rules for CHANGELOG file:

 0.3.0 (XXXX/XX/XX)
 ------------------
+- Add ``_BasePCov`` class (#248)
+- Add ``PCovC`` class that inherits shared functionality from ``_BasePCov`` (#248)
+- Add ``PCovC`` testing suite and examples (#248)
+- Modify ``PCovR`` to inherit shared functionality from ``_BasePCov`` (#248)
 - Update to sklearn >= 1.6.0 and scipy >= 1.15.0 (#239)
 - Fixed moved function import from scipy and bump scipy dependency to 1.15.0 (#236)
 - Fix rendering issues for `SparseKDE` and `QuickShift` (#236)
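The last four entries describe an inheritance refactor: the machinery shared by ``PCovR`` and the new ``PCovC`` now lives in a private base class. A minimal sketch of the assumed hierarchy, not the actual ``skmatter`` internals (the real classes take many more parameters; the ``pxt_`` projector attribute is assumed from PCovR's documented API):

# Illustrative sketch only; see skmatter/decomposition for the real implementation.
from sklearn.base import BaseEstimator, TransformerMixin


class _BasePCov(BaseEstimator, TransformerMixin):
    """Shared PCovR/PCovC logic: the mixing parameter, the eigendecomposition of
    the modified (covariance or Gram) matrix, and the latent projection."""

    def __init__(self, mixing=0.5, n_components=None):
        self.mixing = mixing
        self.n_components = n_components

    def transform(self, X):
        # project samples into the latent space fitted by the subclass
        return X @ self.pxt_


class PCovR(_BasePCov):
    """Adds the regression-specific pieces: a regressor refit on T, predict()."""


class PCovC(_BasePCov):
    """Adds the classification-specific pieces: a classifier refit on T,
    predict() and decision_function()."""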

docs/src/bibliography.rst

Lines changed: 6 additions & 0 deletions
@@ -45,3 +45,9 @@ References
     Michele Ceriotti, "Improving Sample and Feature Selection with Principal Covariates
     Regression" 2021 Mach. Learn.: Sci. Technol. 2 035038.
     https://iopscience.iop.org/article/10.1088/2632-2153/abfe7c.
+
+.. [Jorgensen2025]
+    Christian Jorgensen, Arthur Y. Lin, Rhushil Vasavada, and Rose K. Cersonsky,
+    "Interpretable Visualizations of Data Spaces for Classification Problems"
+    2025 arXiv. 2503.05861.
+    https://doi.org/10.48550/arXiv.2503.05861.

docs/src/conf.py

Lines changed: 8 additions & 1 deletion
@@ -54,7 +54,14 @@
     "sphinx_toggleprompt",
 ]

-example_subdirs = ["pcovr", "selection", "regression", "reconstruction", "neighbors"]
+example_subdirs = [
+    "pcovr",
+    "pcovc",
+    "selection",
+    "regression",
+    "reconstruction",
+    "neighbors",
+]
 sphinx_gallery_conf = {
     "filename_pattern": "/*",
     "examples_dirs": [f"../../examples/{p}" for p in example_subdirs],

docs/src/getting-started.rst

Lines changed: 5 additions & 3 deletions
@@ -37,10 +37,10 @@ Notebook Examples
 .. include:: examples/reconstruction/index.rst
    :start-line: 4

-.. _getting_started-pcovr:
+.. _getting_started-hybrid:

-Principal Covariates Regression
--------------------------------
+Hybrid Mapping Techniques
+-------------------------

 .. automodule:: skmatter.decomposition
    :noindex:
@@ -50,3 +50,5 @@ Notebook Examples

 .. include:: examples/pcovr/index.rst
    :start-line: 4
+.. include:: examples/pcovc/index.rst
+   :start-line: 4

docs/src/index.rst

Lines changed: 2 additions & 2 deletions
@@ -33,15 +33,15 @@

 .. only:: html

-    :ref:`getting_started-pcovr`
+    :ref:`getting_started-hybrid`

 .. image:: /examples/pcovr/images/thumb/sphx_glr_PCovR_thumb.png
    :alt:

 .. raw:: html

    </h5>
-    <p class="card-text">Utilises a combination of a PCA-like and an LR-like loss
+    <p class="card-text">PCovR and PCovC utilize a combination of a PCA-like and an LR-like loss
     to determine the decomposition matrix to project features into latent space</p>
    </div>
    </div>
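The card text above refers to the usual PCovR objective; a sketch in the standard notation (with mixing parameter :math:`\alpha`, latent projection :math:`\mathbf{T}`, and reconstruction/prediction weights :math:`\mathbf{P}_{TX}`, :math:`\mathbf{P}_{TY}`; PCovC replaces the regression targets :math:`\mathbf{Y}` with class labels or classifier decision values):

.. math::

    \ell = \alpha \,\lVert \mathbf{X} - \mathbf{T}\mathbf{P}_{TX} \rVert^{2}
           + (1 - \alpha)\,\lVert \mathbf{Y} - \mathbf{T}\mathbf{P}_{TY} \rVert^{2}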

docs/src/references/decomposition.rst

Lines changed: 22 additions & 2 deletions
@@ -1,5 +1,5 @@
-Principal Covariates Regression (PCovR)
-=======================================
+Hybrid Mapping Techniques
+=========================

 .. _PCovR-api:

@@ -20,6 +20,26 @@ PCovR
     .. automethod:: inverse_transform
     .. automethod:: score

+.. _PCovC-api:
+
+PCovC
+-----
+
+.. autoclass:: skmatter.decomposition.PCovC
+    :show-inheritance:
+    :special-members:
+
+    .. automethod:: fit
+
+        .. automethod:: _fit_feature_space
+        .. automethod:: _fit_sample_space
+
+    .. automethod:: transform
+    .. automethod:: predict
+    .. automethod:: inverse_transform
+    .. automethod:: decision_function
+    .. automethod:: score
+
 .. _KPCovR-api:

 Kernel PCovR
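A minimal usage sketch of the ``PCovC`` interface documented above, mirroring the example scripts added in this commit (parameter values are illustrative only):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import StandardScaler

from skmatter.decomposition import PCovC

X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# mixing=0.5 weights the PCA-like and classification-like parts of the loss equally
pcovc = PCovC(mixing=0.5, n_components=2, classifier=LogisticRegressionCV())
pcovc.fit(X_scaled, y)

T = pcovc.transform(X_scaled)                # latent-space projection of the samples
y_pred = pcovc.predict(X_scaled)             # labels from the classifier refit on T
scores = pcovc.decision_function(X_scaled)   # classifier decision values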

docs/src/tutorials.rst

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@
 .. toctree::

     examples/pcovr/index
+    examples/pcovc/index
     examples/selection/index
     examples/regression/index
     examples/reconstruction/index

examples/pcovc/PCovC_Comparison.py

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@

#!/usr/bin/env python
# coding: utf-8

"""
Comparing PCovC with PCA and LDA
================================
"""
# %%
#

import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import StandardScaler

from skmatter.decomposition import PCovC


plt.rcParams["image.cmap"] = "tab10"
plt.rcParams["scatter.edgecolors"] = "k"

random_state = 0

# %%
#
# For this, we will use the :func:`sklearn.datasets.load_breast_cancer` dataset from
# ``sklearn``.

X, y = load_breast_cancer(return_X_y=True)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# %%
#
# PCA
# ---
#

pca = PCA(n_components=2)

pca.fit(X_scaled, y)
T_pca = pca.transform(X_scaled)

fig, ax = plt.subplots()
scatter = ax.scatter(T_pca[:, 0], T_pca[:, 1], c=y)
ax.set(xlabel="PC$_1$", ylabel="PC$_2$")
ax.legend(
    scatter.legend_elements()[0][::-1],
    load_breast_cancer().target_names[::-1],
    loc="upper right",
    title="Classes",
)

# %%
#
# LDA
# ---
#

lda = LinearDiscriminantAnalysis(n_components=1)
lda.fit(X_scaled, y)

T_lda = lda.transform(X_scaled)

fig, ax = plt.subplots()
ax.scatter(T_lda[:], np.zeros(len(T_lda[:])), c=y)
ax.set(xlabel="LDA$_1$", ylabel="LDA$_2$")

# %%
#
# PCovC
# -----
#
# Below, we see the map produced
# by a PCovC model with :math:`\alpha` = 0.5 and a logistic
# regression classifier.

mixing = 0.5

pcovc = PCovC(
    mixing=mixing,
    n_components=2,
    random_state=random_state,
    classifier=LogisticRegressionCV(),
)
pcovc.fit(X_scaled, y)

T_pcovc = pcovc.transform(X_scaled)

fig, ax = plt.subplots()
ax.scatter(T_pcovc[:, 0], T_pcovc[:, 1], c=y)
ax.set(xlabel="PCov$_1$", ylabel="PCov$_2$")

# %%
#
# A side-by-side comparison of the
# three maps (PCA, LDA, and PCovC):

fig, axs = plt.subplots(1, 3, figsize=(18, 5))
axs[0].scatter(T_pca[:, 0], T_pca[:, 1], c=y)
axs[0].set_title("PCA")
axs[1].scatter(T_lda, np.zeros(len(T_lda)), c=y)
axs[1].set_title("LDA")
axs[2].scatter(T_pcovc[:, 0], T_pcovc[:, 1], c=y)
axs[2].set_title("PCovC")
plt.show()
Lines changed: 158 additions & 0 deletions
@@ -0,0 +1,158 @@

#!/usr/bin/env python
# coding: utf-8

"""
PCovC Hyperparameter Tuning
===========================
"""
# %%
#

import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.linear_model import LogisticRegressionCV, Perceptron, RidgeClassifierCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

from skmatter.decomposition import PCovC


plt.rcParams["image.cmap"] = "tab10"
plt.rcParams["scatter.edgecolors"] = "k"

random_state = 10
n_components = 2

# %%
#
# For this, we will use the :func:`sklearn.datasets.load_iris` dataset from
# ``sklearn``.

X, y = load_iris(return_X_y=True)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# %%
#
# PCA
# ---
#

pca = PCA(n_components=n_components)

pca.fit(X_scaled, y)
T_pca = pca.transform(X_scaled)

fig, axis = plt.subplots()
scatter = axis.scatter(T_pca[:, 0], T_pca[:, 1], c=y)
axis.set(xlabel="PC$_1$", ylabel="PC$_2$")
axis.legend(
    scatter.legend_elements()[0],
    load_iris().target_names,
    loc="lower right",
    title="Classes",
)

# %%
#
# Effect of Mixing Parameter :math:`\alpha` on PCovC Map
# ------------------------------------------------------
#
# Below, we see how different :math:`\alpha` values for our PCovC model
# result in varying class distinctions between setosa, versicolor,
# and virginica on the PCovC map.

n_mixing = 5
mixing_params = [0, 0.25, 0.50, 0.75, 1]

fig, axs = plt.subplots(1, n_mixing, figsize=(4 * n_mixing, 4), sharey="row")

for id in range(0, n_mixing):
    mixing = mixing_params[id]

    pcovc = PCovC(
        mixing=mixing,
        n_components=n_components,
        random_state=random_state,
        classifier=LogisticRegressionCV(),
    )

    pcovc.fit(X_scaled, y)
    T = pcovc.transform(X_scaled)

    axs[id].set_xticks([])
    axs[id].set_yticks([])

    axs[id].set_title(r"$\alpha=$" + str(mixing))
    axs[id].set_xlabel("PCov$_1$")
    axs[id].scatter(T[:, 0], T[:, 1], c=y)

axs[0].set_ylabel("PCov$_2$")

fig.subplots_adjust(wspace=0)

# %%
#
# Effect of PCovC Classifier on PCovC Map and Decision Boundaries
# ---------------------------------------------------------------
#
# Here, we see how a PCovC model (:math:`\alpha` = 0.5) fitted with
# different classifiers produces varying PCovC maps. In addition,
# we see the varying decision boundaries produced by the
# respective PCovC classifiers.

mixing = 0.5
fig, axs = plt.subplots(1, 4, figsize=(16, 4))

models = {
    RidgeClassifierCV(): "Ridge Classification",
    LogisticRegressionCV(random_state=random_state): "Logistic Regression",
    LinearSVC(random_state=random_state): "Support Vector Classification",
    Perceptron(random_state=random_state): "Single-Layer Perceptron",
}

for id in range(0, len(models)):
    model = list(models)[id]

    pcovc = PCovC(
        mixing=mixing,
        n_components=n_components,
        random_state=random_state,
        classifier=model,
    )

    pcovc.fit(X_scaled, y)
    T = pcovc.transform(X_scaled)

    graph = axs[id]
    graph.set_title(models[model])

    DecisionBoundaryDisplay.from_estimator(
        estimator=pcovc.classifier_,
        X=T,
        ax=graph,
        response_method="predict",
        grid_resolution=1000,
    )

    scatter = graph.scatter(T[:, 0], T[:, 1], c=y)

    graph.set_xlabel("PCov$_1$")
    graph.set_xticks([])
    graph.set_yticks([])

axs[0].set_ylabel("PCov$_2$")
axs[0].legend(
    scatter.legend_elements()[0],
    load_iris().target_names,
    loc="lower right",
    title="Classes",
    fontsize=8,
)

fig.subplots_adjust(wspace=0.04)
plt.show()

examples/pcovc/README.rst

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+PCovC
+=====
