quantfinlib/skpcp

skpcp

Robust principal component analysis via Principal Component Pursuit (PCP) with a scikit-learn transformer interface.

Badges: pypi, codecov, Tests, Doc Build, python3.11, License: MIT

Installation

```shell
pip install skpcp
```

Getting Started

Principal Component Pursuit (PCP) is a method for decomposing a data matrix X into a low-rank component L and a sparse component S, i.e., X = L + S. The skpcp package provides an implementation of PCP with a scikit-learn compatible transformer interface.

At its core, the algorithm solves the following optimization problem:

$$ \min_{L,S} \|L\|_* + \lambda \|S\|_1 \quad \text{s.t.} \quad X = L + S $$

where $\|L\|_*$ is the nuclear norm (sum of singular values) of L, $\|S\|_1$ is the element-wise $\ell_1$ norm of S, and $\lambda > 0$ is a regularization parameter that controls the trade-off between the low-rank and sparse components. In practice, the user does not need to set the value of $\lambda$, as it is automatically chosen based on the dimensions of the input data matrix X. We refer users to the original paper by Candès et al. (2011) for more details: Robust Principal Component Analysis?.
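To make the optimization problem concrete, here is a minimal NumPy sketch of the augmented-Lagrangian (ADMM) iteration described in Candès et al. (2011). It is an illustration under stated assumptions, not the solver shipped in skpcp: the function names `soft_threshold`, `svd_threshold`, and `pcp_admm` are hypothetical, and the defaults $\lambda = 1/\sqrt{\max(n_1, n_2)}$ and $\mu = n_1 n_2 / (4 \|X\|_1)$ are the values suggested in the paper, which skpcp may or may not use.

```python
import numpy as np


def soft_threshold(A, tau):
    """Element-wise shrinkage: the proximal operator of tau * ||.||_1."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)


def svd_threshold(A, tau):
    """Singular value shrinkage: the proximal operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt


def pcp_admm(X, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Hypothetical PCP solver sketch; returns (L, S) with X ~= L + S."""
    X = np.asarray(X, dtype=float)
    n1, n2 = X.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(n1, n2))  # default from the paper (assumed here)
    if mu is None:
        mu = n1 * n2 / (4.0 * np.abs(X).sum())  # paper's heuristic; X must be nonzero
    S = np.zeros_like(X)
    Y = np.zeros_like(X)  # Lagrange multiplier for the constraint X = L + S
    norm_X = np.linalg.norm(X)
    for _ in range(max_iter):
        L = svd_threshold(X - S + Y / mu, 1.0 / mu)  # minimize over L
        S = soft_threshold(X - L + Y / mu, lam / mu)  # minimize over S
        residual = X - L - S
        Y = Y + mu * residual  # dual ascent step
        if np.linalg.norm(residual) <= tol * norm_X:
            break
    return L, S


# Small demonstration on synthetic low-rank plus sparse data.
demo_rng = np.random.default_rng(0)
L_demo = demo_rng.normal(size=(100, 5)) @ demo_rng.normal(size=(5, 50))
S_demo = demo_rng.binomial(1, 0.1, size=(100, 50)) * demo_rng.normal(scale=10, size=(100, 50))
L_hat, S_hat = pcp_admm(L_demo + S_demo)
print("relative error in L:", np.linalg.norm(L_hat - L_demo) / np.linalg.norm(L_demo))
```

Each iteration alternates two closed-form proximal steps (singular value shrinkage for L, element-wise shrinkage for S) and then updates the multiplier, which is why no tuning beyond $\lambda$ and $\mu$ is needed in this sketch.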

```python
import numpy as np
from skpcp import PCP

# Generate synthetic data with low-rank and sparse components
RNG = np.random.default_rng(42)
n_samples, n_features, rank = 100, 50, 5

# Low-rank component: product of two thin Gaussian factors
L = np.dot(
    RNG.normal(size=(n_samples, rank)),
    RNG.normal(size=(rank, n_features)),
)

# Sparse component: about 10% of entries are nonzero, with large magnitude
S = RNG.binomial(1, 0.1, size=(n_samples, n_features)) * RNG.normal(
    loc=0, scale=10, size=(n_samples, n_features)
)
X = L + S

# Fit PCP model
pcp = PCP()
pcp.fit(X)
L_est = pcp.low_rank_  # Estimated low-rank component
S_est = pcp.sparse_  # Estimated sparse component
```
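Because the synthetic example knows the true L and S, the quality of a decomposition can be checked directly. The helpers below are a hedged sketch using NumPy only; `relative_error`, `numerical_rank`, and `recovery_report` are hypothetical names and are not part of skpcp. The placeholder estimates are exact copies of the truth so the block runs without skpcp installed.

```python
import numpy as np


def relative_error(estimate, truth):
    """Frobenius-norm error of the estimate relative to the norm of the truth."""
    return float(np.linalg.norm(estimate - truth) / np.linalg.norm(truth))


def numerical_rank(A, rtol=1e-6):
    """Count singular values larger than rtol times the largest one."""
    s = np.linalg.svd(A, compute_uv=False)
    if s[0] == 0:
        return 0
    return int(np.sum(s > rtol * s[0]))


def recovery_report(L_true, S_true, L_est, S_est, atol=1e-6):
    """Summarize how closely (L_est, S_est) match a known (L_true, S_true)."""
    true_support = np.abs(S_true) > atol
    est_support = np.abs(S_est) > atol
    return {
        "low_rank_rel_error": relative_error(L_est, L_true),
        "sparse_rel_error": relative_error(S_est, S_true),
        "estimated_rank": numerical_rank(L_est),
        # Fraction of entries where both matrices agree on zero vs. nonzero
        "support_agreement": float(np.mean(true_support == est_support)),
    }


# Rebuild the synthetic data from the example above.
rng = np.random.default_rng(42)
n_samples, n_features, rank = 100, 50, 5
L_true = rng.normal(size=(n_samples, rank)) @ rng.normal(size=(rank, n_features))
S_true = rng.binomial(1, 0.1, size=(n_samples, n_features)) * rng.normal(
    loc=0, scale=10, size=(n_samples, n_features)
)

# Placeholders: in practice these would be pcp.low_rank_ and pcp.sparse_.
L_check, S_check = L_true.copy(), S_true.copy()
report = recovery_report(L_true, S_true, L_check, S_check)
print(report)
```

With a fitted model, passing `pcp.low_rank_` and `pcp.sparse_` in place of the placeholders gives the same summary for the actual estimates.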

Alternatively, you can use the fit_transform method to fit the model and obtain the low-rank component in one step:

```python
L_est = pcp.fit_transform(X)
```

Note that the fit method decomposes the input data matrix X into its low-rank component L_est and sparse component S_est. The transform method of PCP differs from that of a typical scikit-learn transformer: it only accepts the same data matrix X that was used in fit. You cannot pass a new data matrix to transform, because the decomposition is specific to the input data used in fit.
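The fit-bound behavior of transform can be illustrated with a toy class. This is a sketch of one plausible way to enforce the rule, not skpcp's implementation: the class `FitBoundDecomposer` is hypothetical, it uses a truncated SVD as a stand-in for PCP, and whether skpcp raises an error (and of which type) for a mismatched matrix is an assumption here.

```python
import numpy as np


class FitBoundDecomposer:
    """Hypothetical toy transformer whose transform only accepts the fit data."""

    def __init__(self, rank=1):
        self.rank = rank

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Stand-in decomposition: best rank-k approximation via truncated SVD.
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        k = self.rank
        self.low_rank_ = (U[:, :k] * s[:k]) @ Vt[:k]
        self.residual_ = X - self.low_rank_
        self._X_fit = X.copy()  # remember the data the decomposition belongs to
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        same = X.shape == self._X_fit.shape and np.allclose(X, self._X_fit)
        if not same:
            # Assumed behavior: reject data that was not used in fit.
            raise ValueError("transform only accepts the matrix passed to fit")
        return self.low_rank_

    def fit_transform(self, X, y=None):
        return self.fit(X).transform(X)


rng = np.random.default_rng(0)
X_fit = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 10))
model = FitBoundDecomposer(rank=3)
low_rank = model.fit_transform(X_fit)  # accepted: same matrix as in fit

try:
    model.transform(rng.normal(size=(20, 10)))  # a different matrix
except ValueError as err:
    print("rejected:", err)
```

The design reason is that a decomposition X = L + S has no learned mapping that could be applied to unseen rows, unlike, for example, the projection matrix learned by ordinary PCA.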

Please see the examples and the API reference for more details.

The documentation is built with Sphinx and hosted on GitHub Pages.

To build the HTML pages locally, first make sure you have installed the package with its documentation dependencies:

```shell
# Quoting the extras keeps shells such as zsh from expanding the brackets
uv pip install -e ".[docs]"
```

then run the following:

```shell
sphinx-build docs docs/_build
```
