A scikit-learn-compatible implementation of Weighted Principal Component Analysis
WeightedPCA is an extension to scikit-learn that implements Weighted Principal Component Analysis.
It follows the scikit-learn API conventions, making it a drop-in replacement for
sklearn.decomposition.PCA when you need to assign different weights to your samples.
This is a simplified implementation that supports sample-wise (row) weighting only. Each sample can have a different weight, but all features within a sample share the same weight. For full element-wise weighting (where each individual measurement can have its own weight), see the wpca package, which implements the complete Delchambre (2014) algorithm.
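Concretely, sample-wise weighting can be thought of as computing a weighted mean and a weighted covariance matrix, then taking the leading eigenvectors of that covariance. The NumPy sketch below illustrates this formulation; the function name, normalization choices, and demo data are illustrative only and are not necessarily how WeightedPCA is implemented internally.

```python
import numpy as np

def sample_weighted_pca_sketch(X, w, n_components):
    """Illustrative sample-weighted PCA via a weighted covariance eigendecomposition."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                         # normalize weights to sum to 1
    mean = w @ X                            # weighted sample mean
    Xc = X - mean                           # center the data
    cov = (Xc * w[:, None]).T @ Xc          # weighted covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order].T, eigvals[order]

# Tiny demo on synthetic data (values are arbitrary).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
w_demo = rng.uniform(0.5, 1.5, size=50)
mean, components, variances = sample_weighted_pca_sketch(X_demo, w_demo, n_components=2)
```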
This package is not part of scikit-learn, but is designed to be compatible with the scikit-learn ecosystem:
- Following scikit-learn's fit/transform/fit_transform API
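Because the estimator follows these conventions, scikit-learn's generic estimator checks can serve as a quick compatibility sanity check. This is a sketch, assuming WeightedPCA can be constructed with default arguments and fit without sample_weight:

```python
from sklearn.utils.estimator_checks import check_estimator
from weightedpca import WeightedPCA

# Runs scikit-learn's generic API checks (cloning, get_params/set_params,
# fit/transform behaviour) against the estimator instance.
check_estimator(WeightedPCA())
```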
Install with pip:

```bash
pip install weightedpca
```

Or install from source:

```bash
git clone https://github.com/byoungj/weightedpca.git
git clone https://github.com/byoungj/weightedpca.git
cd weightedpca
pip install -e .
```

Basic usage:

```python
import numpy as np
from weightedpca import WeightedPCA
# Your data
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
# Sample weights (e.g., based on data quality or importance)
weights = np.array([1.0, 1.0, 2.0, 2.0])
# Fit weighted PCA
wpca = WeightedPCA(n_components=2)
wpca.fit(X, sample_weight=weights)
# Transform data
X_transformed = wpca.transform(X)
# Or use fit_transform
X_transformed = wpca.fit_transform(X, sample_weight=weights)
```

WeightedPCA can be used in pipelines. Note that sample_weight should be applied during the WeightedPCA fitting step, before the pipeline:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from weightedpca import WeightedPCA
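# NOTE: X (n_samples, n_features), y (n_samples,) and weights (n_samples,)
# are assumed to be defined already, e.g. as NumPy arrays from your own data.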
# Fit WeightedPCA separately with sample weights
wpca = WeightedPCA(n_components=10)
X_reduced = wpca.fit_transform(X, sample_weight=weights)
# Then use pipeline for downstream processing
pipeline = Pipeline([
('scaler', StandardScaler()),
('classifier', LogisticRegression())
])
pipeline.fit(X_reduced, y)
```
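At prediction time the same fitted reduction has to be applied before calling the pipeline, since the scaler and classifier were trained on the reduced features. A small usage sketch, where X_new is a hypothetical array of new samples:

```python
# X_new: hypothetical array of new samples with the same feature layout as X
X_new_reduced = wpca.transform(X_new)     # reuse the WeightedPCA fitted above
y_pred = pipeline.predict(X_new_reduced)  # scaler + classifier from the pipeline
```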
To compare weighted and standard PCA on the same data:

```python
from sklearn.decomposition import PCA
from weightedpca import WeightedPCA
import numpy as np
X = np.random.randn(100, 10)
weights = np.random.uniform(0.5, 1.5, 100)
# Standard PCA (unweighted)
pca = PCA(n_components=5)
X_pca = pca.fit_transform(X)
# Weighted PCA
wpca = WeightedPCA(n_components=5)
X_wpca = wpca.fit_transform(X, sample_weight=weights)
print(f"Standard PCA explained variance: {pca.explained_variance_ratio_}")
print(f"Weighted PCA explained variance: {wpca.explained_variance_ratio_}")Weighted PCA is useful when:
- Class Imbalance: Weight samples to balance class representation
- Importance Differs: Certain observations should have more influence
- Data Quality Varies: Some samples are more reliable than others
- Uncertainty Quantification: Use inverse variance as weights
- Temporal Data: Weight recent observations more heavily (this and the inverse-variance case are sketched below)
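As an illustration of the last two cases, here is one way such weights might be constructed before being passed to fit. The data, the measurement variances, and the half-life are made up for the example and are not part of the package:

```python
import numpy as np
from weightedpca import WeightedPCA

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 10))

# Uncertainty quantification: inverse-variance weights from per-sample
# measurement variances (hypothetical values for illustration).
measurement_var = rng.uniform(0.1, 1.0, size=100)
inv_var_weights = 1.0 / measurement_var

# Temporal data: exponential decay so that later (more recent) rows count more.
# A half-life of 20 samples is an arbitrary choice.
ages = np.arange(100)[::-1]              # last row is the most recent (age 0)
recency_weights = 0.5 ** (ages / 20.0)

wpca = WeightedPCA(n_components=5)
X_reduced = wpca.fit_transform(X, sample_weight=recency_weights)
# inv_var_weights could be passed the same way via sample_weight.
```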
This package is an independent extension to scikit-learn, not part of the core library. We follow scikit-learn's API design principles, coding conventions, and documentation standards.
However, we are not affiliated with or endorsed by the scikit-learn project. For the official scikit-learn PCA implementation, see sklearn.decomposition.PCA.
This project is licensed under the BSD 3-Clause License - the same license as scikit-learn. See the LICENSE file for details.
- Delchambre, L. (2014). "Weighted Principal Component Analysis: A Weighted Covariance Eigendecomposition Approach." arXiv:1412.4533.
- wpca package (full element-wise implementation): https://github.com/jakevdp/wpca
- scikit-learn documentation: https://scikit-learn.org