Commit a60c59d

Merge pull request #21 from SingleRust/feature-correction
Add batch processing and masked operations
2 parents: fc109e9 + c650524, commit a60c59d

9 files changed: +1232 / -18 lines

Cargo.lock

Lines changed: 1 addition & 1 deletion
Generated file; diff not rendered.

Cargo.toml

Lines changed: 2 additions & 1 deletion
@@ -1,6 +1,6 @@
 [package]
 name = "single_algebra"
-version = "0.2.1-alpha.0"
+version = "0.2.2-alpha.0"
 edition = "2021"
 license-file = "LICENSE.md"
 description = "A linear algebra convenience library for the single-rust library. Can be used externally as well."
@@ -26,6 +26,7 @@ clustering = ["network", "local_moving", "dep:kiddo"]
 network = []
 local_moving = ["network", "dep:ahash"]
 statistics = ["dep:statrs"]
+# correction = []
 
 [dependencies]
 anyhow = "1.0.95"

README.md

Lines changed: 41 additions & 4 deletions
@@ -7,35 +7,72 @@ The companion algebra library for single-rust, providing powerful matrix operati
 - Efficient operations on sparse and dense matrices
 - Dimensionality reduction techniques
 - Clustering algorithms including Louvain community detection
+- Batch processing utilities with masking support
+- Statistical analysis and inference
 - More features planned!
 
 ## Matrix Operations 📊
 
 - SVD decomposition with parallel and LAPACK implementations
 - Matrix convenience functions for statistical operations
 - Support for both CSR and CSC sparse matrix formats
+- Masked operations for selective data processing
+- Batch-wise statistics (mean, variance) with flexible batch identifiers
 
 ## Clustering 🔍
 
 - Louvain community detection
 - Similarity network construction
 - K-nearest neighbors graph building
 - Local moving algorithm for community refinement
+- Leiden clustering implementation (work in progress)
 
 ## Dimensionality Reduction ⬇️
 
 - Incremental PCA implementation
 - Support for sparse matrices in dimensionality reduction
+- SVD-based compression and analysis
 
-## Acknowledgments 🙏
+## Statistical Analysis 📈
 
-The Louvain clustering implementation was adapted from [louvain-rs](https://github.com/graphext/louvain-rs/tree/master) written by Juan Morales (crispamares@gmail.com). The original implementation has been modified to better suit the needs of single-algebra.
+- Multiple testing correction methods
+- Parametric and non-parametric hypothesis testing
+- Effect size calculations
+- Batch-wise statistical comparisons
 
 ## Installation
 
 Add this to your `Cargo.toml`:
 
 ```toml
 [dependencies]
-single-algebra = "0.2.0-alpha.0"
-```
+single-algebra = "0.2.2-alpha.0"
+```
+
+## Batch Processing
+
+The library now includes flexible batch processing capabilities with the `BatchIdentifier` trait, which supports common identifier types:
+
+- String and string slices
+- Integer types (i32, u32, usize)
+- Custom types (by implementing the trait)
+
+```rust
+// Example of batch-wise statistics
+let batches = vec!["batch1", "batch2", "batch3"];
+let batch_means = matrix.mean_batch_col(&batches)?;
+```
+
+## Masked Operations
+
+Selective processing is now available through masked operations:
+
+```rust
+// Only process selected columns
+let mask = vec![true, false, true, true, false];
+let masked_sums = matrix.sum_col_masked(&mask)?;
+```
+
+## Acknowledgments 🙏
+
+The Louvain clustering implementation was adapted from [louvain-rs](https://github.com/graphext/louvain-rs/tree/master) written by Juan Morales (crispamares@gmail.com). The original implementation has been modified to better suit the needs of single-algebra.
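The README introduces `BatchIdentifier`, `mean_batch_col` and `sum_col_masked` only through two-line snippets, and their implementations are not part of the files shown in this diff. As a rough illustration of what such operations compute, here is a minimal, self-contained sketch on a dense row-major `Vec<Vec<f64>>`. It assumes one batch label per row and one mask entry per column; the `BatchId` trait and both free functions are hypothetical stand-ins, not single_algebra's actual API or signatures.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical stand-in for a batch-identifier trait: any cloneable,
/// hashable, comparable type (String, &str, i32, u32, usize, custom types)
/// qualifies through the blanket impl below.
pub trait BatchId: Clone + Eq + Hash {}
impl<T: Clone + Eq + Hash> BatchId for T {}

/// Per-batch column means of a dense row-major matrix (hypothetical helper).
/// Assumes `batches[i]` labels row `i`; returns one mean vector per batch.
pub fn mean_batch_col<B: BatchId>(
    rows: &[Vec<f64>],
    batches: &[B],
) -> Result<HashMap<B, Vec<f64>>, String> {
    if rows.len() != batches.len() {
        return Err(format!(
            "expected {} batch labels, got {}",
            rows.len(),
            batches.len()
        ));
    }
    let ncols = rows.first().map_or(0, |r| r.len());
    // Accumulate column sums and row counts per batch.
    let mut acc: HashMap<B, (Vec<f64>, usize)> = HashMap::new();
    for (row, batch) in rows.iter().zip(batches) {
        if row.len() != ncols {
            return Err("all rows must have the same length".to_string());
        }
        let (sums, count) = acc
            .entry(batch.clone())
            .or_insert_with(|| (vec![0.0; ncols], 0));
        for (s, &x) in sums.iter_mut().zip(row) {
            *s += x;
        }
        *count += 1;
    }
    // Turn sums into means by dividing by each batch's row count.
    Ok(acc
        .into_iter()
        .map(|(b, (sums, n))| {
            let means = sums.into_iter().map(|s| s / n as f64).collect::<Vec<f64>>();
            (b, means)
        })
        .collect())
}

/// Column sums over the columns whose mask entry is `true` (hypothetical helper).
/// The result has one entry per selected column, in column order.
pub fn sum_col_masked(rows: &[Vec<f64>], mask: &[bool]) -> Result<Vec<f64>, String> {
    let selected = mask.iter().filter(|&&keep| keep).count();
    let mut sums = vec![0.0; selected];
    for row in rows {
        if row.len() != mask.len() {
            return Err(format!(
                "mask has {} entries but a row has {}",
                mask.len(),
                row.len()
            ));
        }
        // Keep only the values whose column is selected, then add them up.
        let kept = row.iter().zip(mask).filter(|(_, keep)| **keep).map(|(x, _)| *x);
        for (s, x) in sums.iter_mut().zip(kept) {
            *s += x;
        }
    }
    Ok(sums)
}
```

In this sketch masked-out columns are dropped from the result; keeping them as zeros so that output indices line up with column indices would be an equally plausible design, and the README snippet does not say which one the library uses.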

src/correction/mod.rs

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
+use crate::utils::BatchIdentifier;
+use crate::NumericOps;
+use anyhow::Result;
+use std::hash::Hash;
+use std::ops::AddAssign;
+use num_traits::{Float, NumCast};
+
+/// Core trait for batch correction algorithms
+pub trait BatchCorrection<T, B>
+where
+    T: NumericOps,
+    B: BatchIdentifier,
+{
+    /// Fit the correction model to the data and batches
+    fn fit(&mut self, data: &impl CorrectionMatrix<Item = T>, batches: &[B]) -> Result<()>;
+
+    /// Apply correction to data using a previously fitted model
+    fn transform(
+        &self,
+        data: &impl CorrectionMatrix<Item = T>,
+    ) -> Result<impl CorrectionMatrix<Item = T>>;
+
+    /// Fit the model and transform the data in a single operation
+    fn fit_transform(
+        &mut self,
+        data: &impl CorrectionMatrix<Item = T>,
+        batches: &[B],
+    ) -> Result<impl CorrectionMatrix<Item = T>> {
+        self.fit(data, batches)?;
+        self.transform(data)
+    }
+}
+
+pub trait CorrectionMatrix: Sized {
+    type Item: NumericOps + NumCast;
+
+    /// Center columns by subtracting column means
+    fn center_columns<T>(&mut self, means: &[T]) -> anyhow::Result<()>
+    where
+        T: Float + NumCast + AddAssign + std::iter::Sum;
+
+    /// Center rows by subtracting row means
+    fn center_rows<T>(&mut self, means: &[T]) -> anyhow::Result<()>
+    where
+        T: Float + NumCast + AddAssign + std::iter::Sum;
+
+    /// Scale columns by dividing by column scaling factors
+    fn scale_columns<T>(&mut self, factors: &[T]) -> anyhow::Result<()>
+    where
+        T: Float + NumCast + AddAssign + std::iter::Sum;
+
+    /// Scale rows by dividing by row scaling factors
+    fn scale_rows<T>(&mut self, factors: &[T]) -> anyhow::Result<()>
+    where
+        T: Float + NumCast + AddAssign + std::iter::Sum;
+
+    /// Create a new matrix with the same dimensions and structure
+    fn like(&self) -> Self;
+}
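`CorrectionMatrix` only declares its operations, and the module is not compiled yet: the `pub mod correction;` line added to `src/lib.rs` in this commit is commented out, as is the `correction` feature in `Cargo.toml`. The sketch below shows one plausible reading of the documented semantics (subtract per-column means, divide by per-column factors, build a same-shaped matrix) on a hypothetical dense row-major `Dense` type. It uses plain `f64` in place of the trait's generic `T: Float + NumCast + ...` parameter and `NumericOps` item type, and returns `Result<(), String>` instead of `anyhow::Result` so that it needs no external crates.

```rust
/// Hypothetical dense row-major matrix, used only to illustrate the
/// documented semantics of `CorrectionMatrix`; not the crate's type.
#[derive(Debug, Clone, PartialEq)]
pub struct Dense {
    pub nrows: usize,
    pub ncols: usize,
    pub data: Vec<f64>, // element (i, j) lives at i * ncols + j
}

impl Dense {
    /// Subtract `means[j]` from every entry of column `j`.
    pub fn center_columns(&mut self, means: &[f64]) -> Result<(), String> {
        if means.len() != self.ncols {
            return Err(format!("expected {} means, got {}", self.ncols, means.len()));
        }
        for (k, x) in self.data.iter_mut().enumerate() {
            *x -= means[k % self.ncols];
        }
        Ok(())
    }

    /// Divide every entry of column `j` by `factors[j]`; zero factors are rejected.
    pub fn scale_columns(&mut self, factors: &[f64]) -> Result<(), String> {
        if factors.len() != self.ncols {
            return Err(format!("expected {} factors, got {}", self.ncols, factors.len()));
        }
        if factors.iter().any(|&f| f == 0.0) {
            return Err("scaling factor of zero".to_string());
        }
        for (k, x) in self.data.iter_mut().enumerate() {
            *x /= factors[k % self.ncols];
        }
        Ok(())
    }

    /// Same shape, all zeros: one possible reading of "same dimensions and structure".
    pub fn like(&self) -> Self {
        Dense {
            nrows: self.nrows,
            ncols: self.ncols,
            data: vec![0.0; self.data.len()],
        }
    }

    /// Column means, suitable as input to `center_columns`.
    pub fn column_means(&self) -> Vec<f64> {
        let mut means = vec![0.0; self.ncols];
        for (k, &x) in self.data.iter().enumerate() {
            means[k % self.ncols] += x;
        }
        means.iter_mut().for_each(|m| *m /= self.nrows as f64);
        means
    }
}
```

Centering with the column means and then scaling by the column standard deviations gives the usual per-column z-score standardization.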

src/lib.rs

Lines changed: 3 additions & 0 deletions
@@ -8,6 +8,9 @@ pub mod statistics;
 
 pub mod dimred;
 
+//#[cfg(feature="correction")]
+//pub mod correction;
+
 #[cfg(feature = "clustering")]
 pub mod clustering;
