RustPipe-SC

Fast single-cell and spatial transcriptomics in Rust. Accelerates the compute-bound steps of scRNA-seq analysis (QC, normalization, HVG, PCA, kNN, Leiden clustering, marker detection) while maintaining concordance with Scanpy.

Designed to run downstream of Seqera / nf-core/scrnaseq pipelines that output H5AD or 10x CellRanger H5 files.

Performance

Measured on Apple M3 Pro, single-threaded where noted. All timings are wall-clock.

PBMC 3K (2,700 cells x 32,738 genes)

Step	Scanpy	RustPipe-SC	Speedup
QC + Filter	76 ms	88 ms	~1x
Normalize	11 ms	10 ms	~1x
HVG	305 ms	12 ms	25x
Scale + PCA	4,150 ms	150 ms	28x
kNN (k=15)	7,501 ms	31 ms	242x
Leiden	531 ms	17 ms	31x
Markers	—	155 ms	—
Total pipeline	12,574 ms	526 ms	24x

Scaling (synthetic data, 15K genes)

Cells	Normalize	HVG	Scale+PCA	kNN	Leiden	Total
3K	0.03s	0.03s	0.18s	0.04s	0.06s	0.38s
10K	0.09s	0.13s	0.62s	0.36s	0.23s	1.57s
20K	0.34s	0.37s	1.16s	1.41s	0.81s	4.43s
50K	0.64s	0.95s	2.58s	11.31s	3.80s	20.13s

At 10K cells, RustPipe-SC achieves 22x speedup over Scanpy on compute-bound steps (PCA, kNN, Leiden, markers). The value compounds with iterative workflows: a 20-parameter sweep at 10K cells saves 3+ minutes.

Concordance with Scanpy

Metric	Value
PCA PC1-5 Pearson \|r\|	1.000000
Leiden ARI	0.90
QC cell count (min_genes=200, max_mt=5%)	2643/2700 (exact match)

Input Formats

Format	Extension	Source
H5AD (AnnData)	`.h5ad`	Scanpy, nf-core/scrnaseq
10x CellRanger H5	`.h5`	CellRanger, cellranger-arc
CSV (cells x genes)	`.csv`	Custom pipelines
CSV (genes x cells)	`.csv`	Legacy format (auto-transposed)

Auto-detection: .h5 and .h5ad files are automatically detected as 10x or AnnData format.

HDF5 Support

H5AD and 10x H5 reading requires the hdf5 feature and a system HDF5 library:

# macOS
brew install hdf5

# Build with HDF5 support
HDF5_DIR=$(brew --prefix hdf5) cargo build --release --features hdf5

Without the hdf5 feature, CSV input is still supported.

Installation

git clone https://github.com/jwg054000/RustPipe-SC.git
cd rustpipe-sc

# Without HDF5 (CSV only)
cargo build --release

# With HDF5 support (recommended)
HDF5_DIR=$(brew --prefix hdf5) cargo build --release --features hdf5

Usage

Full Pipeline

# H5AD input (Scanpy/nf-core output)
rustpipe-sc pipeline \
  --input sample.h5ad \
  --output results/ \
  --n-hvg 2000 --n-pcs 50 --knn-k 15

# 10x CellRanger H5 input
rustpipe-sc pipeline \
  --input filtered_feature_bc_matrix.h5 \
  --output results/ \
  --max-pct-mt 20.0

# CSV input
rustpipe-sc pipeline \
  --input expression.csv \
  --output results/

Individual Steps

# QC metrics + cell filtering
rustpipe-sc qc --input raw.h5ad --output qc/ --min-genes 200 --max-pct-mt 5.0

# Normalize + log1p
rustpipe-sc normalize --input qc/filtered.csv --output norm/

# HVG selection (Seurat v3 method)
rustpipe-sc hvg --input norm/normalized.csv --output hvg/ --n-top-genes 2000

# PCA
rustpipe-sc pca --input hvg/expression_hvg.csv --output pca/ --n-components 50

# kNN graph
rustpipe-sc knn --input pca/pca_scores.csv --output knn/ --k 15

# Leiden clustering
rustpipe-sc leiden --input knn/knn.csv --output clusters/ --resolution 1.0

# Spatial autocorrelation (Moran's I)
rustpipe-sc morans-i \
  --input expression.csv \
  --coords spatial_coords.csv \
  --output spatial/

Global Options

--threads N    Number of threads (0 = all cores, default: 0)
--seed N       Random seed for reproducibility (default: 42)
-v/-vv/-vvv    Verbosity level

Pipeline Outputs

output/
  qc_metrics.csv          # Per-cell QC: n_genes, total_counts, pct_mt
  filtered_cells.csv      # Cells passing QC
  normalized.csv          # Library-size normalized, log1p transformed
  hvg_genes.csv           # Selected highly variable genes
  pca_scores.csv          # PCA embeddings (cells x PCs)
  pca_loadings.csv        # Gene loadings per PC
  pca_variance.csv        # Variance explained per PC
  knn.csv                 # k-nearest neighbor graph
  clusters.csv            # Leiden cluster assignments
  markers.csv             # Cluster marker genes (Wilcoxon rank-sum)

Architecture

src/
  main.rs       CLI + pipeline orchestration
  h5ad.rs       H5AD and 10x H5 I/O (feature-gated)
  io.rs         CSV I/O
  sparse.rs     CSR sparse matrix ops, implicit-centering matmul
  qc.rs         Per-cell QC metrics, MAD filtering
  normalize.rs  Library-size normalization + log1p
  hvg_sc.rs     Seurat v3 HVG selection
  scale.rs      Z-score scaling with clip
  pca.rs        Randomized SVD (dense path, sparse-ready)
  knn.rs        Brute-force k-nearest neighbors
  graph.rs      CSR graph, kNN→graph conversion
  leiden.rs     Leiden community detection
  markers.rs    Wilcoxon rank-sum marker gene detection
  morans_i.rs   Moran's I spatial autocorrelation
  gsea.rs       Gene set enrichment analysis
  stats_sc.rs   Statistical utilities (Welford, BH, MAD)

Seqera Integration

RustPipe-SC reads the standard output formats from nf-core/scrnaseq:

nf-core/scrnaseq (cellranger/alevin/starsolo)
  --> filtered_feature_bc_matrix.h5  (10x format)
  --> adata.h5ad                      (AnnData format)
      --> rustpipe-sc pipeline --input adata.h5ad --output results/

Testing

cargo test                    # 104 unit tests, ~0.06s
cargo test --features hdf5    # Includes H5AD I/O tests (requires HDF5)

Known Limitations

kNN is brute-force O(n^2): fast up to ~50K cells, needs approximate NN (HNSW) for larger
No UMAP embedding (use Scanpy or umap-rs downstream)
HVG Seurat v3 overlap with Scanpy is ~50% due to binning boundary sensitivity
Spatial segmentation not yet implemented

License

GPL-3.0 - see LICENSE

Citation

Prairie Bio, 2026. RustPipe-SC: Fast single-cell and spatial transcriptomics in Rust.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
src		src
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RustPipe-SC

Performance

PBMC 3K (2,700 cells x 32,738 genes)

Scaling (synthetic data, 15K genes)

Concordance with Scanpy

Input Formats

HDF5 Support

Installation

Usage

Full Pipeline

Individual Steps

Global Options

Pipeline Outputs

Architecture

Seqera Integration

Testing

Known Limitations

License

Citation

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

RustPipe-SC

Performance

PBMC 3K (2,700 cells x 32,738 genes)

Scaling (synthetic data, 15K genes)

Concordance with Scanpy

Input Formats

HDF5 Support

Installation

Usage

Full Pipeline

Individual Steps

Global Options

Pipeline Outputs

Architecture

Seqera Integration

Testing

Known Limitations

License

Citation

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages