
Commit 9539ab4

JGarnica22, mumichae and lazappi authored
Add STACAS as new method component (#58)
* add method STACAS
* add method STACAS
* update changelog
* Update: base_r container
  Co-authored-by: Michaela Müller <[email protected]>
* Update: STACAS installation with dependencies, compatible with new R container base_r:1
  Co-authored-by: Luke Zappia <[email protected]>
* fix: move STACAS comment below the kBET on New functionality section
* fix: remove boilerplate comments for better readability
* add: method_types configuration

---------

Co-authored-by: Michaela Müller <[email protected]>
Co-authored-by: Luke Zappia <[email protected]>
1 parent 5331b9e commit 9539ab4


3 files changed: 95 additions, 1 deletion

CHANGELOG.md

Lines changed: 2 additions & 1 deletion
```diff
@@ -3,8 +3,9 @@
 ## New functionality
 
 * Added `metrics/kbet_pg` and `metrics/kbet_pg_label` components (PR #52).
+* Added `methods/stacas` new method (PR #58).
+  - Add non-supervised version of STACAS tool for integration of single-cell transcriptomics data. This functionality enables correction of batch effects while preserving biological variability without requiring prior cell type annotations.
 * Added `method/drvi` component (PR #61).
-
 * Added `ARI_batch` and `NMI_batch` to `metrics/clustering_overlap` (PR #68).
 
 * Added `metrics/cilisi` new metric component (PR #57).
```

src/methods/stacas/config.vsh.yaml

Lines changed: 37 additions & 0 deletions
```yaml
__merge__: ../../api/comp_method.yaml
name: stacas
label: STACAS
summary: Accurate semi-supervised integration of single-cell transcriptomics data
description: |
  STACAS is a method for scRNA-seq integration,
  especially suited to accurately integrate datasets with large cell type imbalance
  (e.g. in terms of proportions of distinct cell populations).
  Prior cell type knowledge, given as cell type labels, can be provided to the algorithm to perform
  semi-supervised integration, leading to increased preservation of biological variability
  in the resulting integrated space.
  STACAS is robust to incomplete cell type labels and can be applied to large-scale integration tasks.
references:
  doi: 10.1038/s41467-024-45240-z
  # Andreatta M, Hérault L, Gueguen P, Gfeller D, Berenstein AJ, Carmona SJ.
  # Semi-supervised integration of single-cell transcriptomics data.
  # Nature Communications. 2024;15(1):1-13. doi:10.1038/s41467-024-45240-z
links:
  documentation: https://carmonalab.github.io/STACAS.demo/STACAS.demo.html
  repository: https://github.com/carmonalab/STACAS
info:
  preferred_normalization: log_cp10k
  method_types: [embedding]
resources:
  - type: r_script
    path: script.R
engines:
  - type: docker
    image: openproblems/base_r:1
    setup:
      - type: r
        github: carmonalab/[email protected]
runners:
  - type: executable
  - type: nextflow
    directives:
      label: [midtime,midmem,midcpu]
```
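The config declares `preferred_normalization: log_cp10k`, and script.R reads pre-computed values from `adata$layers[["normalized"]]` rather than normalizing itself. As a rough illustration of what that layer is assumed to contain, here is a minimal base-R sketch of log CP10k normalization: each cell's counts are scaled to sum to 10,000 and then transformed with `log1p` (natural log). The function name `log_cp10k()` and the toy matrix are hypothetical; this is not the benchmark's own normalization code.

```r
# Hypothetical helper (assumption): log CP10k = log1p(counts scaled so that
# every cell sums to target_sum). Cells are rows and genes are columns,
# matching the AnnData orientation. Cells with zero total counts would
# produce NaN and are not handled here.
log_cp10k <- function(counts, target_sum = 1e4) {
  totals <- rowSums(counts)                # library size per cell
  scaled <- counts / totals * target_sum   # length-nrow vector recycles per row
  log1p(scaled)                            # natural log of (1 + x)
}

# Toy counts: 2 cells x 3 genes
counts <- matrix(c(1, 3, 0,
                   2, 2, 6),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("cell_1", "cell_2"), c("g1", "g2", "g3")))

norm <- log_cp10k(counts)
```

Undoing the transform with `expm1()` should give rows that each sum to 10,000, which is a quick way to sanity-check a layer that is believed to be log CP10k.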

src/methods/stacas/script.R

Lines changed: 56 additions & 0 deletions
```r
requireNamespace("anndata", quietly = TRUE)
suppressPackageStartupMessages({
  library(STACAS)
  library(Matrix)
  library(SeuratObject)
  library(Seurat)
})

## VIASH START
par <- list(
  input = "resources_test/task_batch_integration/cxg_immune_cell_atlas/dataset.h5ad",
  output = "output.h5ad"
)
meta <- list(
  name = "stacas"
)
## VIASH END

cat("Reading input file\n")
adata <- anndata::read_h5ad(par[["input"]])

cat("Create Seurat object\n")
# Transpose because Seurat expects genes in rows, cells in columns
counts_r <- Matrix::t(adata$layers[["counts"]])
normalized_r <- Matrix::t(adata$layers[["normalized"]])
# Convert to a regular sparse matrix first and then to dgCMatrix
counts_c <- as(as(counts_r, "CsparseMatrix"), "dgCMatrix")
normalized_c <- as(as(normalized_r, "CsparseMatrix"), "dgCMatrix")

# Create Seurat object with raw counts, these are needed to compute Variable Genes
seurat_obj <- Seurat::CreateSeuratObject(counts = counts_c,
                                         meta.data = adata$obs)
# Manually assign pre-normalized values to the "data" slot
seurat_obj@assays$RNA$data <- normalized_c

cat("Run STACAS\n")
object_integrated <- seurat_obj |>
  Seurat::SplitObject(split.by = "batch") |>
  STACAS::Run.STACAS()

cat("Store outputs\n")
output <- anndata::AnnData(
  uns = list(
    dataset_id = adata$uns[["dataset_id"]],
    normalization_id = adata$uns[["normalization_id"]],
    method_id = meta$name
  ),
  obs = adata$obs,
  var = adata$var,
  obsm = list(
    X_emb = object_integrated@reductions$pca@cell.embeddings
  )
)

cat("Write output AnnData to file\n")
output$write_h5ad(par[["output"]], compression = "gzip")
```
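script.R transposes both layers because AnnData stores cells in rows and genes in columns, whereas Seurat expects genes in rows and cells in columns. The toy sketch below shows that orientation change and the coercion to a column-compressed `dgCMatrix` using the Matrix package. The names are hypothetical, and starting from a row-compressed matrix is an assumption meant to mimic a CSR layer read from an `.h5ad` file.

```r
library(Matrix)

# Toy layer in AnnData orientation: 3 cells (rows) x 2 genes (columns)
x_cells_by_genes <- Matrix::sparseMatrix(
  i = c(1, 2, 3), j = c(1, 2, 2), x = c(5, 1, 2),
  dims = c(3, 2),
  dimnames = list(c("cell_1", "cell_2", "cell_3"), c("gene_a", "gene_b"))
)

# Assumption: mimic a row-compressed (CSR-like) layer as read from .h5ad
x_csr <- as(x_cells_by_genes, "RsparseMatrix")

# Seurat orientation: genes x cells, coerced to a column-compressed dgCMatrix
x_genes_by_cells <- as(Matrix::t(x_csr), "CsparseMatrix")
```

The dimnames follow the transpose, so cell names end up as column names, which is what Seurat uses to match cells against the `meta.data` rows.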

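The script stores `object_integrated@reductions$pca@cell.embeddings` next to the unmodified `adata$obs`. Because the object is split by batch before `Run.STACAS()`, the integrated cells may come back grouped by batch rather than in the original order; this is an assumption worth checking on real data. A minimal base-R sketch of realigning embedding rows by cell name, using a hypothetical embedding and hypothetical cell names:

```r
# Hypothetical cells in their original (obs) order, with interleaved batches
original_order <- c("cell_1", "cell_2", "cell_3", "cell_4")
batch <- c("b1", "b2", "b1", "b2")

# Emulate rows coming back grouped by batch after split-and-merge
integrated_order <- unlist(split(original_order, batch), use.names = FALSE)
emb <- matrix(seq_len(8), nrow = 4,
              dimnames = list(integrated_order, c("PC_1", "PC_2")))

# Reorder rows by cell name so they line up with the original obs order
emb_aligned <- emb[original_order, , drop = FALSE]
```

In the script, the equivalent would be indexing the embedding by `adata$obs_names` before building `obsm`, assuming cell names survive integration unchanged.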