
Commit dfbf973

Reduce sp and sc data to shared genes (#59)
1 parent 8297ead commit dfbf973

2 files changed: 10 additions, 7 deletions

src/data_processors/process_dataset/script.py

Lines changed: 6 additions, 7 deletions
@@ -19,13 +19,12 @@
 # Load the spatial data
 sdata = sd.read_zarr(par["input_sp"])
 
-# Subset the single-cell data to spatial genes
-genes_sp = []
-for key in sdata.tables.keys():
-    # todo: var column names need to be updated to match the rest of openproblems
-    genes_sp = genes_sp + sdata.tables[key].var_names.tolist()
-genes_sp = list(np.unique(genes_sp))
-adata = adata[:,adata.var["feature_name"].isin(genes_sp)].copy()
+# Subset single-cell and spatial data to shared genes
+sp_genes = sdata['transcripts']['feature_name'].unique().compute().tolist()
+sc_genes = adata.var["feature_name"].unique().tolist()
+shared_genes = list(set(sp_genes) & set(sc_genes))
+sdata['transcripts'] = sdata['transcripts'].loc[sdata['transcripts']['feature_name'].isin(shared_genes)]
+adata = adata[:,adata.var["feature_name"].isin(shared_genes)].copy()
 
 # Use feature names for adata instead of feature ids. convert to str
 adata.var.reset_index(inplace=True, drop=True)
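The new subsetting logic boils down to a set intersection over the `feature_name` values of the two modalities, followed by filtering each side to the shared set. A minimal, self-contained sketch of that logic, using hypothetical stand-in lists in place of the real SpatialData transcripts table and AnnData var table:

```python
# Hypothetical stand-ins: in the real script, sp_feature_names would come from
# sdata['transcripts']['feature_name'] and sc_feature_names from
# adata.var["feature_name"].
sp_feature_names = ["ACTB", "GAPDH", "XIST", "ACTB"]  # spatial transcripts
sc_feature_names = ["ACTB", "GAPDH", "EGFR"]          # single-cell genes

# Same idea as the commit: intersect the unique gene sets, then keep only
# the entries from the shared set on each side.
shared_genes = set(sp_feature_names) & set(sc_feature_names)
sp_kept = [g for g in sp_feature_names if g in shared_genes]
sc_kept = [g for g in sc_feature_names if g in shared_genes]

print(sorted(shared_genes))  # ['ACTB', 'GAPDH']
```

Note that genes present in only one modality (here `XIST` and `EGFR`) drop out of both tables, which is what lets downstream steps assume a common gene universe.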

src/workflows/process_datasets/test.sh

Lines changed: 4 additions, 0 deletions
@@ -1,5 +1,9 @@
 #!/bin/bash
 
+# NOTE: For local testing you might need to reduce the memory in src/data_processors/process_dataset/config.vsh.yaml
+# Don't forget to rebuild that dependency of the workflow:
+# viash ns build src/data_processors/process_dataset/config.vsh.yaml --setup cachedbuild
+
 nextflow run . \
   -main-script target/nextflow/workflows/process_datasets/main.nf \
   -profile docker \
