Commit dc27e7e

Mapmycells (#98)
* mapmycells first batch
* 2nd batch of changes
* code to test
* Assume gene symbols in both adatas in mapmycells script
* Add mapmycells to workflow and scripts

---------

Co-authored-by: LouisK92 <[email protected]>
1 parent a4e177a commit dc27e7e

6 files changed: +147 -2 lines changed

scripts/run_benchmark/run_full_seqeracloud.sh

Lines changed: 1 addition & 0 deletions
@@ -49,6 +49,7 @@ celltype_annotation_methods:
   - ssam
   - tacco
   - moscot
+  - mapmycells
 expression_correction_methods:
   - no_correction
   - gene_efficiency_correction

scripts/run_benchmark/run_test_local.sh

Lines changed: 1 addition & 0 deletions
@@ -52,6 +52,7 @@ celltype_annotation_methods:
   - ssam
   # - tacco
   # - moscot
+  # - mapmycells
 expression_correction_methods:
   - no_correction
   # - gene_efficiency_correction

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+name: mapmycells
+label: "mapmycells"
+summary: "Mapping of annotations from single-cell to spatial using MapMyCells (cell_type_mapper)"
+description: "Mapping of annotations from single-cell to spatial using MapMyCells (cell_type_mapper)"
+links:
+  documentation: 'https://github.com/AllenInstitute/cell_type_mapper'
+  repository: 'https://github.com/AllenInstitute/cell_type_mapper'
+references:
+  doi: "10.1038/s41586-023-06812-z"
+
+__merge__: /src/api/comp_method_cell_type_annotation.yaml
+
+
+resources:
+  - type: python_script
+    path: script.py
+
+engines:
+  - type: docker
+    image: openproblems/base_python:1
+    __merge__:
+      - /src/base/setup_spatialdata_partial.yaml
+      - /src/base/setup_txsim_partial.yaml
+    setup:
+      - type: python
+        pypi:
+          - numpy
+          - git+https://github.com/AllenInstitute/cell_type_mapper.git
+  - type: native
+
+runners:
+  - type: executable
+  - type: nextflow
+    directives:
+      label: [ hightime, midcpu, highmem]

Lines changed: 106 additions & 0 deletions
@@ -0,0 +1,106 @@
+import anndata as ad
+import os
+import subprocess
+import json
+import pandas as pd
+from pathlib import Path
+## VIASH START
+par = {
+    'input_spatial_normalized_counts': 'resources_test/task_ist_preprocessing/mouse_brain_combined/spatial_normalized_counts.h5ad',
+    'input_scrnaseq_reference': 'resources_test/task_ist_preprocessing/mouse_brain_combined/scrnaseq_reference.h5ad',
+    'celltype_key': 'cell_type',
+    "output": 'spatial_with_celltypes.h5ad'
+}
+meta = { "temp_dir": './tmp/'}
+
+## VIASH END
+
+TMP_DIR = Path(meta["temp_dir"] or "/tmp/")
+TMP_DIR.mkdir(parents=True, exist_ok=True)
+
+adata_sp = ad.read_h5ad(par['input_spatial_normalized_counts'])
+adata_sc = ad.read_h5ad(par['input_scrnaseq_reference'])
+# Use the raw counts of the reference when they are available
+if "counts" in adata_sc.layers:
+    adata_sc.X = adata_sc.layers["counts"]
+# Make gene names unique strings so the two gene sets can be intersected
+adata_sp.var_names = adata_sp.var_names.astype(str)
+adata_sc.var_names = adata_sc.var_names.astype(str)
+adata_sp.var_names_make_unique()
+adata_sc.var_names_make_unique()
+
+common_genes = list(set(adata_sp.var.index).intersection(adata_sc.var.index))
+
+adata_sc = adata_sc[:, common_genes]
+sc_path = os.path.join(TMP_DIR,"sc_adata_processed.h5ad")
+adata_sc.write_h5ad(sc_path)
+sp_path = os.path.join(TMP_DIR,"sp_processed.h5ad")
+adata_sp[:, common_genes].write_h5ad(sp_path)
+
+
+# Precompute the reference statistics for the cell type hierarchy
+precomputed_path = os.path.join(TMP_DIR,"precomputed_stats.h5ad")
+
+command = [
+    "python",
+    "-m",
+    "cell_type_mapper.cli.precompute_stats_scrattch",
+    "--h5ad_path",
+    sc_path,
+    "--hierarchy",
+    "['cell_type']",
+    "--output_path",
+    precomputed_path
+]
+
+subprocess.run(command, check=True)
+# Marker lookup: every shared gene is listed under the "None" key
+data = {"None": common_genes}
+genes_file_path = os.path.join(TMP_DIR,"genes.json")
+with open(genes_file_path, "w") as json_file:
+    json.dump(data, json_file, indent=2)
+# Map the spatial cells onto the reference with the markers written above
+command = [
+    "python",
+    "-m",
+    "cell_type_mapper.cli.from_specified_markers",
+    "--query_path",
+    sp_path,
+    "--type_assignment.normalization",
+    "log2CPM",
+    "--precomputed_stats.path",
+    precomputed_path,
+    "--query_markers.serialized_lookup",
+    genes_file_path,
+    "--csv_result_path",
+    os.path.join(TMP_DIR,"results.csv"),
+    "--extended_result_path",
+    os.path.join(TMP_DIR, "extended_results.json"),
+    "--flatten",
+    "True",
+    "--type_assignment.bootstrap_iteration",
+    "1",
+    "--type_assignment.bootstrap_factor",
+    "1.0"
+]
+
+subprocess.run(command, check=True)
+annotation_df = pd.read_csv(os.path.join(TMP_DIR,"results.csv"), skiprows=3)
+adata_sp.obs[par['celltype_key']] = list(annotation_df['cell_type_label'])
+
+
+
+# Delete all temporary files
+for file_path in [
+    sc_path,
+    sp_path,
+    precomputed_path,
+    genes_file_path,
+    os.path.join(TMP_DIR,"results.csv"),
+    os.path.join(TMP_DIR, "extended_results.json")
+]:
+    if os.path.isfile(file_path):
+        os.remove(file_path)
+
+
+adata_sp.write_h5ad(par['output'])
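
Note on the label assignment in the script above: it reads results.csv with a fixed skiprows=3 and copies the cell_type_label column into adata_sp.obs by position, which relies on the mapper writing exactly three header lines and keeping the query cell order. A minimal sketch of a more defensive variant follows. It assumes, without confirmation from this commit, that the metadata lines start with "#" and that the CSV has a cell_id column; read_labels and align_labels are hypothetical helper names, and the sample CSV text is made up for illustration.

```python
import csv
import io


def read_labels(csv_text, label_column="cell_type_label", id_column="cell_id"):
    """Return {cell_id: label}, skipping blank lines and '#' metadata lines.

    Assumption: metadata lines start with '#' and a cell_id column exists.
    """
    data_lines = [
        line for line in csv_text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    reader = csv.DictReader(io.StringIO("\n".join(data_lines)))
    return {row[id_column]: row[label_column] for row in reader}


def align_labels(obs_names, labels_by_id):
    """Return labels in the order of obs_names; fail loudly on missing cells."""
    missing = [name for name in obs_names if name not in labels_by_id]
    if missing:
        raise KeyError(f"{len(missing)} cells have no label, e.g. {missing[0]!r}")
    return [labels_by_id[name] for name in obs_names]


# Made-up example input: three metadata lines, then rows in shuffled order.
example_csv = """# hypothetical metadata line 1
# hypothetical metadata line 2
# hypothetical metadata line 3
cell_id,cell_type_label,cell_type_name
cell_2,B,B
cell_1,A,A
"""

labels = align_labels(["cell_1", "cell_2"], read_labels(example_csv))
print(labels)  # ['A', 'B']
```

Matching on the cell identifier instead of row position means a reordered or truncated result file raises an error rather than silently assigning labels to the wrong cells.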

src/workflows/run_benchmark/config.vsh.yaml

Lines changed: 2 additions & 1 deletion
@@ -98,7 +98,7 @@ argument_groups:
           A list of cell type annotation methods to run.
         type: string
         multiple: true
-        default: "ssam:tacco:moscot"
+        default: "ssam:tacco:moscot:mapmycells"
       - name: "--expression_correction_methods"
         description: |
           A list of expression correction methods to run.
@@ -168,6 +168,7 @@ dependencies:
   - name: methods_cell_type_annotation/ssam
   - name: methods_cell_type_annotation/tacco
   - name: methods_cell_type_annotation/moscot
+  - name: methods_cell_type_annotation/mapmycells
   - name: methods_expression_correction/no_correction
   - name: methods_expression_correction/gene_efficiency_correction
   - name: methods_expression_correction/resolvi_correction
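
The default above packs several method names into one colon-separated string ("ssam:tacco:moscot:mapmycells"). As a rough illustration of how such a value can be split and checked against the registered methods, here is a minimal sketch. It is not the workflow's actual parsing code: parse_methods and KNOWN_METHODS are hypothetical names, and the ":" separator is inferred only from the default value shown in this diff.

```python
# Hypothetical registry mirroring the cell type annotation dependencies in the diff.
KNOWN_METHODS = {"ssam", "tacco", "moscot", "mapmycells"}


def parse_methods(value, sep=":"):
    """Split a separator-delimited method list and reject unknown names."""
    names = [name.strip() for name in value.split(sep) if name.strip()]
    unknown = [name for name in names if name not in KNOWN_METHODS]
    if unknown:
        raise ValueError(f"Unknown cell type annotation methods: {unknown}")
    return names


selected = parse_methods("ssam:tacco:moscot:mapmycells")
print(selected)  # ['ssam', 'tacco', 'moscot', 'mapmycells']
```

A check like this catches a method that is named in the default but missing from the dependency list, which is the kind of mismatch this commit has to avoid by editing both places.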

src/workflows/run_benchmark/main.nf

Lines changed: 2 additions & 1 deletion
@@ -374,7 +374,8 @@ workflow run_wf {
   cta_methods = [
     ssam,
     tacco,
-    moscot
+    moscot,
+    mapmycells
   ]

   cta_ch = normalization_ch
