
Commit 81856f1

Un-pin scPRINT and update parameters (#51)

jkobject and lazappi authored
Co-authored-by: Luke Zappia <[email protected]>
1 parent 3794a92 commit 81856f1

File tree

4 files changed: +31 −18 lines changed

CHANGELOG.md

Lines changed: 6 additions & 0 deletions

@@ -1,3 +1,9 @@
+# task_batch_integration devel
+
+## Minor changes
+
+* Un-pin the scPRINT version and update parameters (PR #51)
+
 # task_batch_integration 2.0.0
 
 A major update to the OpenProblems framework, switching from a Python-based framework to a Viash + Nextflow-based framework. This update features the same concepts as the previous version, but with a new implementation that is more flexible, scalable, and maintainable.

_viash.yaml

Lines changed: 5 additions & 1 deletion

@@ -91,7 +91,11 @@ authors:
     info:
       github: sainirmayi
       orcid: 0009-0003-6319-9803
-
+  - name: Jeremie Kalfon
+    roles: [contributor]
+    info:
+      github: jkobject
+      orcid: 0000-0002-2818-9728
 config_mods: |
   .runners[.type == "nextflow"].config.labels := { lowmem : "memory = 20.Gb", midmem : "memory = 50.Gb", highmem : "memory = 100.Gb", lowcpu : "cpus = 5", midcpu : "cpus = 15", highcpu : "cpus = 30", lowtime : "time = 1.h", midtime : "time = 4.h", hightime : "time = 8.h", veryhightime : "time = 24.h" }
 
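The `config_mods` block above maps abstract resource labels to concrete Nextflow directives. As a rough illustration of that label-to-directive lookup (a plain dictionary sketch, not Viash's actual parser or data model):

```python
# Hypothetical sketch of the resource-label table defined in config_mods.
# Viash/Nextflow apply these as process directives; this only illustrates
# the mapping itself.
LABELS = {
    "lowmem": "memory = 20.Gb",
    "midmem": "memory = 50.Gb",
    "highmem": "memory = 100.Gb",
    "lowcpu": "cpus = 5",
    "midcpu": "cpus = 15",
    "highcpu": "cpus = 30",
    "lowtime": "time = 1.h",
    "midtime": "time = 4.h",
    "hightime": "time = 8.h",
    "veryhightime": "time = 24.h",
}

def directive(label: str) -> str:
    """Look up the Nextflow directive string for a resource label."""
    return LABELS[label]

print(directive("midmem"))  # memory = 50.Gb
```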

src/methods/scprint/config.vsh.yaml

Lines changed: 4 additions & 8 deletions

@@ -57,7 +57,7 @@ arguments:
   - name: --batch_size
     type: integer
     description: The size of the batches to be used in the DataLoader.
-    default: 64
+    default: 32
   - name: --max_len
     type: integer
     description: The maximum length of the gene sequence.
@@ -75,19 +75,15 @@ engines:
     setup:
       - type: python
         pip:
-          - huggingface_hub
-          # Can be unpinned after https://github.com/cantinilab/scPRINT/issues/14 is resolved
-          - scprint==1.6.2
-          - scdataloader==1.6.4
+          - scprint
       - type: docker
         run: lamin init --storage ./main --name main --schema bionty
-      - type: python
-        script: import bionty as bt; bt.core.sync_all_sources_to_latest()
       - type: docker
         run: lamin load anonymous/main
       - type: python
         script: from scdataloader.utils import populate_my_ontology; populate_my_ontology()
-
+      - type: python
+        script: import bionty as bt; bt.core.sync_all_sources_to_latest()
 runners:
   - type: executable
   - type: nextflow
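The engine `setup` list runs top to bottom, so this change does two things: it drops the version pins and moves the bionty sync step so it runs after `populate_my_ontology()`. A toy sketch of that ordered execution (a hypothetical runner that only records what would run; not Viash's implementation):

```python
# Hypothetical sketch: a setup section is an ordered list of steps, executed
# top to bottom. This toy planner just records the commands in order.
setup = [
    {"type": "python", "pip": ["scprint"]},
    {"type": "docker", "run": "lamin init --storage ./main --name main --schema bionty"},
    {"type": "docker", "run": "lamin load anonymous/main"},
    {"type": "python", "script": "populate_my_ontology()"},
    {"type": "python", "script": "sync_all_sources_to_latest()"},
]

def plan(steps):
    """Return a human-readable execution plan, preserving list order."""
    out = []
    for step in steps:
        if "pip" in step:
            out.append("pip install " + " ".join(step["pip"]))
        else:
            out.append(step.get("run") or step.get("script"))
    return out

for line in plan(setup):
    print(line)
```

Because order is preserved, the ontology is populated before the bionty sources are synced, mirroring the reordering in the diff above.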

src/methods/scprint/script.py

Lines changed: 16 additions & 9 deletions

@@ -58,32 +58,39 @@
 model_checkpoint_file = hf_hub_download(
     repo_id="jkobject/scPRINT", filename=f"{par['model_name']}.ckpt"
 )
-print(f"Model checkpoint file: '{model_checkpoint_file}'", flush=True)
-model = scPrint.load_from_checkpoint(
-    model_checkpoint_file,
-    transformer="normal",  # Don't use this for GPUs with flashattention
-    precpt_gene_emb=None,
-)
 
 print("\n>>> Embedding data...", flush=True)
 if torch.cuda.is_available():
     print("CUDA is available, using GPU", flush=True)
     precision = "16"
     dtype = torch.float16
+    transformer = "flash"
 else:
     print("CUDA is not available, using CPU", flush=True)
     precision = "32"
     dtype = torch.float32
-n_cores_available = len(os.sched_getaffinity(0))
-print(f"Using {n_cores_available} worker cores")
+    transformer = "normal"
+
+print(f"Model checkpoint file: '{model_checkpoint_file}'", flush=True)
+model = scPrint.load_from_checkpoint(
+    model_checkpoint_file,
+    transformer=transformer,  # Don't use this for GPUs with flashattention
+    precpt_gene_emb=None,
+)
+
+n_cores = min(len(os.sched_getaffinity(0)), 24)
+print(f"Using {n_cores} worker cores")
 embedder = Embedder(
     how="random expr",
     batch_size=par["batch_size"],
     max_len=par["max_len"],
     add_zero_genes=0,
-    num_workers=n_cores_available,
+    num_workers=n_cores,
     doclass=False,
     doplot=False,
+    pred_embedding=["cell_type_ontology_term_id"],
+    keep_all_cls_pred=False,
+    output_expression="none",
     precision=precision,
     dtype=dtype,
 )
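Besides choosing the transformer backend from CUDA availability, the updated script caps the number of DataLoader workers at 24 rather than using every available core. That capping logic in isolation (stdlib only; the `worker_cores` helper and the `os.cpu_count()` fallback are our additions for portability, since `os.sched_getaffinity` is Linux-specific):

```python
import os

def worker_cores(cap: int = 24) -> int:
    """Number of DataLoader workers: available CPU cores, capped at `cap`.

    os.sched_getaffinity(0) returns the set of cores this process may run
    on (Linux only); elsewhere fall back to os.cpu_count(). The fallback
    is our addition and is not in the original script.
    """
    try:
        available = len(os.sched_getaffinity(0))
    except AttributeError:
        available = os.cpu_count() or 1
    return min(available, cap)

print(f"Using {worker_cores()} worker cores")
```

Capping avoids spawning dozens of worker processes on large nodes, where per-worker overhead can outweigh any loading speedup.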
