I ran through the whole workflow with the simulated data successfully, so this is now ready for review.
sjspielman left a comment
This looks good to me! There's not much to say about the Nextflow code for a pretty standard "one process to do the thing" situation, and you got this memo ✅ #178 (comment)
FYI, I didn't carefully review the Python code since I'm assuming it was well reviewed next door in the analysis repo. Let me know if you want me to take a closer look anywhere.
So in the end, my main comment is to restore the workflow bits you commented out during testing.
// cell type scimilarity
cell_type_scimilarity_model = 's3://scpca-references/celltype/scimilarity_references/model_v1.1'
cell_type_scimilarity_ontology_ref_file = 'https://raw.githubusercontent.com/AlexsLemonade/OpenScPCA-analysis/refs/heads/main/analyses/cell-type-scimilarity/references/scimilarity-mapped-ontologies.tsv'
Noting that we'll want to update this one too with a tagged link, same as my NB URLs above.
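Pinning the ontology reference to a tag rather than `refs/heads/main` means the workflow keeps reading the same file even if `main` changes later. A minimal Python sketch of the two URL forms, assuming raw.githubusercontent.com resolves `refs/tags/<tag>` the same way it resolves `refs/heads/main`; the helper name and the tag `v0.1.0` are hypothetical placeholders, not an actual OpenScPCA-analysis release.

```python
# Hypothetical helper: build a raw.githubusercontent.com URL for a file at a given git ref.
RAW_BASE = "https://raw.githubusercontent.com"
OWNER_REPO = "AlexsLemonade/OpenScPCA-analysis"
ONTOLOGY_PATH = "analyses/cell-type-scimilarity/references/scimilarity-mapped-ontologies.tsv"


def raw_url(ref: str, path: str = ONTOLOGY_PATH) -> str:
    """Return the raw-file URL for `path` at `ref` (e.g. 'refs/heads/main' or 'refs/tags/<tag>')."""
    return f"{RAW_BASE}/{OWNER_REPO}/{ref}/{path}"


# Current config value: follows whatever is on main at run time.
branch_url = raw_url("refs/heads/main")
# Pinned alternative: 'v0.1.0' is a placeholder tag name, not a real release.
tagged_url = raw_url("refs/tags/v0.1.0")

print(branch_url)
print(tagged_url)
```

Only the ref segment differs, so swapping the config value to a tagged URL is a one-line change once a release tag exists.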
main.nf
 // Run the merge workflow
-merge_sce(sample_ch)
+//merge_sce(sample_ch)
--processed_h5ad_file \$file \
--ontology_map_file ${ontology_map_file} \
--predictions_tsv \$(basename \${file%_rna.h5ad}_scimilarity-celltype-assignments.tsv.gz) \
--seed 2025
This is assigned in the file so you probably don't need it, but doesn't
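The `--predictions_tsv` argument derives its output name with shell parameter expansion (the backslashes stop Nextflow from interpolating `$` so the shell sees it): `${file%_rna.h5ad}` drops a trailing `_rna.h5ad`, the assignments suffix is appended, and `basename` strips any directory. A Python sketch of the same naming logic, plus an RNA-only filter of the kind the PR description mentions. The helper names and sample IDs are hypothetical, and treating ADT files as anything without the `_rna.h5ad` suffix is an assumption.

```python
from pathlib import Path

RNA_SUFFIX = "_rna.h5ad"
PREDICTIONS_SUFFIX = "_scimilarity-celltype-assignments.tsv.gz"


def predictions_name(h5ad_path: str) -> str:
    """Mirror `basename ${file%_rna.h5ad}_scimilarity-celltype-assignments.tsv.gz`."""
    name = Path(h5ad_path).name  # like `basename`: drop any directory part
    # Like the shell `%` expansion, strip the suffix only when it is present.
    if name.endswith(RNA_SUFFIX):
        name = name[: -len(RNA_SUFFIX)]
    return name + PREDICTIONS_SUFFIX


def rna_only(paths):
    """Keep RNA AnnData files; anything else (e.g. an assumed `_adt.h5ad`) is dropped."""
    return [p for p in paths if Path(p).name.endswith(RNA_SUFFIX)]


# Hypothetical sample files for illustration.
files = ["data/SCPCL000001_processed_rna.h5ad", "data/SCPCL000001_processed_adt.h5ad"]
for f in rna_only(files):
    print(predictions_name(f))  # SCPCL000001_processed_scimilarity-celltype-assignments.tsv.gz
```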
{
    "barcode": processed_anndata.obs_names.to_list(),
    "scimilarity_celltype_annotation": predictions.values,
    "min_dist": nn_stats["min_dist"],
Do we need this column in this workflow?
Yes! This is a stat we are going to use to measure confidence, as recommended by the SCimilarity docs, so we want to output it for exploratory analysis.
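A standard-library sketch of writing the three output columns to a gzipped TSV and reading `min_dist` back for exploration. The real script builds the table from `processed_anndata.obs_names`, `predictions`, and `nn_stats`; the plain lists, the function names, and the 0.1 cutoff below are illustrative assumptions, not a recommended threshold. The only property relied on is that `min_dist` is a distance, so larger values mean a cell sits farther from its nearest reference cell.

```python
import csv
import gzip
import os
import tempfile

COLUMNS = ["barcode", "scimilarity_celltype_annotation", "min_dist"]


def write_predictions(path, barcodes, annotations, min_dists):
    """Write one row per cell to a gzipped TSV with the three columns above."""
    # Every column must describe the same cells in the same order.
    if not (len(barcodes) == len(annotations) == len(min_dists)):
        raise ValueError("column lengths differ")
    with gzip.open(path, "wt", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(COLUMNS)
        writer.writerows(zip(barcodes, annotations, min_dists))


def cells_above(path, cutoff):
    """Return barcodes whose min_dist (distance to nearest reference cell) exceeds cutoff."""
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [row["barcode"] for row in reader if float(row["min_dist"]) > cutoff]


# Toy values for illustration only; they are not SCimilarity output.
out = os.path.join(tempfile.mkdtemp(), "example_scimilarity-celltype-assignments.tsv.gz")
write_predictions(out, ["AAAC", "AAAG"], ["T cell", "B cell"], [0.02, 0.35])
print(cells_above(out, cutoff=0.1))  # ['AAAG']
```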
Co-authored-by: Stephanie Spielman <stephanie.spielman@gmail.com>
@sjspielman I ran this on the real data and all samples completed successfully. I also checked that the results files are now present in the staging bucket. I restored running the other modules and added a TODO about updating to use the tagged link. This should be ready for another look. Edit: I meant to say that you do not need to review the Python script, since it is copied exactly from the script that was reviewed in the analysis repo.
Closes #170
Here I'm adding the module to run SCimilarity on all of the samples. There's only one step here, which is just to run SCimilarity and output the annotations as a TSV file. I copied the script that does this from OpenScPCA-analysis without any modifications, so the main code to review here is the addition to Nextflow.

- OpenScPCA-analysis.
- Since the model file is quite big, I added an empty folder for stub testing and added that path to the stub profile.
- h5ad files for RNA only, so I do have a step that should filter out any ADT files.

I am filing this as a draft because I'm having issues getting Nextflow to run the script in the container inside the conda environment. The way that Nextflow launches and runs the image means the installed conda environment isn't used by default, so the script can't find the packages it needs. I think I found a solution: when we build the environment, set the default Python to the conda environment so that it works with Nextflow. I'll file that as a PR in OpenScPCA-analysis. Once I'm able to confirm this runs, I'll request formal review.
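One way to confirm whether a Nextflow task is actually using the conda environment is to report the running interpreter from inside the container. A minimal sketch, assuming you know the environment's prefix in the image (the `/opt/conda/envs/openscpca` path and the helper name are hypothetical). `CONDA_PREFIX` is only set when the environment has been activated, which is exactly what may not happen when Nextflow launches the container, so `sys.prefix` is the more reliable signal. This does not implement the fix; it only helps check whether it worked.

```python
import os
import sys
from pathlib import Path


def interpreter_report(expected_prefix):
    """Summarize which Python is running and whether it comes from the expected conda env."""
    running_prefix = Path(sys.prefix).resolve()
    expected = Path(expected_prefix).resolve()
    return {
        "executable": sys.executable,
        "sys_prefix": str(running_prefix),
        # Set by `conda activate`; None when the env was never activated.
        "conda_prefix_env": os.environ.get("CONDA_PREFIX"),
        "using_expected_env": running_prefix == expected,
    }


if __name__ == "__main__":
    # Hypothetical path; substitute the environment prefix used in the image.
    report = interpreter_report("/opt/conda/envs/openscpca")
    for key, value in report.items():
        print(f"{key}: {value}")
```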