Include SCimilarity when assigning consensus cell types #184
allyhawkins merged 27 commits into main
Just noting that I was able to run this through on the simulated data successfully.
jashapiro left a comment
Overall this looks good to me.
I had a couple questions, the main one being if we want to change the output of the script here to use the same consensus_celltype_annotation as in the final SCE files instead of consensus_annotation. I know this might require other downstream changes, but the inconsistency kind of bothers me, and took me some time to figure out.
The other is a little side question about passing empty arrays instead of NO_FILE files. I am happy to let that one pass and come back to it though!
modules/cell-type-consensus/main.nf
for library_id in ${library_ids.join(" ")}; do
  # find files that have the appropriate library id in file name
  sce_file=\$(ls ${library_files} | grep "\${library_id}")
  scimilarity_file=\$(ls ${scimilarity_files} | grep "\${library_id}")
If scimilarity_files is just NO_FILE, then this will be ""; just checking that will work... (But also noting my suggestion to maybe not pass NO_FILE.)
I ran this through with simulated data for all libraries so just wanted to note that this does work.
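The empty-match case raised in this thread can be sketched in isolation. This is only a minimal illustration, not the module's code: the helper name find_library_file and the file names are hypothetical, and it assumes the script runs under set -e. grep exits with status 1 when nothing matches, so the `|| true` guard keeps an empty result from aborting the script, and the `-n` test then decides whether SCimilarity is used.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the first file name containing the library id,
# or print nothing if no file name matches.
find_library_file() {
  local library_id="$1"
  shift
  # One candidate per line; `|| true` turns a non-match (grep exit status 1)
  # into an empty result instead of a failure under `set -e`/`pipefail`.
  printf '%s\n' "$@" | grep -F -m 1 -- "${library_id}" || true
}

# Hypothetical file names, for illustration only
sce_file=$(find_library_file "SCPCL000001" "SCPCL000001_processed.rds" "SCPCL000002_processed.rds")
scimilarity_file=$(find_library_file "SCPCL000001" "NO_FILE")

if [[ -n "${scimilarity_file}" ]]; then
  echo "using SCimilarity annotations from ${scimilarity_file}"
else
  echo "no SCimilarity file for this library; skipping"
fi
```

Matching on names passed as arguments, rather than on `ls` output, is a choice made for this sketch so that it runs without any files on disk.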
dplyr::select(
  panglao_ontology,
  original_panglao_name,
  blueprint_ontology,
  consensus_annotation,
  consensus_ontology
)
  cellassign_celltype_annotation = original_panglao_name,
  singler_celltype_ontology = blueprint_ontology,
  scimilarity_celltype_ontology = scimilarity_ontology,
  starts_with(consensus_column_prefix)
) |>
# now just filter to join columns and get unique combinations
dplyr::select(all_of(join_columns), starts_with(consensus_column_prefix)) |>
I'm confused by the two select statements in a row. I assume this is because you need to rename? If so, maybe change the first one to just dplyr::rename().
# use unknown for NA annotation but keep ontology ID as NA
# if the sample type is cell line, keep as NA
dplyr::mutate(consensus_annotation = dplyr::if_else(is.na(consensus_annotation) & (!stringr::str_detect(sample_type, "cell line")), "Unknown", consensus_annotation))
dplyr::mutate(consensus_annotation = dplyr::if_else(is.na(consensus_annotation) & (sample_type != "cell line"), "Unknown", consensus_annotation))
Is there a reason you changed this to require an exact match?
Ah good catch! This is because I copied the script from the analysis repo, which had the old code that did not handle multiple sample types. I'll revert this change and also make sure it's up to date in the analysis repo version of this script.
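The difference between the two checks can be illustrated outside of R. The following is a minimal bash sketch, assuming (hypothetically) that a library with multiple sample types stores them in a single comma-separated string; the function names and example values are made up for illustration and do not come from the module.

```shell
#!/usr/bin/env bash

# Exact match: true only when the whole string is "cell line"
is_cell_line_exact() {
  [[ "$1" == "cell line" ]]
}

# Substring match: true when "cell line" appears anywhere in the string,
# analogous to stringr::str_detect(sample_type, "cell line")
is_cell_line_detect() {
  [[ "$1" == *"cell line"* ]]
}

# Hypothetical sample_type values
for sample_type in "cell line" "cell line, patient-derived xenograft" "patient tissue"; do
  if is_cell_line_detect "${sample_type}"; then detect="yes"; else detect="no"; fi
  if is_cell_line_exact "${sample_type}"; then exact="yes"; else exact="no"; fi
  echo "${sample_type}: detect=${detect} exact=${exact}"
done
```

Only the substring check treats the combined value as a cell line, which is why the exact comparison would label its NA annotations as "Unknown".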
# rename old consensus cell type columns if they are present
if ("consensus_celltype_annotation" %in% colnames(all_assignments_df)) {
  all_assignments_df <- all_assignments_df |>
    # rename old consensus columns to avoid confusion
    dplyr::rename(
      singler_cellassign_consensus_annotation = consensus_celltype_annotation,
      singler_cellassign_consensus_ontology = consensus_celltype_ontology
    )
} else {
  # if no consensus from the object, set to NA
  all_assignments_df <- all_assignments_df |>
    dplyr::mutate(
      singler_cellassign_consensus_annotation = NA,
      singler_cellassign_consensus_ontology = NA
    )
}
Do we want to do this at the start instead? Part of this is me wondering if we want to make the output column from this script consensus_celltype_annotation for consistency.
> Part of this is me wondering if we want to make the output column from this script consensus_celltype_annotation for consistency.
I just want to note here, this change would break a lot of other spots in OpenScPCA-analysis, so we might want to get a sense of how much first. This is just from a real quick 'n dirty search for consensus_annotation, aka some of these hits may not be parsing cell-type-consensus output, but it's a starting point! https://github.com/search?q=repo%3AAlexsLemonade%2FOpenScPCA-analysis+consensus_annotation&type=code
I had thought about this when working on it, but I was also concerned about breaking things downstream that use it. I do think we should prioritize how often these columns are used in scripts that get run in CI vs just exploratory notebooks if we update it though.
I'm honestly 50/50 on whether we should update it or not, so if others have strong opinions I'm fine with that. I agree it's annoying, but it's also helpful to distinguish which column is from the processed objects vs. which column is from the module, and it prevents any clashes when running this module on processed objects with existing consensus cell types.
I also want to note that eventually the column will be updated in the processed objects that get read in so that it contains the consensus from all three methods. So we should probably update the column names here to be existing_consensus_celltype or something more generic that just indicates it's the consensus cell types from the object.
That makes a lot of sense to me. Absolutely fine to leave consensus_annotation as the output column name here. I would still probably rename at the start though.
Co-authored-by: Joshua Shapiro <josh.shapiro@ccdatalab.org>
@jashapiro I made the small updates to the script you recommended, including moving the renaming of the older consensus cell type columns. As part of that I renamed them to I did not yet change the name from The other thing I'm going to try here is just providing the
This reverts commit e6057a4.
@jashapiro I was able to successfully run this through with the real data so this is now ready for re-review. I did have to make a small change to the scimilarity module because we hadn't run it since changing to use the
# by default use the lca between cellassign and singler as the consensus cell type
consensus_column_prefix <- "cellassign_singler_pair"
You changed this at my suggestion then changed it back, so I am confused about what exactly is going on here? I would assume that you would want to keep the existing consensus if there were modifications to be made? Or is this coming from somewhere else?
So this prefix is used to specify which column to grab from the consensus cell type reference. If no scimilarity is present, then the consensus cell type is from the cellassign_singler_pair column in the reference file. If scimilarity is present, then the consensus cell type is from the main consensus_annotation column. So this is separate from naming the output columns.
jashapiro left a comment
LGTM, with just a couple comment updates.
sample_id,
project_id,
sce_files,
scimilarity_files ?: []
Closes #181
Here I'm updating the cell-type-consensus module to pass in the output of cell-type-scimilarity. This was mostly straightforward, but I did have to account for the scimilarity output file being optional. We don't have them for multiplexed samples, so I join using remainder: true and then if the files list is empty, I pass in a dummy file name. Within the script itself, SCimilarity only gets used if the annotations file is found.

For the Rscript, I copied over the updates from 04-assign-consensus-celltypes.R. The script is exactly the same except I updated the output columns to now have both the new and old consensus cell types.
- consensus_annotation and consensus_ontology, which are the same column names we were using in this module previously
- consensus_celltype_annotation and consensus_celltype_ontology. If they were present I renamed those columns to be singler_cellassign_celltype_annotation and singler_cellassign_celltype_ontology. If they aren't present then I just fill them in with NA.

I also added in the new marker gene file for all consensus cell types as a parameter and argument to the script.

One other note is that I'm currently using permalinks for all the references, but I'll file an issue to update to tagged links once we have the new release of OpenScPCA-analysis.