Include SCimilarity when assigning consensus cell types by allyhawkins · Pull Request #184 · AlexsLemonade/OpenScPCA-nf

allyhawkins · 2025-09-09T19:23:51Z

Closes #181

Here I'm updating the cell-type-consensus module to pass in the output of cell-type-scimilarity. This was mostly straightforward, but I did have to account for the scimilarity output file being optional. We don't have them for multiplexed samples, so I join using remainder: true, and then if the files list is empty, I pass in a dummy file name. Within the script itself, SCimilarity only gets used if the annotations file is found.

For the Rscript, I copied over the updates from 04-assign-consensus-celltypes.R. The script is exactly the same except I updated the output columns to now have both the new and old consensus cell types.

The new consensus cell types can now be found in consensus_annotation and consensus_ontology, which are the same column names we were using in this module previously.
The old consensus cell types were present in the processed objects in consensus_celltype_annotation and consensus_celltype_ontology. If they were present, I renamed those columns to singler_cellassign_celltype_annotation and singler_cellassign_celltype_ontology. If they aren't present, I just fill them in with NA.

I also added in the new marker gene file for all consensus cell types as a parameter and argument to the script.

One other note is that I'm currently using permalinks for all the references, but I'll file an issue to update to tagged links once we have the new release of OpenScPCA-analysis.


allyhawkins · 2025-09-09T19:45:22Z

Just noting that I was able to run this through on the simulated data successfully:
https://cloud.seqera.io/orgs/CCDL/workspaces/OpenScPCA/watch/1NrdtwlYFlqsIr/v2/tasks

jashapiro

Overall this looks good to me.

I had a couple questions, the main one being if we want to change the output of the script here to use the same consensus_celltype_annotation as in the final SCE files instead of consensus_annotation. I know this might require other downstream changes, but the inconsistency kind of bothers me, and took me some time to figure out.

The other is a little side question about passing empty arrays instead of NO_FILE files. I am happy to let that one pass and come back to it though!

jashapiro · 2025-09-10T14:58:42Z

modules/cell-type-consensus/main.nf

+    for library_id in ${library_ids.join(" ")}; do
+      # find files that have the appropriate library id in file name
+      sce_file=\$(ls ${library_files} | grep "\${library_id}")
+      scimilarity_file=\$(ls ${scimilarity_files} | grep "\${library_id}")


If scimilarity_files is just NO_FILE, then this will be ""; just checking that will work... (But also noting my suggestion to maybe not pass NO_FILE.)

I ran this through with simulated data for all libraries so just wanted to note that this does work.
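The empty-match behavior discussed here can be sketched in plain bash. This is a minimal illustration, not the module's code: `match_library_file` is a hypothetical helper, the file names are made up, and it reads names from its arguments with `printf` rather than `ls` so that the files do not need to exist. The `|| true` guard is an assumption about running under strict mode, where a `grep` with no match would otherwise stop the script.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the first file name containing the library id,
# or print nothing when no candidate matches.
match_library_file() {
  local library_id="$1"
  shift
  # `|| true` keeps a no-match grep (exit status 1) from aborting under `set -e`
  printf '%s\n' "$@" | grep -F -- "${library_id}" | head -n 1 || true
}

# Made-up file names; the second library has no SCimilarity output
sce_files=(SCPCL000001_processed.rds SCPCL000002_processed.rds)
scimilarity_files=(SCPCL000001_scimilarity.tsv)

for library_id in SCPCL000001 SCPCL000002; do
  sce_file=$(match_library_file "${library_id}" "${sce_files[@]}")
  scimilarity_file=$(match_library_file "${library_id}" "${scimilarity_files[@]}")

  # An empty string means no SCimilarity annotations for this library
  if [[ -n "${scimilarity_file}" ]]; then
    echo "${library_id}: ${sce_file} with ${scimilarity_file}"
  else
    echo "${library_id}: ${sce_file} without SCimilarity"
  fi
done
```

With a placeholder such as NO_FILE as the only candidate, no library id matches, so the helper prints nothing and the loop takes the "without SCimilarity" branch.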

jashapiro · 2025-09-10T15:05:05Z

modules/cell-type-consensus/resources/usr/bin/assign-consensus-celltypes.R

  dplyr::select(
    panglao_ontology,
-    original_panglao_name,
-    blueprint_ontology,
-    consensus_annotation,
-    consensus_ontology
-  )
+    cellassign_celltype_annotation = original_panglao_name,
+    singler_celltype_ontology = blueprint_ontology,
+    scimilarity_celltype_ontology = scimilarity_ontology,
+    starts_with(consensus_column_prefix)
+  ) |>
+  # now just filter to join columns and get unique combinations
+  dplyr::select(all_of(join_columns), starts_with(consensus_column_prefix)) |>


I'm confused by the two select statements in a row. I assume this is because you need to rename? If so, maybe change the first one to just dplyr::rename().

jashapiro · 2025-09-10T15:08:08Z

modules/cell-type-consensus/resources/usr/bin/assign-consensus-celltypes.R

  # use unknown for NA annotation but keep ontology ID as NA
  # if the sample type is cell line, keep as NA
-  dplyr::mutate(consensus_annotation = dplyr::if_else(is.na(consensus_annotation) & (!stringr::str_detect(sample_type, "cell line")), "Unknown", consensus_annotation))
+  dplyr::mutate(consensus_annotation = dplyr::if_else(is.na(consensus_annotation) & (sample_type != "cell line"), "Unknown", consensus_annotation))


Is there a reason you changed this to require an exact match?

Ah good catch! This is because I copied the script from the analysis repo, which had the old code that did not handle multiple sample types. I'll revert this change and also make sure it's up to date in the analysis repo version of this script.
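The difference between the two conditions can be shown with a small shell analogue. This is only an illustration of substring versus exact matching; the module itself does this in R, and the sample type values below (in particular the combined one) are made up for the example.

```shell
#!/usr/bin/env bash

# Substring test, analogous to stringr::str_detect(sample_type, "cell line")
contains_cell_line() {
  [[ "$1" == *"cell line"* ]]
}

# Exact test, analogous to sample_type == "cell line"
is_exactly_cell_line() {
  [[ "$1" == "cell line" ]]
}

# Hypothetical sample_type values; the combined value is invented for illustration
for sample_type in "cell line" "cell line, patient-derived xenograft" "patient tissue"; do
  if contains_cell_line "${sample_type}"; then substring="yes"; else substring="no"; fi
  if is_exactly_cell_line "${sample_type}"; then exact="yes"; else exact="no"; fi
  echo "${sample_type}: substring=${substring} exact=${exact}"
done
```

Only the substring test treats a value listing several sample types as a cell line, which is why the exact comparison would relabel those cells as "Unknown".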

jashapiro · 2025-09-10T15:15:31Z

modules/cell-type-consensus/resources/usr/bin/assign-consensus-celltypes.R

+# rename old consensus cell type columns if they are present
+if ("consensus_celltype_annotation" %in% colnames(all_assignments_df)) {
+  all_assignments_df <- all_assignments_df |>
+    # rename old consensus columns to avoid confusion
+    dplyr::rename(
+      singler_cellassign_consensus_annotation = consensus_celltype_annotation,
+      singler_cellassign_consensus_ontology = consensus_celltype_ontology
+    )
+} else {
+  # if no consensus from the object, set to NA
+  all_assignments_df <- all_assignments_df |>
+    dplyr::mutate(
+      singler_cellassign_consensus_annotation = NA,
+      singler_cellassign_consensus_ontology = NA
+    )
+}


Do we want to do this at the start instead? Part of this is me wondering if we want to make the output column from this script consensus_celltype_annotation for consistency.

Part of this is me wondering if we want to make the output column from this script consensus_celltype_annotation for consistency.

I just want to note here, this change would break a lot of other spots in OpenScPCA-analysis, so we might want to get a sense of how much first. This is just from a real quick 'n dirty search for consensus_annotation, aka some of these hits may not be parsing cell-type-consensus output, but it's a starting point! https://github.com/search?q=repo%3AAlexsLemonade%2FOpenScPCA-analysis+consensus_annotation&type=code

I had thought about this when working on it, but I was also concerned about breaking things downstream that use it. I do think we should prioritize how often these columns are used in scripts that get run in CI vs just exploratory notebooks if we update it though.

I'm honestly 50/50 on whether we should update it or not, so if others have strong opinions I'm fine with that. I agree it's annoying, but it's also helpful to distinguish which column is from the processed objects vs. which column is from the module, and it prevents any clashes when running this module on processed objects with existing consensus cell types.

I also want to note that eventually the column will be updated in the processed objects that get read in so that it contains the consensus from all three methods. So we should probably update the column names here to be existing_consensus_celltype or something more generic that just indicates it's the consensus cell types from the object.

That makes a lot of sense to me. Absolutely fine to leave consensus_annotation as the output column name here. I would still probably rename at the start though.

allyhawkins · 2025-09-10T16:40:28Z

@jashapiro I made the small updates to the script you recommended, including moving the renaming of the older consensus cell type columns to the start. As part of that I renamed them to existing_celltype_annotation and existing_celltype_ontology to future-proof for when the objects contain all three methods. I don't want to get confused down the line if we name it singler_cellassign and it doesn't match up with what's actually in the column.

I did not yet change the name from consensus_annotation to consensus_celltype_annotation. I didn't want to do this just yet, because this is the name of the column in the reference files and everything in the existing cell-type-consensus module in OpenScPCA-nf. It also helps distinguish from what's in the processed objects. So unless you feel really strongly about changing it, I think we should keep it.

The other thing I'm going to try here is just providing the [] instead of an empty file. I had run this through with the simulated data thinking that what I had here worked, but I realized the simulated data actually has _processed_rna.h5ad for multiplexed samples. So the only way to test that this works is to use the real data. I'm going to test that change with the full scpca data and see what happens.

allyhawkins · 2025-09-11T13:58:49Z

@jashapiro I was able to successfully run this through with the real data so this is now ready for re-review.

I did have to make a small change to the scimilarity module because we hadn't run it since changing to use the path(*_scimilarity.tsv) formatting for specifying output. Previously it was running on multiplexed samples but just not creating an output file, so there was no error since we define the output file. It turns out we need to actually filter out any libraries that return an empty list for the files before trying to pass them through the process.

jashapiro · 2025-09-12T14:12:10Z

modules/cell-type-consensus/resources/usr/bin/assign-consensus-celltypes.R

+# by default use the lca between cellassign and singler as the consensus cell type
+consensus_column_prefix <- "cellassign_singler_pair"


You changed this at my suggestion then changed it back, so I am confused about what exactly is going on here? I would assume that you would want to keep the existing consensus if there were modifications to be made? Or is this coming from somewhere else?

So this prefix is used to specify which column to grab from the consensus cell type reference. If no scimilarity is present, then the consensus cell type is from the cellassign_singler_pair column in the reference file. If scimilarity is present, then the consensus cell type is from the main consensus_annotation column. So this is separate from naming the output columns.

jashapiro

LGTM, with just a couple comment updates.

jashapiro · 2025-09-12T17:17:21Z

modules/cell-type-consensus/main.nf

+        sample_id,
+        project_id,
+        sce_files,
+        scimilarity_files ?: []


glad to know this works!

allyhawkins added 10 commits September 8, 2025 15:54

use new consensus params

db1f061

include scimilarity as input to consensus

a667b77

account for scimilarity in script for assigning

32446e9

account for no scimilarity with multiplexed

b12650c

temporarily comment out other modules for testing

9fae3e7

more commenting

f978b80

remove extra parenthesis

1da291f

account for no old consensus

cd0e5da

note addition of scimilarity in readme

c0025d7

revert accounting for multiple sample types

8922cf9

allyhawkins mentioned this pull request Sep 9, 2025

Account for multiple sample types in consensus cell type assignment AlexsLemonade/OpenScPCA-analysis#1332

Merged

allyhawkins added 2 commits September 9, 2025 14:44

Revert "temporarily comment out other modules for testing"

fbd8038

This reverts commit 9fae3e7.

Revert "more commenting"

7e15c0c

This reverts commit f978b80.

allyhawkins requested a review from jashapiro September 9, 2025 19:44

jashapiro reviewed Sep 10, 2025

allyhawkins and others added 6 commits September 10, 2025 11:03

Apply suggestions from code review

b8f76f1

Co-authored-by: Joshua Shapiro <josh.shapiro@ccdatalab.org>

Merge branch 'main' into allyhawkins/scimilarity-to-consensus

1b56967

move renaming and use existing_celltype

5b532b2

no empty file

fe7b0ca

check if scimilarity is empty

bd8b456

comment out for more testing

e6057a4

jashapiro reviewed Sep 10, 2025

allyhawkins and others added 6 commits September 10, 2025 11:43

use the correct column prefix

ee17e68

Co-authored-by: Joshua Shapiro <josh.shapiro@ccdatalab.org>

fix typo

d18f267

filter out any samples without a processed file

c91fabf

make sure scimilarity files is a string

06884bf

make sure to grab celltype columns and use correct prefix

4324b58

make sure things are saved to the processed file

dc89d98

Revert "comment out for more testing"

f118c8e

This reverts commit e6057a4.

allyhawkins requested a review from jashapiro September 11, 2025 13:58

make sure output file names match in stub

c14e5b0

allyhawkins mentioned this pull request Sep 11, 2025

Clean up assigning consensus cell type script AlexsLemonade/OpenScPCA-analysis#1337

Merged

jashapiro reviewed Sep 12, 2025

allyhawkins requested a review from jashapiro September 12, 2025 17:09

jashapiro approved these changes Sep 12, 2025

Apply suggestions from code review

4b69a8b

Co-authored-by: Joshua Shapiro <josh.shapiro@ccdatalab.org>

allyhawkins merged commit ecba127 into main Sep 12, 2025
3 checks passed

allyhawkins deleted the allyhawkins/scimilarity-to-consensus branch September 12, 2025 17:44
