Skip to content

Add process/script to classify with singler#178

Merged
sjspielman merged 24 commits intomainfrom
sjspielman/166-classify-singler
Sep 2, 2025
Merged

Add process/script to classify with singler#178
sjspielman merged 24 commits intomainfrom
sjspielman/166-classify-singler

Conversation

@sjspielman
Copy link
Member

Stacked on #176
Closes #166

This PR adds the R script and nextflow process to classify samples with SingleR. While working on this, I realized we actually haven't been as comprehensively testing as we maybe want to be (#177), so I'm opening this PR with SCPCP000004 added as project to run through in the stub profile so we can ensure this really is passing CI. I will remove that once this passes to request review.

Base automatically changed from sjspielman/167-train-scanvi to main August 28, 2025 18:30
@sjspielman
Copy link
Member Author

The stub workflow passed with SCPCP000004 included at this commit: ae98ecf. See run: https://github.com/AlexsLemonade/OpenScPCA-nf/actions/runs/17305059525/job/49125753586

So, I removed SCPCP000004 from the stub projects since it passed and can now request review.

@sjspielman sjspielman marked this pull request as ready for review August 28, 2025 18:52
@sjspielman sjspielman requested a review from allyhawkins August 28, 2025 18:52
Copy link
Member

@allyhawkins allyhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, but I will note that we should be testing this with actual data too and not just with the stub in CI. I might run this through with the simulated option for data just to be sure that the scripts also run as expected.

@sjspielman
Copy link
Member Author

sjspielman commented Aug 29, 2025

This LGTM, but I will note that we should be testing this with actual data too and not just with the stub in CI.

I confess I was waiting until the end for a big debug, but yeah, fair.... Spam incoming 🙃
edit - i'm going to try running locally first before running via triggering the GHA, so maybe no spam :)

@sjspielman
Copy link
Member Author

Coming along nicely with testing here, until I hit this roadblock EOD Friday at the classification step and ran away from it for the weekend! This error is...confusing:

Caused by:
  No such variable: file -- Check script '/home/ec2-user/.nextflow/assets/AlexsLemonade/OpenScPCA-nf/./modules/cell-type-neuroblastoma-04/main.nf' at line: 103


Source block:
  """
  for file in ${library_files}; do
    classify-singler.R \
      --sce_file ${file} \
      --singler_model_file ${singler_model} \
      --singler_output_tsv \$(basename \${file%.rds}_singler.tsv.gz) \
      --threads ${task.cpus}
  done
  """

The error is complaining about the file loop variable. I did a bunch of local testing to see what might be going on here, and the library files don't seem to be staged. Only if I do something like echo $library_files (directly use the library_files variable in the script block) do the files get staged and no error is thrown. After much, much staring, I can't find any syntax differences between this and, e.g. doublet-detection or scimilarity module code, in terms of reading in the libraries_ch and looping over the files.

Anyways, I'm stumped! I've just submitted a batch run with scimilarity turned on (to confirm these files are being staged - they definitely are!) and with a publishDir for the classify process as well as an emit.. I don't think this will help, but I as mentioned, am stumped! https://cloud.seqera.io/orgs/CCDL/workspaces/OpenScPCA/watch/KkqiAAa4Dm3Pn/v2/tasks

Something else I want to mention, the train_singler process takes an unbelievable amount of memory - seemingly (?) way more memory than I experience when running it locally. It gets bumped to mem 48/64 to get a successful run. Something to keep tabs on?

@jashapiro
Copy link
Member

The error is complaining about the file loop variable

Since that is a bash variable, you need a backslash:

--sce_file \${file} \

@sjspielman
Copy link
Member Author

Copy link
Member

@allyhawkins allyhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for testing!

@sjspielman sjspielman merged commit 5f8d269 into main Sep 2, 2025
3 checks passed
@sjspielman sjspielman deleted the sjspielman/166-classify-singler branch September 2, 2025 15:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add process and script to classify with SingleR

3 participants