
Initiate module for cell-type-ewings#131

Merged: allyhawkins merged 23 commits into main from allyhawkins/ewing-cell-types-part1 on Feb 28, 2025


@allyhawkins
Member

Towards #129

Here I am initiating the cell-type-ewings module and adding the first step: running AUCell on every library using gene sets from MSigDB as well as custom gene sets.

  • I copied over the script from the cell-type-ewings module in OpenScPCA-analysis that runs AUCell on gene signatures.
    • That script was written to accommodate merged objects, which we aren't using here, so I removed everything pertaining to merged objects. When doing the actual cell type assignment in OpenScPCA-analysis, I used the results from running AUCell on the merged object, but because AUCell ranks genes within each cell, we should be able to do the same thing on the individual objects, assuming we use the same thresholds.
    • Previously we determined aucMaxRank as a percentage of detected genes rather than as a fixed number. In the cell-type-ewings module we ended up using an aucMaxRank equal to 1% of the total number of detected genes in the merged object, so I calculated that value and set it as the default here: 425.
    • The script used to quit if there was not enough overlap between the gene sets and the genes in an individual SCE, but Nextflow fails when a process doesn't produce its declared output files, so I changed it to output a data frame with NA values instead.
  • The script runs AUCell using both custom gene sets and gene sets from MSigDB.
    • For the MSigDB gene sets, we were already using a reference table listing all the gene sets as input, so I kept that as is.
    • For the custom gene sets, we previously used an argument that specified a directory containing the gene sets as individual files. Instead, I updated this argument to take a comma-separated list of files containing custom gene sets. I also made it optional, in case we ever want to use this script without custom gene sets. Honestly, that was probably unnecessary, so I can remove it if you don't like it.
    • I then added params pointing to the MSigDB and custom gene set reference files in OpenScPCA-analysis, similar to how we use the consensus cell type reference files.
  • Finally, I added a workflow to run the module and updated config files with the new params.
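
The per-library scoring described in the list above can be sketched in Python. This is a simplified, hypothetical AUCell-style score, not the R code in `01-aucell.R` and not AUCell's exact normalization: each cell's genes are ranked within that cell only, the recovery curve of gene-set genes is summed over the top `max_rank` ranks, and the sum is divided by the area a perfect ranking would give. Because nothing in a cell's score depends on other cells, scoring a library on its own should give the same values as scoring it inside a merged object, assuming the same genes and the same `max_rank`. The function names, the `min_overlap` cutoff, the ceiling rounding in `max_rank_from_percent`, and the comma-separated parser are all assumptions for illustration.

```python
import bisect
import math
from typing import Dict, List, Optional, Set, Tuple


def max_rank_from_percent(n_detected: int, percent: int = 1) -> int:
    """Ceiling of `percent`% of the detected genes, using integer arithmetic.

    Ceiling rounding is an assumption, not necessarily what the original script did.
    """
    return (n_detected * percent + 99) // 100


def parse_gene_set_files(arg: Optional[str]) -> List[str]:
    """Split an optional comma-separated list of file paths; None or "" gives []."""
    if not arg:
        return []
    return [path.strip() for path in arg.split(",") if path.strip()]


def cell_auc(expression: Dict[str, float], gene_set: Set[str],
             max_rank: int, min_overlap: int = 1) -> float:
    """Simplified AUCell-style score for a single cell (hypothetical sketch)."""
    if max_rank < 1:
        raise ValueError("max_rank must be at least 1")
    present = gene_set.intersection(expression)
    if len(present) < min_overlap:
        return math.nan  # too little overlap: report NA instead of stopping
    # Rank genes within this cell only (rank 1 = highest expression).
    # Ties are broken by gene name here; AUCell itself breaks ties randomly.
    ordered = sorted(expression, key=lambda g: (-expression[g], g))
    rank = {gene: i + 1 for i, gene in enumerate(ordered)}
    set_ranks = sorted(rank[g] for g in present)
    area = best = 0
    for k in range(1, max_rank + 1):
        area += bisect.bisect_right(set_ranks, k)  # set genes within the top k
        best += min(k, len(set_ranks))  # perfect ranking: set genes come first
    return area / best


def score_library(cells: Dict[str, Dict[str, float]],
                  gene_sets: Dict[str, Set[str]], max_rank: int,
                  min_overlap: int = 1) -> List[Tuple[str, str, float]]:
    """One (cell, gene set, AUC) row per pair; NaN rows when overlap is too low."""
    return [
        (cell_id, name, cell_auc(expr, genes, max_rank, min_overlap))
        for cell_id, expr in cells.items()
        for name, genes in gene_sets.items()
    ]
```

With ceiling rounding, any detected-gene count from 42,401 to 42,500 gives `max_rank_from_percent(n) == 425`; the actual count in the merged object is not stated here. Returning NaN rows rather than exiting mirrors the change that keeps Nextflow's expected output file present.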

I tested this and it ran successfully!

@allyhawkins
Member Author

Hmmm is there a way to get it to skip git hooks here? This change 82ae052 breaks the code...

@jashapiro
Member

> Hmmm is there a way to get it to skip git hooks here? This change 82ae052 breaks the code...

Oh, that is annoying. What we really want here is a typos rule that leaves that alone. I will file a PR to this branch with that change (and turn pre-commit back on).

Member

@jashapiro jashapiro left a comment


The overall structure here looks good. I had some suggestions about formatting and output.

My other question is about inputs and next steps: is this module also going to use the consensus results? I don't think anything needs to happen with that now, but I was curious, thinking about how the next steps will go.


Links to specific original scripts used in this module:

- `01-aucell.R`: <https://github.com/AlexsLemonade/OpenScPCA-analysis/blob/main/analyses/cell-type-ewings/scripts/aucell-ews-signatures/01-aucell.R>
Member


Use a tag or permalink here, please.

},
"cell_type_ewings_msigdb_list": {
"type": "string",
"default": "https://raw.githubusercontent.com/AlexsLemonade/OpenScPCA-analysis/refs/tags/v0.2.2/analyses/cell-type-ewings/references/msigdb-gene-sets.tsv"
Member


Can you add all of the validation info here and for the other files? I made an attempt at the description for this one, but please update it as needed. A description for max_rank would also be useful.

Suggested change
"default": "https://raw.githubusercontent.com/AlexsLemonade/OpenScPCA-analysis/refs/tags/v0.2.2/analyses/cell-type-ewings/references/msigdb-gene-sets.tsv",
"pattern": "\\.tsv$",
"format": "file-path",
"mimetype": "text/tab-separated-values",
"description": "Table of MSigDB gene sets"
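
As a quick check on the suggested validation fields, a small Python sketch can confirm that the doubled backslash in `"\\.tsv$"` decodes to the regex `\.tsv$` and that the default URL satisfies it. Python's `re` stands in for the validator's regex engine here (JSON Schema specifies ECMA-262 regexes, which behave the same as `re` for this simple pattern); the `matches_pattern` helper is hypothetical, and the `format` and `mimetype` checks are not exercised.

```python
import json
import re

# The JSON text as written in the schema: the regex backslash is doubled.
SCHEMA_FRAGMENT = r'''
{
  "default": "https://raw.githubusercontent.com/AlexsLemonade/OpenScPCA-analysis/refs/tags/v0.2.2/analyses/cell-type-ewings/references/msigdb-gene-sets.tsv",
  "pattern": "\\.tsv$"
}
'''

entry = json.loads(SCHEMA_FRAGMENT)
pattern = entry["pattern"]  # after JSON decoding this is the regex \.tsv$


def matches_pattern(value: str, regex: str) -> bool:
    # JSON Schema "pattern" is not implicitly anchored, so use search, not fullmatch.
    return re.search(regex, value) is not None
```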

allyhawkins and others added 6 commits February 28, 2025 12:15
Co-authored-by: Joshua Shapiro <josh.shapiro@ccdatalab.org>
@allyhawkins
Member Author

> My other question is about inputs and next steps: is this module also going to use the consensus results? I don't think anything needs to happen with that now, but I was curious, thinking about how the next steps will go.

Yes, the plan is to use the consensus cell types. I think the next step will be a process that takes the consensus cell types and the AUCell results as input and performs the actual cell type assignment. That process will also take a set of params defining the AUC thresholds used to classify cells. Its output will be a TSV file with the cell type annotations.
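
The planned assignment step could look roughly like the Python sketch below. Everything in it is hypothetical: the rule (label a cell `tumor` if any listed gene set meets its AUC threshold, otherwise keep the consensus label), the gene set name, the threshold values, and the TSV column names are placeholders for illustration, not the thresholds or logic the module will actually use.

```python
import csv
import io
import math
from typing import Dict


def assign_cell_types(
    consensus: Dict[str, str],
    auc_scores: Dict[str, Dict[str, float]],
    thresholds: Dict[str, float],
    tumor_label: str = "tumor",
) -> Dict[str, str]:
    """Hypothetical rule: relabel a cell if any gene set meets its AUC threshold."""
    assignments = {}
    for cell_id, consensus_label in consensus.items():
        scores = auc_scores.get(cell_id, {})
        # Missing or NaN scores compare False, so they never pass a threshold.
        passes = any(
            scores.get(gene_set, math.nan) >= cutoff
            for gene_set, cutoff in thresholds.items()
        )
        assignments[cell_id] = tumor_label if passes else consensus_label
    return assignments


def to_tsv(consensus: Dict[str, str], assignments: Dict[str, str]) -> str:
    """Write one row per cell with the consensus and assigned labels."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["barcodes", "consensus_annotation", "ewing_annotation"])
    for cell_id, label in consensus.items():
        writer.writerow([cell_id, label, assignments[cell_id]])
    return buffer.getvalue()
```

Treating NaN as "does not pass" lets libraries whose AUCell output is all NA (because of low gene set overlap) fall back to their consensus labels without a special case.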

I also updated the validation sections of the schema file, fixed some of the links, and uncommented the other modules, so this should be ready for another look.

Member

@jashapiro jashapiro left a comment


LGTM

@allyhawkins allyhawkins merged commit 2ed53d9 into main Feb 28, 2025
3 checks passed
@allyhawkins allyhawkins deleted the allyhawkins/ewing-cell-types-part1 branch February 28, 2025 19:24

2 participants