Module for exporting openscpca annotations #172
Merged
Changes from 20 commits are shown below. The pull request has 23 commits, all by allyhawkins:

- `6586dd2` process and script for exporting json
- `e237f49` add annotations bucket
- `b770305` format ewing output
- `f1dbd73` include exporting in main workflow
- `ec1c0ff` add stub to export process
- `c66365c` temp comment out other steps
- `ae540fe` date not data
- `2ea6a74` make sure column names are strings
- `fde8ed9` add readme
- `a7b9b95` add some messages for debugging
- `090d8da` directly export file
- `8c7c19a` add annotations bucket to schema
- `091d130` simplify vector to remove nested lists
- `7e2adda` Apply suggestions from code review
- `a917201` Merge remote-tracking branch 'origin/main' into allyhawkins/export-op…
- `e738c6f` update comments to use annotation metadata
- `4dba656` fix tuple setup for metadata
- `37521df` set stub annotations output
- `8d2bd16` uncomment out other modules
- `499ae90` update comment
- `8a33498` Update comment
- `9fb81fa` Merge remote-tracking branch 'origin/main' into allyhawkins/export-op…
- `e8c94e8` remove duplicate infercnv module
New file, module README (`@@ -0,0 +1,13 @@`):

    This module exports annotations from cell type modules in a uniform JSON format to a public S3 bucket for use in other applications.
    Annotations can be found in `s3://openscpca-celltype-annotations-public-access`.

    For each library, a JSON file is exported with the following information:

    | Field | Description |
    | -- | -- |
    | `barcodes` | An array of unique cell barcodes |
    | `openscpca_celltype_annotation` | An array of cell type annotations assigned in `OpenScPCA-nf` |
    | `openscpca_celltype_ontology` | An array of Cell Ontology identifiers associated with the cell type annotation. If no Cell Ontology identifiers are assigned, this will be `NA` |
    | `module_name` | Name of the original analysis module used to assign cell type annotations in `OpenScPCA-analysis` |
    | `openscpca_nf_version` | Version of `OpenScPCA-nf` |
    | `release_date` | Release date of input ScPCA data |
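A downstream application might load and sanity-check one library's exported JSON roughly as in the Python sketch below. Only the field names come from the README table. Everything else is an assumption: the checks (equal-length arrays, unique barcodes), the wrapping of scalar values, and the treatment of `null` or `"NA"` ontology values as missing. The scalar wrapping is there because the R exporter writes with `auto_unbox = TRUE`, so length-1 vectors may appear as scalars. `load_annotations` and `as_list` are hypothetical helpers, not part of the module.

```python
import json
from typing import Any, Optional

# Field names from the module README table
REQUIRED_FIELDS = (
    "barcodes",
    "openscpca_celltype_annotation",
    "openscpca_celltype_ontology",
    "module_name",
    "openscpca_nf_version",
    "release_date",
)


def as_list(value: Any) -> list:
    """Wrap a scalar in a list (auto_unbox can turn length-1 arrays into scalars)."""
    return value if isinstance(value, list) else [value]


def load_annotations(text: str) -> list[dict[str, Optional[str]]]:
    """Parse one library's exported JSON into one record per cell."""
    doc = json.loads(text)
    missing = [field for field in REQUIRED_FIELDS if field not in doc]
    if missing:
        raise ValueError(f"missing fields: {missing}")

    barcodes = as_list(doc["barcodes"])
    labels = as_list(doc["openscpca_celltype_annotation"])
    if len(labels) != len(barcodes):
        raise ValueError("barcodes and annotations have different lengths")
    if len(set(barcodes)) != len(barcodes):
        raise ValueError("barcodes are not unique")

    # Assumption: a missing ontology ID is serialized as null or the string "NA"
    ontology = [None if value in (None, "NA") else value
                for value in as_list(doc["openscpca_celltype_ontology"])]
    # One missing value for a library with more than one cell means no IDs were assigned
    if len(ontology) == 1 and ontology[0] is None and len(barcodes) != 1:
        ontology = [None] * len(barcodes)
    if len(ontology) != len(barcodes):
        raise ValueError("ontology IDs and barcodes have different lengths")

    return [
        {"barcode": barcode, "annotation": label, "ontology": ontology_id}
        for barcode, label, ontology_id in zip(barcodes, labels, ontology)
    ]


if __name__ == "__main__":
    # Hypothetical two-cell library; all values are illustrative
    example = {
        "module_name": "example-module",
        "openscpca_nf_version": "v0.0.0",
        "release_date": "2024-01-01",
        "barcodes": ["AAAC", "AAAG"],
        "openscpca_celltype_annotation": ["B cell", "T cell"],
        "openscpca_celltype_ontology": ["CL:0000236", "CL:0000084"],
    }
    for record in load_annotations(json.dumps(example)):
        print(record)
```

Flattening to one record per cell makes it straightforward to join the annotations to other per-cell tables by barcode.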
New file, module workflow (`@@ -0,0 +1,55 @@`):

    #!/usr/bin/env nextflow

    // Workflow to format and export openscpca annotations

    process format_annotations {
      container params.scpcatools_slim_container
      tag "${sample_id}"
      label 'mem_8'
      publishDir "${params.annotations_bucket}/${params.release_prefix}/${project_id}/${sample_id}", mode: 'copy'
      input:
        tuple val(sample_id),
              val(project_id),
              path(annotations_tsv_files),
              val(annotation_metadata)
      output:
        tuple val(sample_id),
              val(project_id),
              path("*_openscpca-annotations.json")
      script:
        library_ids = annotations_tsv_files.collect{(it.name =~ /SCPCL\d{6}/)[0]}
        """
        for library_id in ${library_ids.join(" ")}; do
          # get the input files for the library id
          annotations_file=\$(ls ${annotations_tsv_files} | grep "\${library_id}")

          export-celltype-json.R \
            --annotations_tsv_file \$annotations_file \
            --annotation_column "${annotation_metadata.annotation_column}" \
            ${annotation_metadata.ontology_column ? "--ontology_column '${annotation_metadata.ontology_column}'" : ''} \
            --module_name ${annotation_metadata.module_name} \
            --release_date ${params.release_prefix} \
            --openscpca_nf_version ${workflow.manifest.version} \
            --output_json_file \${library_id}_openscpca-annotations.json
        done
        """

      stub:
        library_ids = annotations_tsv_files.collect{(it.name =~ /SCPCL\d{6}/)[0]}
        """
        for library_id in ${library_ids.join(" ")}; do
          touch \${library_id}_openscpca-annotations.json
        done
        """
    }

    workflow export_annotations {
      take:
        celltype_ch // [sample_id, project_id, [cell type assignment files], annotation metadata]
      main:
        // export json
        format_annotations(celltype_ch)

      emit:
        format_annotations.out // [sample id, project id, annotations json]
    }
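To make the per-library loop in `format_annotations` concrete, here is a hypothetical Python mirror of how the script block turns each TSV file name into one `export-celltype-json.R` call. It reuses the `SCPCL\d{6}` pattern and the conditional `--ontology_column` flag from the process. `build_commands`, the example file names and the metadata values are illustrative assumptions, not part of the module.

```python
import re

# Same pattern the process uses to pull a library ID out of each TSV file name
LIBRARY_ID = re.compile(r"SCPCL\d{6}")


def build_commands(
    tsv_files: list[str],
    metadata: dict[str, str],
    release_prefix: str,
    nf_version: str,
) -> list[list[str]]:
    """Return one export-celltype-json.R argument list per input TSV file."""
    commands = []
    for tsv in tsv_files:
        match = LIBRARY_ID.search(tsv)
        if match is None:
            raise ValueError(f"no SCPCL library ID in file name: {tsv}")
        library_id = match.group(0)
        args = [
            "export-celltype-json.R",
            "--annotations_tsv_file", tsv,
            "--annotation_column", metadata["annotation_column"],
        ]
        # Only pass --ontology_column when the metadata names one, as the process does
        if metadata.get("ontology_column"):
            args += ["--ontology_column", metadata["ontology_column"]]
        args += [
            "--module_name", metadata["module_name"],
            "--release_date", release_prefix,
            "--openscpca_nf_version", nf_version,
            "--output_json_file", f"{library_id}_openscpca-annotations.json",
        ]
        commands.append(args)
    return commands


if __name__ == "__main__":
    # Hypothetical inputs for illustration
    for cmd in build_commands(
        ["SCPCS000001_SCPCL000001_celltypes.tsv"],
        {"annotation_column": "label", "module_name": "example-module"},
        release_prefix="2024-01-01",
        nf_version="v0.0.0",
    ):
        print(" ".join(cmd))
```

The sketch differs from the process in one respect. The process recovers each library's file with `ls ... | grep`, which would capture more than one path if two TSV names contained the same library ID. The sketch avoids this by keeping each regex match paired with the file it came from.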
`modules/export-annotations/resources/usr/bin/export-celltype-json.R`: 99 changes (99 additions, 0 deletions)
New file (`@@ -0,0 +1,99 @@`):

    #!/usr/bin/env Rscript

    # This script is used to create a JSON file of annotations for a single library
    # JSON file will include barcodes, annotation column, ontology column (if provided),
    # openscpca-nf version, data release date, and module name

    library(optparse)

    option_list <- list(
      make_option(
        opt_str = c("--annotations_tsv_file"),
        type = "character",
        help = "Path to TSV file with cell type annotations"
      ),
      make_option(
        opt_str = c("--annotation_column"),
        type = "character",
        help = "Name of the column containing the cell type annotations to use for openscpca_celltype_annotation"
      ),
      make_option(
        opt_str = c("--ontology_column"),
        default = "",
        type = "character",
        help = "Name of the column containing the cell type ontology IDs to use for openscpca_celltype_ontology"
      ),
      make_option(
        opt_str = c("--module_name"),
        type = "character",
        help = "Name of original module in OpenScPCA-analysis"
      ),
      make_option(
        opt_str = c("--release_date"),
        type = "character",
        help = "Release date of data used when generating annotations"
      ),
      make_option(
        opt_str = c("--openscpca_nf_version"),
        type = "character",
        help = "Version of OpenScPCA-nf workflow"
      ),
      make_option(
        opt_str = "--output_json_file",
        type = "character",
        help = "Path to JSON file to save cell type annotations"
      )
    )

    # Parse options
    opt <- parse_args(OptionParser(option_list = option_list))

    # Set up -----------------------------------------------------------------------

    # make sure the input file exists, required options are set, and the output path is a .json file
    stopifnot(
      "annotations TSV file does not exist" = file.exists(opt$annotations_tsv_file),
      "annotation column must be provided" = !is.null(opt$annotation_column),
      "module name must be provided" = !is.null(opt$module_name),
      "release date must be provided" = !is.null(opt$release_date),
      "openscpca-nf version must be provided" = !is.null(opt$openscpca_nf_version),
      "output json file must end in .json" = stringr::str_ends(opt$output_json_file, "\\.json")
    )

    # read in annotations
    annotations_df <- readr::read_tsv(opt$annotations_tsv_file)

    # check that barcodes and annotation column exist
    stopifnot(
      "barcodes column must be present in provided TSV file" = "barcodes" %in% colnames(annotations_df),
      "annotation column is not present in provided TSV file" = opt$annotation_column %in% colnames(annotations_df)
    )

    # check for ontology ids if provided (--ontology_column defaults to "", never NULL, so test for a non-empty value)
    if (nzchar(opt$ontology_column)) {
      stopifnot(
        "ontology column is not present in provided TSV file" = opt$ontology_column %in% colnames(annotations_df)
      )
      ontology_ids <- annotations_df[[opt$ontology_column]]
    } else {
      ontology_ids <- NA
    }

    # build json contents
    json_contents <- list(
      module_name = opt$module_name,
      openscpca_nf_version = opt$openscpca_nf_version,
      release_date = opt$release_date,
      barcodes = annotations_df$barcodes,
      openscpca_celltype_annotation = annotations_df[[opt$annotation_column]],
      openscpca_celltype_ontology = ontology_ids
    )

    # export json file
    jsonlite::write_json(
      json_contents,
      path = opt$output_json_file,
      simplifyVector = TRUE,
      auto_unbox = TRUE,
      pretty = TRUE
    )
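The Python sketch below restates the R script's steps for comparison: validate inputs, read the TSV, check the required columns, optionally pull ontology IDs, and write the JSON. It is not part of the module. `export_celltype_json` is a hypothetical name, and it uses the standard-library `csv` and `json` modules in place of `readr` and `jsonlite`. It always writes arrays rather than reproducing `auto_unbox`, and writes `null` where the R script writes its `NA` value. It treats an empty `ontology_column` as "not provided", matching the option's `default = ""`. Unlike `readr::read_tsv`, `csv` keeps empty cells as empty strings instead of converting them to missing values.

```python
import csv
import json
import tempfile
from pathlib import Path
from typing import Optional


def export_celltype_json(
    tsv_path: Path,
    json_path: Path,
    annotation_column: str,
    module_name: str,
    release_date: str,
    nf_version: str,
    ontology_column: str = "",
) -> dict:
    """Write one library's annotations to JSON and return the written contents."""
    # Same up-front checks as the R script's first stopifnot() call
    if not tsv_path.exists():
        raise FileNotFoundError(f"annotations TSV file does not exist: {tsv_path}")
    if json_path.suffix != ".json":
        raise ValueError("output json file must end in .json")

    with tsv_path.open(newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        columns = reader.fieldnames or []
        rows = list(reader)

    for required in ("barcodes", annotation_column):
        if required not in columns:
            raise ValueError(f"{required} column is not present in provided TSV file")

    # An empty string means no ontology column was given (the R option defaults to "")
    ontology: Optional[list] = None
    if ontology_column:
        if ontology_column not in columns:
            raise ValueError("ontology column is not present in provided TSV file")
        ontology = [row[ontology_column] for row in rows]

    contents = {
        "module_name": module_name,
        "openscpca_nf_version": nf_version,
        "release_date": release_date,
        "barcodes": [row["barcodes"] for row in rows],
        "openscpca_celltype_annotation": [row[annotation_column] for row in rows],
        # None is written as JSON null, standing in for the R script's NA
        "openscpca_celltype_ontology": ontology,
    }
    json_path.write_text(json.dumps(contents, indent=2))
    return contents


if __name__ == "__main__":
    # Hypothetical two-cell library; file names and values are illustrative only
    with tempfile.TemporaryDirectory() as tmp:
        tsv = Path(tmp) / "SCPCL000001_celltypes.tsv"
        tsv.write_text("barcodes\tlabel\tontology\nAAAC\tB cell\tCL:0000236\nAAAG\tT cell\tCL:0000084\n")
        out = Path(tmp) / "SCPCL000001_openscpca-annotations.json"
        export_celltype_json(tsv, out, "label", "example-module", "2024-01-01", "v0.0.0", "ontology")
        print(out.read_text())
```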