Merge pull request #1037 from drpatelh/updates

drpatelh · web-flow · commit cd0a95bc4488 · 2023-05-31T18:42:34.000+01:00
Fix #1018
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -239,7 +239,8 @@ jobs:
     strategy:
       matrix:
         parameters:
-          - "--skip_qc --skip_alignment"
+          - "--skip_qc"
+          - "--skip_alignment --skip_pseudo_alignment"
           - "--salmon_index false --transcript_fasta false"
     steps:
       - name: Check out pipeline code
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -8,9 +8,20 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Enhancements & fixes
 
 - [[#1011](https://github.com/nf-core/rnaseq/issues/1011)] - FastQ files from UMI-tools not being passed to fastp
+- [[#1018](https://github.com/nf-core/rnaseq/issues/1018)] - Ability to skip both alignment and pseudo-alignment to only run pre-processing QC steps.
 - [PR #1016](https://github.com/nf-core/rnaseq/pull/1016) - Updated pipeline template to [nf-core/tools 2.8](https://github.com/nf-core/tools/releases/tag/2.8)
 - [PR #1025](https://github.com/nf-core/fetchngs/pull/1025) - Add `public_aws_ecr.config` to source mulled containers when using `public.ecr.aws` Docker Biocontainer registry
 
+### Parameters
+
+| Old parameter | New parameter             |
+| ------------- | ------------------------- |
+|               | `--skip_pseudo_alignment` |
+
+> **NB:** Parameter has been **updated** if both old and new parameter information is present.
+> **NB:** Parameter has been **added** if just the new parameter information is present.
+> **NB:** Parameter has been **removed** if new parameter information isn't present.
+
 ### Software dependencies
 
 | Dependency | Old version | New version |
diff --git a/conf/modules.config b/conf/modules.config
@@ -1137,7 +1137,7 @@ if (!params.skip_multiqc) {
 // Salmon pseudo-alignment options
 //
 
-if (params.pseudo_aligner == 'salmon') {
+if (!params.skip_pseudo_alignment && params.pseudo_aligner == 'salmon') {
     process {
         withName: '.*:QUANTIFY_SALMON:SALMON_QUANT' {
             ext.args   = params.extra_salmon_quant_args ?: ''
diff --git a/docs/usage.md b/docs/usage.md
@@ -71,6 +71,8 @@ When running Salmon in mapping-based mode via `--pseudo_aligner salmon` the enti
 
 Two additional parameters `--extra_star_align_args` and `--extra_salmon_quant_args` were added in v3.10 of the pipeline that allow you to append any custom parameters to the STAR align and Salmon quant commands, respectively. Note, the `--seqBias` and `--gcBias` are not provided to Salmon quant by default so you can provide these via `--extra_salmon_quant_args '--seqBias --gcBias'` if required.
 
+> **NB:** You can use `--skip_alignment --skip_pseudo_alignment` if you only want to run the pre-processing QC steps in the pipeline like FastQC, trimming, etc. This will skip alignment, pseudo-alignment and any post-alignment processing steps.
+
 ## Quantification options
 
 The current options align with STAR and quantify using either Salmon (`--aligner star_salmon`) / RSEM (`--aligner star_rsem`). You also have the option to pseudo-align and quantify your data with Salmon by providing the `--pseudo_aligner salmon` parameter.
@@ -133,7 +135,7 @@ If unique molecular identifiers were used to prepare the library, add the follow
 
 Please refer to the [nf-core website](https://nf-co.re/usage/reference_genomes) for general usage docs and guidelines regarding reference genomes.
 
-The minimum reference genome requirements for this pipeline are a FASTA and GTF file, all other files required to run the pipeline can be generated from these files. However, it is more storage and compute friendly if you are able to re-use reference genome files as efficiently as possible. It is recommended to use the `--save_reference` parameter if you are using the pipeline to build new indices (e.g. custom genomes that are unavailable on [AWS iGenomes](https://nf-co.re/usage/reference_genomes#custom-genomes)) so that you can save them somewhere locally. The index building step can be quite a time-consuming process and it permits their reuse for future runs of the pipeline to save disk space. You can then either provide the appropriate reference genome files on the command-line via the appropriate parameters (e.g. `--star_index '/path/to/STAR/index/'`) or via a custom config file.
+The minimum reference genome requirements for this pipeline are a FASTA and GTF file, all other files required to run the pipeline can be generated from these files. However, it is more storage and compute friendly if you are able to re-use reference genome files as efficiently as possible. It is recommended to use the `--save_reference` parameter if you are using the pipeline to build new indices (e.g. custom genomes that are unavailable on [AWS iGenomes](https://nf-co.re/usage/reference_genomes#custom-genomes)) so that you can save them somewhere locally. The index building step can be quite a time-consuming process and it permits their reuse for future runs of the pipeline to save disk space. You can then either provide the appropriate reference genome files on the command-line via the appropriate parameters (e.g. `--star_index '/path/to/STAR/index/'`) or via a custom config file. Another option is to run the pipeline once with `--save_reference --skip_alignment --skip_pseudo_alignment` to generate and save all of the required reference files and indices to the results directory. You can then move the reference files in `<RESULTS_DIR>/genome/` to a more permanent location and use these paths to override the relevant parameters in the pipeline e.g. `--star_index`.
 
 - If `--genome` is provided then the FASTA and GTF files (and existing indices) will be automatically obtained from AWS-iGenomes unless these have already been downloaded locally in the path specified by `--igenomes_base`.
 - If `--gff` is provided as input then this will be converted to a GTF file, or the latter will be used if both are provided.
diff --git a/lib/WorkflowRnaseq.groovy b/lib/WorkflowRnaseq.groovy
@@ -65,13 +65,10 @@ class WorkflowRnaseq {
                 Nextflow.error("Invalid option: '${params.aligner}'. Valid options for '--aligner': ${valid_params['aligners'].join(', ')}.")
             }
         } else {
-            if (!params.pseudo_aligner) {
-                Nextflow.error("--skip_alignment specified without --pseudo_aligner...please specify e.g. --pseudo_aligner ${valid_params['pseudoaligners'][0]}.")
-            }
             skipAlignmentWarn(log)
         }
 
-        if (params.pseudo_aligner) {
+        if (!params.skip_pseudo_alignment) {
             if (!valid_params['pseudoaligners'].contains(params.pseudo_aligner)) {
                 Nextflow.error("Invalid option: '${params.pseudo_aligner}'. Valid options for '--pseudo_aligner': ${valid_params['pseudoaligners'].join(', ')}.")
             } else {
diff --git a/nextflow.config b/nextflow.config
@@ -71,6 +71,7 @@ params {
     save_align_intermeds       = false
     skip_markduplicates        = false
     skip_alignment             = false
+    skip_pseudo_alignment      = false
 
     // QC
     skip_qc                    = false
diff --git a/nextflow_schema.json b/nextflow_schema.json
@@ -446,6 +446,11 @@
                     "type": "boolean",
                     "fa_icon": "fas fa-fast-forward",
                     "description": "Skip all of the alignment-based processes within the pipeline."
+                },
+                "skip_pseudo_alignment": {
+                    "type": "boolean",
+                    "fa_icon": "fas fa-fast-forward",
+                    "description": "Skip all of the pseudo-alignment-based processes within the pipeline."
                 }
             }
         },
diff --git a/workflows/rnaseq.nf b/workflows/rnaseq.nf
@@ -43,9 +43,9 @@ if (!params.skip_bbsplit && !params.bbsplit_index && params.bbsplit_fasta_list)
 
 // Check alignment parameters
 def prepareToolIndices  = []
-if (!params.skip_bbsplit)   { prepareToolIndices << 'bbsplit'             }
-if (!params.skip_alignment) { prepareToolIndices << params.aligner        }
-if (params.pseudo_aligner)  { prepareToolIndices << params.pseudo_aligner }
+if (!params.skip_bbsplit) { prepareToolIndices << 'bbsplit' }
+if (!params.skip_alignment) { prepareToolIndices << params.aligner }
+if (!params.skip_pseudo_alignment) { prepareToolIndices << params.pseudo_aligner }
 
 // Get RSeqC modules to run
 def rseqc_modules = params.rseqc_modules ? params.rseqc_modules.split(',').collect{ it.trim().toLowerCase() } : []
@@ -799,7 +799,7 @@ workflow RNASEQ {
     ch_salmon_multiqc                   = Channel.empty()
     ch_pseudoaligner_pca_multiqc        = Channel.empty()
     ch_pseudoaligner_clustering_multiqc = Channel.empty()
-    if (params.pseudo_aligner == 'salmon') {
+    if (!params.skip_pseudo_alignment && params.pseudo_aligner == 'salmon') {
         QUANTIFY_SALMON (
             ch_filtered_reads,
             PREPARE_GENOME.out.salmon_index,
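
Review note: the gating introduced in `workflows/rnaseq.nf` can be modelled outside Nextflow to check which indices get prepared for a given flag combination. The following is a minimal Python sketch of that logic, not the pipeline's own code. The `Params` class and the function names are hypothetical, and only the `false` defaults of the two skip flags are taken from `nextflow.config` in this diff; every other value must be passed explicitly.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Params:
    # Hypothetical stand-in for the Nextflow `params` map.
    aligner: Optional[str]
    pseudo_aligner: Optional[str]
    skip_bbsplit: bool
    # Defaults below mirror nextflow.config as shown in this diff.
    skip_alignment: bool = False
    skip_pseudo_alignment: bool = False


def prepare_tool_indices(p: Params) -> List[Optional[str]]:
    """Mirror the three `prepareToolIndices <<` lines after this change."""
    tools: List[Optional[str]] = []
    if not p.skip_bbsplit:
        tools.append("bbsplit")
    if not p.skip_alignment:
        tools.append(p.aligner)
    if not p.skip_pseudo_alignment:
        # Appends whatever value pseudo_aligner holds, as the Groovy line does.
        tools.append(p.pseudo_aligner)
    return tools


def runs_salmon_quant(p: Params) -> bool:
    """Mirror the new guard around QUANTIFY_SALMON."""
    return (not p.skip_pseudo_alignment) and p.pseudo_aligner == "salmon"


if __name__ == "__main__":
    qc_only = Params(
        aligner="star_salmon",
        pseudo_aligner="salmon",
        skip_bbsplit=True,
        skip_alignment=True,
        skip_pseudo_alignment=True,
    )
    print(prepare_tool_indices(qc_only), runs_salmon_quant(qc_only))
```

With both skip flags set, the model prepares no aligner indices and does not run Salmon quantification, which matches the pre-processing-only use case described in `docs/usage.md`. It also shows that the pseudo-aligner entry is now controlled by `skip_pseudo_alignment` alone rather than by whether `pseudo_aligner` is set.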
