Merge pull request #916 from drpatelh/fixes

drpatelh · web-flow · commit 763537a58d87 · 2022-12-20T17:24:37.000Z
Bump pipeline version to v3.10 and other small tweaks
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -3,7 +3,7 @@
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## v3.10dev - [date]
+## [[3.10](https://github.com/nf-core/rnaseq/releases/tag/3.10)] - 2022-12-21
 
 ### Enhancements & fixes
 
@@ -18,6 +18,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - [[#900](https://github.com/nf-core/rnaseq/issues/900)] - Add `--recursive` option to `fastq_dir_to_samplesheet.py` script
 - [[#902](https://github.com/nf-core/rnaseq/issues/902)] - `check_samplesheet.py` script doesn't output optional columns in samplesheet
 - [[#907](https://github.com/nf-core/rnaseq/issues/907)] - Add `--extra_star_align_args` and `--extra_salmon_quant_args` parameter
+- [[#912](https://github.com/nf-core/rnaseq/issues/912)] - Add UMI deduplication before quantification in tube map
 
 ### Parameters
 
@@ -31,6 +32,24 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 > **NB:** Parameter has been **added** if just the new parameter information is present.
 > **NB:** Parameter has been **removed** if new parameter information isn't present.
 
+### Software dependencies
+
+Note, since the pipeline is now using Nextflow DSL2, each process will be run with its own [Biocontainer](https://biocontainers.pro/#/registry). This means that on occasion it is entirely possible for the pipeline to be using different versions of the same tool. However, the overall software dependency changes compared to the last release have been listed below for reference.
+
+| Dependency                          | Old version | New version |
+| ----------------------------------- | ----------- | ----------- |
+| `bbmap`                             | 38.93       | 39.01       |
+| `bioconductor-dupradar`             | 1.18.0      | 1.28.0      |
+| `bioconductor-summarizedexperiment` | 1.20.0      | 1.24.0      |
+| `bioconductor-tximeta`              | 1.8.0       | 1.12.0      |
+| `fq`                                |             | 0.9.1       |
+| `salmon`                            | 1.5.2       | 1.9.0       |
+| `samtools`                          | 1.15.1      | 1.16.1      |
+
+> **NB:** Dependency has been **updated** if both old and new version information is present.
+> **NB:** Dependency has been **added** if just the new version information is present.
+> **NB:** Dependency has been **removed** if new version information isn't present.
+
 ## [[3.9](https://github.com/nf-core/rnaseq/releases/tag/3.9)] - 2022-09-30
 
 ### Enhancements & fixes
diff --git a/README.md b/README.md
@@ -32,28 +32,29 @@ You can find numerous talks on the [nf-core events page](https://nf-co.re/events
 > The SRA download functionality has been removed from the pipeline (`>=3.2`) and ported to an independent workflow called [nf-core/fetchngs](https://nf-co.re/fetchngs). You can provide `--nf_core_pipeline rnaseq` when running nf-core/fetchngs to download and auto-create a samplesheet containing publicly available samples that can be accepted directly as input by this pipeline.
 
 1. Merge re-sequenced FastQ files ([`cat`](http://www.linfo.org/cat.html))
-2. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
-3. UMI extraction ([`UMI-tools`](https://github.com/CGATOxford/UMI-tools))
-4. Adapter and quality trimming ([`Trim Galore!`](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/))
-5. Removal of genome contaminants ([`BBSplit`](http://seqanswers.com/forums/showthread.php?t=41288))
-6. Removal of ribosomal RNA ([`SortMeRNA`](https://github.com/biocore/sortmerna))
-7. Choice of multiple alignment and quantification routes:
+2. Sub-sample FastQ files and auto-infer strandedness ([`fq`](https://github.com/stjude-rust-labs/fq), [`Salmon`](https://combine-lab.github.io/salmon/))
+3. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
+4. UMI extraction ([`UMI-tools`](https://github.com/CGATOxford/UMI-tools))
+5. Adapter and quality trimming ([`Trim Galore!`](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/))
+6. Removal of genome contaminants ([`BBSplit`](http://seqanswers.com/forums/showthread.php?t=41288))
+7. Removal of ribosomal RNA ([`SortMeRNA`](https://github.com/biocore/sortmerna))
+8. Choice of multiple alignment and quantification routes:
    1. [`STAR`](https://github.com/alexdobin/STAR) -> [`Salmon`](https://combine-lab.github.io/salmon/)
    2. [`STAR`](https://github.com/alexdobin/STAR) -> [`RSEM`](https://github.com/deweylab/RSEM)
    3. [`HiSAT2`](https://ccb.jhu.edu/software/hisat2/index.shtml) -> **NO QUANTIFICATION**
-8. Sort and index alignments ([`SAMtools`](https://sourceforge.net/projects/samtools/files/samtools/))
-9. UMI-based deduplication ([`UMI-tools`](https://github.com/CGATOxford/UMI-tools))
-10. Duplicate read marking ([`picard MarkDuplicates`](https://broadinstitute.github.io/picard/))
-11. Transcript assembly and quantification ([`StringTie`](https://ccb.jhu.edu/software/stringtie/))
-12. Create bigWig coverage files ([`BEDTools`](https://github.com/arq5x/bedtools2/), [`bedGraphToBigWig`](http://hgdownload.soe.ucsc.edu/admin/exe/))
-13. Extensive quality control:
+9. Sort and index alignments ([`SAMtools`](https://sourceforge.net/projects/samtools/files/samtools/))
+10. UMI-based deduplication ([`UMI-tools`](https://github.com/CGATOxford/UMI-tools))
+11. Duplicate read marking ([`picard MarkDuplicates`](https://broadinstitute.github.io/picard/))
+12. Transcript assembly and quantification ([`StringTie`](https://ccb.jhu.edu/software/stringtie/))
+13. Create bigWig coverage files ([`BEDTools`](https://github.com/arq5x/bedtools2/), [`bedGraphToBigWig`](http://hgdownload.soe.ucsc.edu/admin/exe/))
+14. Extensive quality control:
     1. [`RSeQC`](http://rseqc.sourceforge.net/)
     2. [`Qualimap`](http://qualimap.bioinfo.cipf.es/)
     3. [`dupRadar`](https://bioconductor.org/packages/release/bioc/html/dupRadar.html)
     4. [`Preseq`](http://smithlabresearch.org/software/preseq/)
     5. [`DESeq2`](https://bioconductor.org/packages/release/bioc/html/DESeq2.html)
-14. Pseudo-alignment and quantification ([`Salmon`](https://combine-lab.github.io/salmon/); _optional_)
-15. Present QC for raw read, alignment, gene biotype, sample similarity, and strand-specificity checks ([`MultiQC`](http://multiqc.info/), [`R`](https://www.r-project.org/))
+15. Pseudo-alignment and quantification ([`Salmon`](https://combine-lab.github.io/salmon/); _optional_)
+16. Present QC for raw read, alignment, gene biotype, sample similarity, and strand-specificity checks ([`MultiQC`](http://multiqc.info/), [`R`](https://www.r-project.org/))
 
 > **Warning**
 > Quantification isn't performed if using `--aligner hisat2` due to the lack of an appropriate option to calculate accurate expression estimates from HISAT2 derived genomic alignments. However, you can use this route if you have a preference for the alignment, QC and other types of downstream analysis compatible with the output of HISAT2.
@@ -103,6 +104,10 @@ The pipeline was re-written in Nextflow DSL2 and is primarily maintained by Hars
 The pipeline workflow diagram was designed by Sarah Guinchard ([@G-Sarah](https://github.com/G-Sarah)) and James Fellows Yates ([@jfy133](https://github.com/jfy133)).
 
 Many thanks to others who have helped out along the way too, including (but not limited to):
+[@MatthiasZepper](https://github.com/MatthiasZepper),
+[@Emiller88](https://github.com/Emiller88),
+[@maxulysse](https://github.com/maxulysse),
+[@robsyme](https://github.com/robsyme),
 [@Galithil](https://github.com/Galithil),
 [@pditommaso](https://github.com/pditommaso),
 [@orzechoj](https://github.com/orzechoj),
diff --git a/conf/modules.config b/conf/modules.config
@@ -613,7 +613,7 @@ if (!params.skip_alignment && params.aligner == 'star_salmon') {
                 ]
             }
 
-            withName: 'NFCORE_RNASEQ:RNASEQ:UMITOOLS_PREPAREFORRSEM' {
+            withName: 'NFCORE_RNASEQ:RNASEQ:UMITOOLS_PREPAREFORSALMON' {
                 ext.prefix = { "${meta.id}.umi_dedup.transcriptome.filtered" }
                 publishDir = [
                     path: { "${params.outdir}/${params.aligner}/umitools/log" },
diff --git a/nextflow.config b/nextflow.config
@@ -256,7 +256,7 @@ manifest {
     description     = """RNA sequencing analysis pipeline for gene/isoform quantification and extensive quality control."""
     mainScript      = 'main.nf'
     nextflowVersion = '!>=22.10.1'
-    version         = '3.10dev'
+    version         = '3.10'
     doi             = 'https://doi.org/10.5281/zenodo.1400710'
 }
 
diff --git a/workflows/rnaseq.nf b/workflows/rnaseq.nf
@@ -92,14 +92,14 @@ ch_biotypes_header_multiqc   = file("$projectDir/assets/multiqc/biotypes_header.
 //
 // MODULE: Loaded from modules/local/
 //
-include { UMITOOLS_PREPAREFORRSEM            } from '../modules/local/umitools_prepareforrsem.nf'
 include { BEDTOOLS_GENOMECOV                 } from '../modules/local/bedtools_genomecov'
 include { DESEQ2_QC as DESEQ2_QC_STAR_SALMON } from '../modules/local/deseq2_qc'
 include { DESEQ2_QC as DESEQ2_QC_RSEM        } from '../modules/local/deseq2_qc'
 include { DESEQ2_QC as DESEQ2_QC_SALMON      } from '../modules/local/deseq2_qc'
 include { DUPRADAR                           } from '../modules/local/dupradar'
 include { MULTIQC                            } from '../modules/local/multiqc'
 include { MULTIQC_CUSTOM_BIOTYPE             } from '../modules/local/multiqc_custom_biotype'
+include { UMITOOLS_PREPAREFORRSEM as UMITOOLS_PREPAREFORSALMON } from '../modules/local/umitools_prepareforrsem.nf'
 include { MULTIQC_TSV_FROM_LIST as MULTIQC_TSV_FAIL_MAPPED  } from '../modules/local/multiqc_tsv_from_list'
 include { MULTIQC_TSV_FROM_LIST as MULTIQC_TSV_FAIL_TRIMMED } from '../modules/local/multiqc_tsv_from_list'
 include { MULTIQC_TSV_FROM_LIST as MULTIQC_TSV_STRAND_CHECK } from '../modules/local/multiqc_tsv_from_list'
@@ -421,14 +421,14 @@ workflow RNASEQ {
 
             // Fix paired-end reads in name sorted BAM file
             // See: https://github.com/nf-core/rnaseq/issues/828
-            UMITOOLS_PREPAREFORRSEM (
+            UMITOOLS_PREPAREFORSALMON (
                 ch_umitools_dedup_bam.paired_end
             )
-            ch_versions = ch_versions.mix(UMITOOLS_PREPAREFORRSEM.out.versions.first())
+            ch_versions = ch_versions.mix(UMITOOLS_PREPAREFORSALMON.out.versions.first())
 
             ch_umitools_dedup_bam
                 .single_end
-                .mix(UMITOOLS_PREPAREFORRSEM.out.bam)
+                .mix(UMITOOLS_PREPAREFORSALMON.out.bam)
                 .set { ch_transcriptome_bam }
         }
 

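Review note: the rename in this diff touches two files that have to stay in step, the `include ... as UMITOOLS_PREPAREFORSALMON` line in `workflows/rnaseq.nf` and the `withName:` selector in `conf/modules.config`. The sketch below is a hypothetical helper, not part of the pipeline, that flags literal `withName:` selectors whose last component is not a name some process is included under. It assumes that fully qualified selectors are written with the invoked name (the alias when one is given), which is how the diff pairs them. It also assumes one process per `include` statement, and it ignores pattern-style selectors. The function names and regexes are illustrative only.

```python
import re

# Matches single-process DSL2 include statements such as:
#   include { FOO } from '...'
#   include { FOO as BAR } from '...'
INCLUDE_RE = re.compile(r"include\s*\{\s*(\w+)(?:\s+as\s+(\w+))?\s*\}")

# Matches literal selectors such as: withName: 'A:B:PROCESS' {
# Selectors containing regex characters (e.g. '.*') are deliberately not matched.
SELECTOR_RE = re.compile(r"withName:\s*['\"]([\w:]+)['\"]")


def effective_names(workflow_text):
    """Return the names processes are invoked under (alias if given, else original)."""
    return {alias or name for name, alias in INCLUDE_RE.findall(workflow_text)}


def unmatched_selectors(config_text, workflow_text):
    """Return literal withName selectors whose last component is not an included name."""
    known = effective_names(workflow_text)
    selectors = SELECTOR_RE.findall(config_text)
    return [s for s in selectors if s.split(":")[-1] not in known]


# Include lines taken from the workflows/rnaseq.nf hunk after this change.
workflow = """
include { BEDTOOLS_GENOMECOV                 } from '../modules/local/bedtools_genomecov'
include { UMITOOLS_PREPAREFORRSEM as UMITOOLS_PREPAREFORSALMON } from '../modules/local/umitools_prepareforrsem.nf'
"""

# Selector before and after the conf/modules.config hunk.
old_config = "withName: 'NFCORE_RNASEQ:RNASEQ:UMITOOLS_PREPAREFORRSEM' {"
new_config = "withName: 'NFCORE_RNASEQ:RNASEQ:UMITOOLS_PREPAREFORSALMON' {"

print(unmatched_selectors(old_config, workflow))  # old selector is flagged
print(unmatched_selectors(new_config, workflow))  # renamed selector passes
```

Run against the new include lines, the old selector is reported and the renamed one is not. That is the mismatch that would have appeared if only `workflows/rnaseq.nf` had been updated.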