Commit 9605b0f

Merge branch 'kallisto_quant' of github.com:nf-core/rnaseq into kallisto_quant
2 parents 5ea79ac + 2decfdc

11 files changed (+79 additions, -61 deletions)

CHANGELOG.md

Lines changed: 4 additions & 4 deletions
```diff
@@ -10,10 +10,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 Special thanks to the following for their contributions to the release:

 - [Adam Talbot](https://github.com/adamrtalbot)
+- [Jonathan Manning](https://github.com/pinin4fjords)
 - [Júlia Mir Pedrol](https://github.com/mirpedrol)
 - [Matthias Zepper](https://github.com/MatthiasZepper)
 - [Maxime Garcia](https://github.com/maxulysse)
-- [Jonathan Manning](https://github.com/pinin4fjords)

 Thank you to everyone else that has contributed by reporting bugs, enhancements or in any other way, shape or form.

```

```diff
@@ -38,7 +38,7 @@ Thank you to everyone else that has contributed by reporting bugs, enhancements
 | Dependency | Old version | New version |
 | ----------------------- | ----------- | ----------- |
 | `fastqc` | 0.11.9 | 0.12.1 |
-| `multiqc` | 1.14 | 1.15 |
+| `multiqc` | 1.14 | 1.17 |
 | `ucsc-bedgraphtobigwig` | 377 | 445 |

 > **NB:** Dependency has been **updated** if both old and new version information is present.
```

```diff
@@ -65,7 +65,7 @@ Thank you to everyone else that has contributed by reporting bugs, enhancements
 ### Enhancements & fixes

 - [[#1011](https://github.com/nf-core/rnaseq/issues/1011)] - FastQ files from UMI-tools not being passed to fastp
-- [[#1018](https://github.com/nf-core/rnaseq/issues/1018)] - Ability to skip both alignment and pseudo-alignment to only run pre-processing QC steps.
+- [[#1018](https://github.com/nf-core/rnaseq/issues/1018)] - Ability to skip both alignment and pseudoalignment to only run pre-processing QC steps.
 - [PR #1016](https://github.com/nf-core/rnaseq/pull/1016) - Updated pipeline template to [nf-core/tools 2.8](https://github.com/nf-core/tools/releases/tag/2.8)
 - [PR #1025](https://github.com/nf-core/fetchngs/pull/1025) - Add `public_aws_ecr.config` to source mulled containers when using `public.ecr.aws` Docker Biocontainer registry
 - [PR #1038](https://github.com/nf-core/rnaseq/pull/1038) - Updated error log for count values when supplying `--additional_fasta`
```

```diff
@@ -813,7 +813,7 @@ Major novel changes include:
 - Added options to skip several steps
 - Skip trimming using `--skipTrimming`
 - Skip BiotypeQC using `--skipBiotypeQC`
-- Skip Alignment using `--skipAlignment` to only use pseudo-alignment using Salmon
+- Skip Alignment using `--skipAlignment` to only use pseudoalignment using Salmon

 ### Documentation updates

```

README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -39,7 +39,7 @@
 3. [`dupRadar`](https://bioconductor.org/packages/release/bioc/html/dupRadar.html)
 4. [`Preseq`](http://smithlabresearch.org/software/preseq/)
 5. [`DESeq2`](https://bioconductor.org/packages/release/bioc/html/DESeq2.html)
-15. Pseudo-alignment and quantification ([`Salmon`](https://combine-lab.github.io/salmon/) or ['Kallisto'](https://pachterlab.github.io/kallisto/); _optional_)
+15. Pseudoalignment and quantification ([`Salmon`](https://combine-lab.github.io/salmon/) or ['Kallisto'](https://pachterlab.github.io/kallisto/); _optional_)
 16. Present QC for raw read, alignment, gene biotype, sample similarity, and strand-specificity checks ([`MultiQC`](http://multiqc.info/), [`R`](https://www.r-project.org/))

 > **Note**
```

assets/multiqc_config.yml

Lines changed: 1 addition & 0 deletions
```diff
@@ -67,6 +67,7 @@ extra_fn_clean_exts:
 - ".umi_dedup"
 - "_val"
 - ".markdup"
+- "_primary"

 # Customise the module search patterns to speed up execution time
 # - Skip module sub-tools that we are not interested in
```
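
The new `"_primary"` entry extends MultiQC's `extra_fn_clean_exts`, the list of substrings used to tidy sample names in the report. As a rough mental model, assuming MultiQC's default behaviour for plain-string entries is to truncate the name at the first occurrence of each pattern, the effect can be sketched in Python. This is a simplified illustration, not MultiQC's own code: it uses only the four patterns visible in this hunk, ignores MultiQC's built-in defaults, and the file stem `WT_REP1_primary.flagstat` is hypothetical.

```python
# Simplified model of MultiQC sample-name cleaning.
# Assumption: plain-string patterns truncate the name at their first occurrence.
# Only the patterns shown in this hunk are used; MultiQC's defaults are omitted.
CLEAN_EXTS = [".umi_dedup", "_val", ".markdup", "_primary"]


def clean_sample_name(name: str, patterns: list[str] = CLEAN_EXTS) -> str:
    """Cut `name` at the first occurrence of each pattern, in list order."""
    for pattern in patterns:
        name = name.split(pattern, 1)[0]
    return name


if __name__ == "__main__":
    # Hypothetical file stem carrying the newly handled "_primary" suffix.
    print(clean_sample_name("WT_REP1_primary.flagstat"))
```

Without the new entry, a name like the one above would keep its `_primary...` tail in this model, which is presumably why the suffix was added.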

bin/summarizedexperiment.r

Lines changed: 1 addition & 1 deletion
```diff
@@ -2,7 +2,7 @@

 library(SummarizedExperiment)

-## Create SummarizedExperiment (se) object from Salmon counts
+## Create SummarizedExperiment (se) object from counts

 args <- commandArgs(trailingOnly = TRUE)
 if (length(args) < 3) {
```

conf/modules.config

Lines changed: 4 additions & 10 deletions
```diff
@@ -705,11 +705,9 @@ if (!params.skip_alignment && params.aligner == 'star_salmon') {
 if (!params.skip_qc & !params.skip_deseq2_qc) {
 process {
 withName: 'DESEQ2_QC_STAR_SALMON' {
-ext.prefix = "deseq2"
 ext.args = { [
 "--id_col 1",
 "--sample_suffix ''",
-"--outprefix deseq2",
 "--count_col 3",
 params.deseq2_vst ? '--vst TRUE' : ''
 ].join(' ').trim() }
```

```diff
@@ -770,11 +768,9 @@ if (!params.skip_alignment && params.aligner == 'star_rsem') {
 if (!params.skip_qc & !params.skip_deseq2_qc) {
 process {
 withName: 'DESEQ2_QC_RSEM' {
-ext.prefix = "deseq2"
 ext.args = { [
 "--id_col 1",
 "--sample_suffix ''",
-"--outprefix deseq2",
 "--count_col 3",
 params.deseq2_vst ? '--vst TRUE' : ''
 ].join(' ').trim() }
```
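
Both DESeq2 QC blocks drop `ext.prefix = "deseq2"` and the `--outprefix deseq2` argument. The `ext.args` closure builds a list, joins it with spaces, and trims the result, so the empty string produced by the `params.deseq2_vst` ternary leaves no trailing space. A Python transliteration of the post-commit closure, for illustration only (the function name `build_deseq2_qc_args` is hypothetical):

```python
def build_deseq2_qc_args(deseq2_vst: bool) -> str:
    """Mirror the Groovy `[...].join(' ').trim()` closure after this commit."""
    parts = [
        "--id_col 1",
        "--sample_suffix ''",
        # "--outprefix deseq2" is no longer passed after this commit.
        "--count_col 3",
        "--vst TRUE" if deseq2_vst else "",  # ternary yields '' when VST is off
    ]
    # join() leaves a trailing space when the last element is '', strip() removes it.
    return " ".join(parts).strip()


if __name__ == "__main__":
    print(build_deseq2_qc_args(False))
    print(build_deseq2_qc_args(True))
```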

```diff
@@ -1085,10 +1081,10 @@ if (!params.skip_multiqc) {
 }

 //
-// Salmon/ Kallisto pseudo-alignment options
+// Salmon/ Kallisto pseudoalignment options
 //

-if (params.pseudo_aligner == 'salmon') {
+if (!params.skip_pseudo_alignment && params.pseudo_aligner == 'salmon') {
 process {

 withName: '.*:QUANTIFY_PSEUDO_ALIGNMENT:SALMON_QUANT' {
```

```diff
@@ -1102,15 +1098,14 @@ if (params.pseudo_aligner == 'salmon') {
 }
 }

-if (params.pseudo_aligner == 'kallisto') {
+if (!params.skip_pseudo_alignment && params.pseudo_aligner == 'kallisto') {
 process {
 withName: '.*:QUANTIFY_PSEUDO_ALIGNMENT:KALLISTO_QUANT' {
 ext.args = params.extra_kallisto_quant_args ?: ''
-
 publishDir = [
 path: { "${params.outdir}/${params.pseudo_aligner}" },
 mode: params.publish_dir_mode,
-saveAs: { filename -> filename.equals('versions.yml') || filename.endsWith('_run_info.json') ? null : filename }
+saveAs: { filename -> filename.equals('versions.yml') || filename.endsWith('.run_info.json') || filename.endsWith('.log.txt') ? null : filename }
 ]
 }
 }
```
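
The updated `saveAs` closure for `KALLISTO_QUANT` returns `null` (do not publish) for `versions.yml` and for any file ending in `.run_info.json` or `.log.txt`, and keeps every other filename. The previous version matched `_run_info.json`, which would not catch a dot-separated name such as `WT_REP1.run_info.json`. The same predicates in Python, with `None` standing in for Groovy's `null`; the sample file names are hypothetical:

```python
from typing import Optional


def kallisto_save_as(filename: str) -> Optional[str]:
    """Python rendering of the new saveAs closure: None means 'do not publish'."""
    if (
        filename == "versions.yml"
        or filename.endswith(".run_info.json")
        or filename.endswith(".log.txt")
    ):
        return None
    return filename


def old_kallisto_save_as(filename: str) -> Optional[str]:
    """The closure before this commit, which only matched '_run_info.json'."""
    if filename == "versions.yml" or filename.endswith("_run_info.json"):
        return None
    return filename


if __name__ == "__main__":
    for name in ["versions.yml", "WT_REP1.run_info.json", "WT_REP1.log.txt", "WT_REP1"]:
        print(name, old_kallisto_save_as(name), kallisto_save_as(name))
```

In Groovy the ternary binds more loosely than `||`, so the whole `||` chain is evaluated before `? null : filename`, which is what the single `if` above reproduces.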

```diff
@@ -1150,7 +1145,6 @@ if (!params.skip_pseudo_alignment) {
 ext.args = { [
 "--id_col 1",
 "--sample_suffix ''",
-"--outprefix deseq2",
 "--count_col 3",
 params.deseq2_vst ? '--vst TRUE' : ''
 ].join(' ').trim() }
```

docs/output.md

Lines changed: 9 additions & 9 deletions
```diff
@@ -41,9 +41,9 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
 - [featureCounts](#featurecounts) - Read counting relative to gene biotype
 - [DESeq2](#deseq2) - PCA plot and sample pairwise distance heatmap and dendrogram
 - [MultiQC](#multiqc) - Present QC for raw reads, alignment, read counting and sample similiarity
-- [Pseudo-alignment and quantification](#pseudo-alignment-and-quantification)
-- [Salmon](#salmon) - Wicked fast gene and isoform quantification relative to the transcriptome
-- [Kallisto](#kallisto) - Near-optimal probabilistic RNA-seq quantification
+- [Pseudoalignment and quantification](#pseudoalignment-and-quantification)
+- [Salmon](#pseudoalignment) - Wicked fast gene and isoform quantification relative to the transcriptome
+- [Kallisto](#pseudoalignment) - Near-optimal probabilistic RNA-seq quantification
 Wicked fast gene and isoform quantification relative to the transcriptome
 - [Workflow reporting and genomes](#workflow-reporting-and-genomes)
 - [Reference genome files](#reference-genome-files) - Saving reference genome indices/files
```
```diff
@@ -205,7 +205,7 @@ The STAR section of the MultiQC report shows a bar plot with alignment rates: go

 ![MultiQC - STAR alignment scores plot](images/mqc_star.png)

-[Salmon](https://salmon.readthedocs.io/en/latest/salmon.html) from [Ocean Genomics](https://oceangenomics.com/) and [Kallisto](https://pachterlab.github.io/kallisto/), from the Pachter Lab, are provided as options for pseudo-alignment. Both allow quantification of reads against an index generated from a reference set of target transcripts. By default, the transcriptome-level BAM files generated by STAR are provided to Salmon for downstream quantification, and Kallisto is not an option here (it does not allow BAM file input). But you can provide FASTQ files directly as input to either Salmon or Kallisto in order to pseudo-align and quantify your data by providing the `--pseudo_aligner (salmon or kallisto)` parameter. See the [Salmon](#salmon) and (Kallisto)[#kallisto] results sections for more details.
+[Salmon](https://salmon.readthedocs.io/en/latest/salmon.html) from [Ocean Genomics](https://oceangenomics.com/) and [Kallisto](https://pachterlab.github.io/kallisto/), from the Pachter Lab, are provided as options for pseudoalignment. Both allow quantification of reads against an index generated from a reference set of target transcripts. By default, the transcriptome-level BAM files generated by STAR are provided to Salmon for downstream quantification, and Kallisto is not an option here (it does not allow BAM file input). But you can provide FASTQ files directly as input to either Salmon or Kallisto in order to pseudoalign and quantify your data by providing the `--pseudo_aligner salmon` or `--pseudo_aligner kallisto` parameter. See the [Salmon](#pseudoalignment) and [Kallisto](#pseudoalignment) results sections for more details.

 ### STAR via RSEM

```
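
For the FASTQ-input route described in that paragraph, a Kallisto-only run might be launched roughly as below. This is a sketch under stated assumptions: the samplesheet, output directory, FASTA and GTF paths are placeholders, `-profile docker` is just one possible profile, and `--skip_alignment` is added to run the pseudoaligner in isolation as the updated docs describe. The argv is built in Python so it could be passed to `subprocess.run`; the helper name `rnaseq_pseudo_cmd` is hypothetical.

```python
import shlex


def rnaseq_pseudo_cmd(pseudo_aligner: str, skip_alignment: bool = True) -> list[str]:
    """Build an illustrative `nextflow run nf-core/rnaseq` argv (placeholder paths)."""
    if pseudo_aligner not in ("salmon", "kallisto"):
        raise ValueError("pseudo_aligner must be 'salmon' or 'kallisto'")
    cmd = [
        "nextflow", "run", "nf-core/rnaseq",
        "-profile", "docker",
        "--input", "samplesheet.csv",  # placeholder samplesheet
        "--outdir", "results",         # placeholder output directory
        "--fasta", "genome.fa",        # placeholder genome FASTA
        "--gtf", "genes.gtf",          # placeholder annotation
        "--pseudo_aligner", pseudo_aligner,
    ]
    if skip_alignment:
        cmd.append("--skip_alignment")  # run the pseudoaligner on its own
    return cmd


if __name__ == "__main__":
    print(shlex.join(rnaseq_pseudo_cmd("kallisto")))
```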

```diff
@@ -670,9 +670,9 @@ The plot on the left hand side shows the standard PC plot - notice the variable

 Results generated by MultiQC collate pipeline QC from supported tools i.e. FastQC, Cutadapt, SortMeRNA, STAR, RSEM, HISAT2, Salmon, SAMtools, Picard, RSeQC, Qualimap, Preseq and featureCounts. Additionally, various custom content has been added to the report to assess the output of dupRadar, DESeq2 and featureCounts biotypes, and to highlight samples failing a mimimum mapping threshold or those that failed to match the strand-specificity provided in the input samplesheet. The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. For more information about how to use MultiQC reports, see <http://multiqc.info>.

-## Pseudo-alignment and quantification
+## Pseudoalignment and quantification

-### Pseudo-alignment
+### Pseudoalignment

 The principal output files are the same between Salmon and Kallsto:

```

```diff
@@ -717,10 +717,10 @@ An additional subset of files are distinct to each tool, for Salmon:
 - `abundance.h5`: a HDF5 binary file containing run info, abundance esimates, bootstrap estimates, and transcript length information length. This file can be read in by [sleuth](https://pachterlab.github.io/sleuth/about).
 - `abundance.tsv`: a plaintext file of the abundance estimates. It does not contains bootstrap estimates.
 - `run_info.json`: a json file containing information about the run.
-- `<SAMPLE>.log.txt`: standard output from the Kallisto process per sample.
-</details>
+- `kallisto_quant.log`: standard output from the Kallisto process per sample.
+</details>

-As described in the [STAR and Salmon](#star-and-salmon) section, you can choose to pseudo-align and quantify your data with [Salmon](https://salmon.readthedocs.io/en/latest/salmon.html) or [Kallisto](https://pachterlab.github.io/kallisto/) by providing the `--pseudo_aligner` parameter. By default, Salmon is run in addition to the standard alignment workflow defined by `--aligner`, mainly because it allows you to obtain QC metrics with respect to the genomic alignments. However, you can provide the `--skip_alignment` parameter if you would like to run Salmon or Kallisto in isolation. If Salmon or Kallisto are run in isolation, the outputs mentioned above will be found in a folder named `salmon` or `kallisto`. If Salmon is run alongside STAR, the folder will be named `star_salmon`.
+As described in the [STAR and Salmon](#star-and-salmon) section, you can choose to pseudoalign and quantify your data with [Salmon](https://salmon.readthedocs.io/en/latest/salmon.html) or [Kallisto](https://pachterlab.github.io/kallisto/) by providing the `--pseudo_aligner` parameter. By default, Salmon is run in addition to the standard alignment workflow defined by `--aligner`, mainly because it allows you to obtain QC metrics with respect to the genomic alignments. However, you can provide the `--skip_alignment` parameter if you would like to run Salmon or Kallisto in isolation. If Salmon or Kallisto are run in isolation, the outputs mentioned above will be found in a folder named `salmon` or `kallisto`. If Salmon is run alongside STAR, the folder will be named `star_salmon`.

 Transcripts with large inferential uncertainty won't be assigned the exact number of reads reproducibly, every time Salmon is run. Read more about this on the [nf-core/rnaseq](https://github.com/nf-core/rnaseq/issues/585) and [salmon](https://github.com/COMBINE-lab/salmon/issues/613) Github repos.

```
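
The Kallisto `abundance.tsv` listed in this hunk is a tab-separated table with the columns `target_id`, `length`, `eff_length`, `est_counts` and `tpm`. A minimal Python sketch that loads it with the standard `csv` module; the two transcript rows below are made-up values, not pipeline output, and `read_abundance` is a hypothetical helper name:

```python
import csv
import io

# Made-up example content in Kallisto's abundance.tsv column layout.
EXAMPLE_TSV = (
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "TX1\t1500\t1350.5\t120\t600000\n"
    "TX2\t900\t750.2\t40\t400000\n"
)

NUMERIC_COLUMNS = ("length", "eff_length", "est_counts", "tpm")


def read_abundance(handle) -> dict[str, dict[str, float]]:
    """Map each target_id to its numeric columns, parsed as floats."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["target_id"]: {col: float(row[col]) for col in NUMERIC_COLUMNS}
        for row in reader
    }


if __name__ == "__main__":
    abundance = read_abundance(io.StringIO(EXAMPLE_TSV))
    total_counts = sum(rec["est_counts"] for rec in abundance.values())
    print(total_counts)
```

For a real file, open it with `open(path, newline="")` and pass the handle to `read_abundance`. Bootstrap estimates are not in this file; per the docs above they live only in `abundance.h5`.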
