Commit 8033764

Merge pull request #1475 from nf-core/umi_dedup_log_path
Fix log publishing around umitools/ umicollapse
2 parents a27ec8e + 0a21c3f commit 8033764

6 files changed: 137 additions & 34 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -27,6 +27,7 @@ Special thanks to the following for their contributions to the release:
 - [PR #1471](https://github.com/nf-core/rnaseq/pull/1471) - Fix prepare_genome subworkflow for sortmerna
 - [PR #1473](https://github.com/nf-core/rnaseq/pull/1473) - Bump STAR modules
 - [PR #1474](https://github.com/nf-core/rnaseq/pull/1474) - Bump versions to 3.18.0
+- [PR #1475](https://github.com/nf-core/rnaseq/pull/1475) - Fix log publishing around umitools/ umicollapse
 
 ## Parameters
 

docs/output.md

Lines changed: 4 additions & 4 deletions
@@ -120,7 +120,7 @@ If multiple libraries/runs have been provided for the same sample in the input s
 
 </details>
 
-[UMI-tools](https://github.com/CGATOxford/UMI-tools) deduplicates reads based on unique molecular identifiers (UMIs) to address PCR-bias. Firstly, the UMI-tools `extract` command removes the UMI barcode information from the read sequence and adds it to the read name. Secondly, reads are deduplicated based on UMI identifier after mapping as highlighted in the [UMI-tools dedup](#umi-tools-dedup) section.
+[UMI-tools](https://github.com/CGATOxford/UMI-tools) and [UMICollapse](https://github.com/Daniel-Liu-c0deb0t/UMICollapse) deduplicate reads based on unique molecular identifiers (UMIs) to address PCR-bias. Firstly, the UMI-tools `extract` command removes the UMI barcode information from the read sequence and adds it to the read name. Secondly, reads are deduplicated based on UMI identifier after mapping as highlighted in the [UMI dedup](#umi-dedup) section.
 
 To facilitate processing of input data which has the UMI barcode already embedded in the read name from the start, `--skip_umi_extract` can be specified in conjunction with `--with_umi`.
 
@@ -305,7 +305,7 @@ The original BAM files generated by the selected alignment algorithm are further
 
 ![MultiQC - SAMtools mapped reads per contig plot](images/mqc_samtools_idxstats.png)
 
-### UMI-tools dedup
+### UMI dedup
 
 <details markdown="1">
 <summary>Output files</summary>
@@ -314,7 +314,7 @@ The original BAM files generated by the selected alignment algorithm are further
 - `<SAMPLE>.umi_dedup.sorted.bam`: If `--save_umi_intermeds` is specified the UMI deduplicated, coordinate sorted BAM file containing read alignments will be placed in this directory.
 - `<SAMPLE>.umi_dedup.sorted.bam.bai`: If `--save_umi_intermeds` is specified the BAI index file for the UMI deduplicated, coordinate sorted BAM file will be placed in this directory.
 - `<SAMPLE>.umi_dedup.sorted.bam.csi`: If `--save_umi_intermeds --bam_csi_index` is specified the CSI index file for the UMI deduplicated, coordinate sorted BAM file will be placed in this directory.
-- `<ALIGNER>/umitools/`
+- `<ALIGNER>/umitools/` (UMI-tools only)
 - `*_edit_distance.tsv`: Reports the (binned) average edit distance between the UMIs at each position.
 - `*_per_umi.tsv`: UMI-level summary statistics.
 - `*_per_umi_per_position.tsv`: Tabulates the counts for unique combinations of UMI and position.
@@ -323,7 +323,7 @@ The content of the files above is explained in more detail in the [UMI-tools doc
 
 </details>
 
-After extracting the UMI information from the read sequence (see [UMI-tools extract](#umi-tools-extract)), the second step in the removal of UMI barcodes involves deduplicating the reads based on both mapping and UMI barcode information using the UMI-tools `dedup` command. This will generate a filtered BAM file after the removal of PCR duplicates.
+After extracting the UMI information from the read sequence (see [UMI-tools extract](#umi-tools-extract)), the second step in the removal of UMI barcodes involves deduplicating the reads based on both mapping and UMI barcode information. UMI deduplication can be carried out either with [UMI-tools](https://github.com/CGATOxford/UMI-tools) or [UMICollapse](https://github.com/Daniel-Liu-c0deb0t/UMICollapse), set via the `umi_dedup_tool` parameter. The output BAM files are the same, though UMI-tools has some additional outputs, as described above. Either method will generate a filtered BAM file after the removal of PCR duplicates.
 
 ### picard MarkDuplicates
 
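The two-step UMI workflow described in the updated docs (move the UMI into the read name with `extract`, then collapse reads that share both mapping position and UMI) can be sketched in Python. This is a simplified illustration, not how UMI-tools or UMICollapse are implemented: it assumes a fixed-length UMI at the 5' end of a single-end read, a `_` separator between read ID and UMI, and exact UMI matching (closer in spirit to UMI-tools' `--method unique` than to its default `directional` method, which also merges UMIs that differ by sequencing errors). The names `extract_umi`, `Alignment`, `umi_of` and `dedup_exact` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


def extract_umi(name: str, seq: str, qual: str, umi_len: int) -> Tuple[str, str, str]:
    """Move the first `umi_len` bases of a read into its read ID (simplified sketch)."""
    umi = seq[:umi_len]
    # Split "@ID comment" so the UMI is appended to the ID, not to the comment
    read_id, sep, comment = name.partition(" ")
    return f"{read_id}_{umi}{sep}{comment}", seq[umi_len:], qual[umi_len:]


class Alignment(NamedTuple):
    name: str    # read ID ending in "_<UMI>", as produced by extract_umi
    chrom: str
    pos: int     # mapping position used for grouping
    strand: str  # "+" or "-"


def umi_of(name: str) -> str:
    # Assumption: the UMI is whatever follows the last "_" in the read ID
    return name.rsplit("_", 1)[1]


def dedup_exact(alignments: List[Alignment]) -> List[Alignment]:
    """Keep one alignment per (chrom, strand, pos, UMI); the first one seen wins."""
    seen = set()
    kept = []
    for aln in alignments:
        key = (aln.chrom, aln.strand, aln.pos, umi_of(aln.name))
        if key not in seen:
            seen.add(key)
            kept.append(aln)
    return kept


if __name__ == "__main__":
    print(extract_umi("@read1 1:N:0", "ACGTTTGGCC", "IIIIIIIIII", 4))
    reads = [
        Alignment("r1_ACGT", "chr1", 100, "+"),
        Alignment("r2_ACGT", "chr1", 100, "+"),  # same position, strand and UMI as r1: duplicate
        Alignment("r3_TTTT", "chr1", 100, "+"),  # same position, different UMI: kept
        Alignment("r4_ACGT", "chr1", 100, "-"),  # opposite strand: kept
    ]
    print([a.name for a in dedup_exact(reads)])
```

Running the script prints `('@read1_ACGT 1:N:0', 'TTGGCC', 'IIIIII')` and then `['r1_ACGT', 'r3_TTTT', 'r4_ACGT']`: r2 is dropped because it shares position, strand and UMI with r1. The real tools handle further details (paired-end reads, error-tolerant UMI grouping in UMI-tools' default mode, and their own rules for which read of a group to keep).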
tests/.nftignore

Lines changed: 1 addition & 1 deletion
@@ -32,7 +32,7 @@ umitools/*.umi_extract.log
 {hisat2,star_rsem,star_salmon}/stringtie/*.ballgown/t_data.ctab
 {hisat2,star_rsem,star_salmon}/stringtie/*.gene.abundance.txt
 {hisat2,star_rsem,star_salmon}/stringtie/*.{coverage,transcripts}.gtf
-{hisat2,star_rsem,star_salmon}/umitools/genomic_dedup_log/*_UMICollapse.log
+{hisat2,star_rsem,star_salmon}/{umitools,umicollapse}/{genomic,transcriptomic}_dedup_log/*.log
 {multiqc,multiqc/**}/multiqc_report.html
 {multiqc,multiqc/**}/multiqc_report_data/fastqc_{raw,trimmed}_top_overrepresented_sequences_table.txt
 {multiqc,multiqc/**}/multiqc_report_data/hisat2_pe_plot.txt
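The `.nftignore` change widens one ignore pattern so that it matches dedup logs under both `umitools/` and `umicollapse/`, from both the genomic and the transcriptomic dedup step. The effect of the `{a,b}` groups can be checked with a small Python sketch that brace-expands a pattern and matches each expansion with `fnmatch`. This approximates, rather than reproduces, nf-test's glob handling: `fnmatch`'s `*` also matches `/`, which is harmless for these patterns but differs from path-aware globbing. The helpers `expand_braces` and `is_ignored` are hypothetical.

```python
import fnmatch
import re
from typing import List

# Innermost {...} group; sufficient for the flat brace groups used in .nftignore
BRACE = re.compile(r"\{([^{}]*)\}")


def expand_braces(pattern: str) -> List[str]:
    """Expand {a,b,...} groups into every concrete glob pattern."""
    match = BRACE.search(pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[: match.start()], pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        expanded.extend(expand_braces(head + option + tail))
    return expanded


def is_ignored(path: str, pattern: str) -> bool:
    # Case-sensitive match of the path against every brace expansion of the pattern
    return any(fnmatch.fnmatchcase(path, glob) for glob in expand_braces(pattern))


OLD = "{hisat2,star_rsem,star_salmon}/umitools/genomic_dedup_log/*_UMICollapse.log"
NEW = "{hisat2,star_rsem,star_salmon}/{umitools,umicollapse}/{genomic,transcriptomic}_dedup_log/*.log"

if __name__ == "__main__":
    log = "hisat2/umicollapse/genomic_dedup_log/WT_REP1.umi_dedup.sorted_UMICollapse.log"
    print(len(expand_braces(NEW)))  # 3 aligners x 2 tools x 2 dedup steps
    print(is_ignored(log, OLD), is_ignored(log, NEW))
```

The script prints `12` and then `False True`: once the UMICollapse logs are published under `hisat2/umicollapse/`, as in the updated snapshot, the old pattern no longer matches them, while the new one does.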

tests/umi.nf.test

Lines changed: 2 additions & 0 deletions
@@ -15,6 +15,7 @@ nextflow_pipeline {
 umi_dedup_tool = 'umicollapse'
 aligner = 'hisat2'
 outdir = "$outputDir"
+save_umi_intermeds = true
 }
 }
 
@@ -49,6 +50,7 @@ nextflow_pipeline {
 umitools_dedup_stats = true
 skip_bbsplit = true
 outdir = "$outputDir"
+save_umi_intermeds = true
 }
 }
 

tests/umi.nf.test.snap

Lines changed: 71 additions & 21 deletions
@@ -612,6 +612,10 @@
 "star_salmon/RAP1_IAA_30M_REP1",
 "star_salmon/RAP1_IAA_30M_REP1.umi_dedup.sorted.bam",
 "star_salmon/RAP1_IAA_30M_REP1.umi_dedup.sorted.bam.bai",
+"star_salmon/RAP1_IAA_30M_REP1.umi_dedup.transcriptome.bam",
+"star_salmon/RAP1_IAA_30M_REP1.umi_dedup.transcriptome.filtered.bam",
+"star_salmon/RAP1_IAA_30M_REP1.umi_dedup.transcriptome.sorted.bam",
+"star_salmon/RAP1_IAA_30M_REP1.umi_dedup.transcriptome.sorted.bam.bai",
 "star_salmon/RAP1_IAA_30M_REP1/aux_info",
 "star_salmon/RAP1_IAA_30M_REP1/aux_info/ambig_info.tsv",
 "star_salmon/RAP1_IAA_30M_REP1/aux_info/expected_bias.gz",
@@ -629,6 +633,9 @@
 "star_salmon/RAP1_UNINDUCED_REP1",
 "star_salmon/RAP1_UNINDUCED_REP1.umi_dedup.sorted.bam",
 "star_salmon/RAP1_UNINDUCED_REP1.umi_dedup.sorted.bam.bai",
+"star_salmon/RAP1_UNINDUCED_REP1.umi_dedup.transcriptome.bam",
+"star_salmon/RAP1_UNINDUCED_REP1.umi_dedup.transcriptome.sorted.bam",
+"star_salmon/RAP1_UNINDUCED_REP1.umi_dedup.transcriptome.sorted.bam.bai",
 "star_salmon/RAP1_UNINDUCED_REP1/aux_info",
 "star_salmon/RAP1_UNINDUCED_REP1/aux_info/ambig_info.tsv",
 "star_salmon/RAP1_UNINDUCED_REP1/aux_info/expected_bias.gz",
@@ -646,6 +653,9 @@
 "star_salmon/RAP1_UNINDUCED_REP2",
 "star_salmon/RAP1_UNINDUCED_REP2.umi_dedup.sorted.bam",
 "star_salmon/RAP1_UNINDUCED_REP2.umi_dedup.sorted.bam.bai",
+"star_salmon/RAP1_UNINDUCED_REP2.umi_dedup.transcriptome.bam",
+"star_salmon/RAP1_UNINDUCED_REP2.umi_dedup.transcriptome.sorted.bam",
+"star_salmon/RAP1_UNINDUCED_REP2.umi_dedup.transcriptome.sorted.bam.bai",
 "star_salmon/RAP1_UNINDUCED_REP2/aux_info",
 "star_salmon/RAP1_UNINDUCED_REP2/aux_info/ambig_info.tsv",
 "star_salmon/RAP1_UNINDUCED_REP2/aux_info/expected_bias.gz",
@@ -663,6 +673,10 @@
 "star_salmon/WT_REP1",
 "star_salmon/WT_REP1.umi_dedup.sorted.bam",
 "star_salmon/WT_REP1.umi_dedup.sorted.bam.bai",
+"star_salmon/WT_REP1.umi_dedup.transcriptome.bam",
+"star_salmon/WT_REP1.umi_dedup.transcriptome.filtered.bam",
+"star_salmon/WT_REP1.umi_dedup.transcriptome.sorted.bam",
+"star_salmon/WT_REP1.umi_dedup.transcriptome.sorted.bam.bai",
 "star_salmon/WT_REP1/aux_info",
 "star_salmon/WT_REP1/aux_info/ambig_info.tsv",
 "star_salmon/WT_REP1/aux_info/expected_bias.gz",
@@ -680,6 +694,10 @@
 "star_salmon/WT_REP2",
 "star_salmon/WT_REP2.umi_dedup.sorted.bam",
 "star_salmon/WT_REP2.umi_dedup.sorted.bam.bai",
+"star_salmon/WT_REP2.umi_dedup.transcriptome.bam",
+"star_salmon/WT_REP2.umi_dedup.transcriptome.filtered.bam",
+"star_salmon/WT_REP2.umi_dedup.transcriptome.sorted.bam",
+"star_salmon/WT_REP2.umi_dedup.transcriptome.sorted.bam.bai",
 "star_salmon/WT_REP2/aux_info",
 "star_salmon/WT_REP2/aux_info/ambig_info.tsv",
 "star_salmon/WT_REP2/aux_info/expected_bias.gz",
@@ -1261,10 +1279,18 @@
 "trimgalore/WT_REP2_trimmed_2.fastq.gz_trimming_report.txt",
 "umitools",
 "umitools/RAP1_IAA_30M_REP1.umi_extract.log",
+"umitools/RAP1_IAA_30M_REP1.umi_extract_1.fastq.gz",
+"umitools/RAP1_IAA_30M_REP1.umi_extract_2.fastq.gz",
+"umitools/RAP1_UNINDUCED_REP1.umi_extract.fastq.gz",
 "umitools/RAP1_UNINDUCED_REP1.umi_extract.log",
+"umitools/RAP1_UNINDUCED_REP2.umi_extract.fastq.gz",
 "umitools/RAP1_UNINDUCED_REP2.umi_extract.log",
 "umitools/WT_REP1.umi_extract.log",
-"umitools/WT_REP2.umi_extract.log"
+"umitools/WT_REP1.umi_extract_1.fastq.gz",
+"umitools/WT_REP1.umi_extract_2.fastq.gz",
+"umitools/WT_REP2.umi_extract.log",
+"umitools/WT_REP2.umi_extract_1.fastq.gz",
+"umitools/WT_REP2.umi_extract_2.fastq.gz"
 ],
 [
 "genome_gfp.fasta:md5,e23e302af63736a199985a169fdac055",
@@ -1467,14 +1493,22 @@
 "WT_REP2.umi_dedup.sorted_per_umi_per_position.tsv:md5,6f5656947a7f0076df446e6f40430027",
 "WT_REP2.umi_dedup.transcriptome.sorted_edit_distance.tsv:md5,3e3c6a7e8996e566350742e9911366d3",
 "WT_REP2.umi_dedup.transcriptome.sorted_per_umi.tsv:md5,0c986c4cb7a77f650a19e2c454b9b179",
-"WT_REP2.umi_dedup.transcriptome.sorted_per_umi_per_position.tsv:md5,af9028dbdab81de3854a32cd1d19ac8b"
+"WT_REP2.umi_dedup.transcriptome.sorted_per_umi_per_position.tsv:md5,af9028dbdab81de3854a32cd1d19ac8b",
+"RAP1_IAA_30M_REP1.umi_extract_1.fastq.gz:md5,e83d7f738fbbfaa541a2e71fe4663447",
+"RAP1_IAA_30M_REP1.umi_extract_2.fastq.gz:md5,4f2873cbf584d6e84187238a4ae2b8fa",
+"RAP1_UNINDUCED_REP1.umi_extract.fastq.gz:md5,9e42242fd68baac592140f63a8a716ce",
+"RAP1_UNINDUCED_REP2.umi_extract.fastq.gz:md5,5a92b642927b8603c4765e5305e23e9c",
+"WT_REP1.umi_extract_1.fastq.gz:md5,f312fac9c384a889ae4f959839263604",
+"WT_REP1.umi_extract_2.fastq.gz:md5,ffca24924108fd54151620b7538b9e1a",
+"WT_REP2.umi_extract_1.fastq.gz:md5,c3180451a24ce51fc35c1684521ae287",
+"WT_REP2.umi_extract_2.fastq.gz:md5,067ff23f8d1307ad241cd70bc186b5c1"
 ]
 ],
 "meta": {
-"nf-test": "0.9.0",
-"nextflow": "24.10.2"
+"nf-test": "0.9.2",
+"nextflow": "24.10.3"
 },
-"timestamp": "2024-12-11T18:07:55.751564456"
+"timestamp": "2024-12-20T00:02:04.611696704"
 },
 "Params: --aligner hisat2 --umi_dedup_tool 'umicollapse'": {
 "content": [
@@ -2130,13 +2164,13 @@
 "hisat2/stringtie/WT_REP2.coverage.gtf",
 "hisat2/stringtie/WT_REP2.gene.abundance.txt",
 "hisat2/stringtie/WT_REP2.transcripts.gtf",
-"hisat2/umitools",
-"hisat2/umitools/genomic_dedup_log",
-"hisat2/umitools/genomic_dedup_log/RAP1_IAA_30M_REP1.umi_dedup.sorted_UMICollapse.log",
-"hisat2/umitools/genomic_dedup_log/RAP1_UNINDUCED_REP1.umi_dedup.sorted_UMICollapse.log",
-"hisat2/umitools/genomic_dedup_log/RAP1_UNINDUCED_REP2.umi_dedup.sorted_UMICollapse.log",
-"hisat2/umitools/genomic_dedup_log/WT_REP1.umi_dedup.sorted_UMICollapse.log",
-"hisat2/umitools/genomic_dedup_log/WT_REP2.umi_dedup.sorted_UMICollapse.log",
+"hisat2/umicollapse",
+"hisat2/umicollapse/genomic_dedup_log",
+"hisat2/umicollapse/genomic_dedup_log/RAP1_IAA_30M_REP1.umi_dedup.sorted_UMICollapse.log",
+"hisat2/umicollapse/genomic_dedup_log/RAP1_UNINDUCED_REP1.umi_dedup.sorted_UMICollapse.log",
+"hisat2/umicollapse/genomic_dedup_log/RAP1_UNINDUCED_REP2.umi_dedup.sorted_UMICollapse.log",
+"hisat2/umicollapse/genomic_dedup_log/WT_REP1.umi_dedup.sorted_UMICollapse.log",
+"hisat2/umicollapse/genomic_dedup_log/WT_REP2.umi_dedup.sorted_UMICollapse.log",
 "multiqc",
 "multiqc/hisat2",
 "multiqc/hisat2/multiqc_report.html",
@@ -2548,10 +2582,18 @@
 "trimgalore/WT_REP2_trimmed_2.fastq.gz_trimming_report.txt",
 "umitools",
 "umitools/RAP1_IAA_30M_REP1.umi_extract.log",
+"umitools/RAP1_IAA_30M_REP1.umi_extract_1.fastq.gz",
+"umitools/RAP1_IAA_30M_REP1.umi_extract_2.fastq.gz",
+"umitools/RAP1_UNINDUCED_REP1.umi_extract.fastq.gz",
 "umitools/RAP1_UNINDUCED_REP1.umi_extract.log",
+"umitools/RAP1_UNINDUCED_REP2.umi_extract.fastq.gz",
 "umitools/RAP1_UNINDUCED_REP2.umi_extract.log",
 "umitools/WT_REP1.umi_extract.log",
-"umitools/WT_REP2.umi_extract.log"
+"umitools/WT_REP1.umi_extract_1.fastq.gz",
+"umitools/WT_REP1.umi_extract_2.fastq.gz",
+"umitools/WT_REP2.umi_extract.log",
+"umitools/WT_REP2.umi_extract_1.fastq.gz",
+"umitools/WT_REP2.umi_extract_2.fastq.gz"
 ],
 [
 "genome_gfp.fasta:md5,e23e302af63736a199985a169fdac055",
@@ -2688,14 +2730,22 @@
 "cmd_info.json:md5,809380ddce725a8fab75dd7741b64bf6",
 "lib_format_counts.json:md5,d231ba7624b67eb654989f69530e2925",
 "R_sessionInfo.log:md5,fb0da0d7ad6994ed66a8e68348b19676",
-"tx2gene.tsv:md5,0e2418a69d2eba45097ebffc2f700bfe"
+"tx2gene.tsv:md5,0e2418a69d2eba45097ebffc2f700bfe",
+"RAP1_IAA_30M_REP1.umi_extract_1.fastq.gz:md5,e83d7f738fbbfaa541a2e71fe4663447",
+"RAP1_IAA_30M_REP1.umi_extract_2.fastq.gz:md5,4f2873cbf584d6e84187238a4ae2b8fa",
+"RAP1_UNINDUCED_REP1.umi_extract.fastq.gz:md5,9e42242fd68baac592140f63a8a716ce",
+"RAP1_UNINDUCED_REP2.umi_extract.fastq.gz:md5,5a92b642927b8603c4765e5305e23e9c",
+"WT_REP1.umi_extract_1.fastq.gz:md5,f312fac9c384a889ae4f959839263604",
+"WT_REP1.umi_extract_2.fastq.gz:md5,ffca24924108fd54151620b7538b9e1a",
+"WT_REP2.umi_extract_1.fastq.gz:md5,c3180451a24ce51fc35c1684521ae287",
+"WT_REP2.umi_extract_2.fastq.gz:md5,067ff23f8d1307ad241cd70bc186b5c1"
 ]
 ],
 "meta": {
-"nf-test": "0.9.0",
-"nextflow": "24.10.2"
+"nf-test": "0.9.2",
+"nextflow": "24.10.3"
 },
-"timestamp": "2024-12-11T18:01:45.228731692"
+"timestamp": "2024-12-19T22:33:42.012684597"
 },
 "--umi_dedup_tool 'umitools - stub": {
 "content": [
@@ -2804,9 +2854,9 @@
 ]
 ],
 "meta": {
-"nf-test": "0.9.0",
-"nextflow": "24.10.2"
+"nf-test": "0.9.2",
+"nextflow": "24.10.3"
 },
-"timestamp": "2024-12-11T18:08:48.404716766"
+"timestamp": "2024-12-19T23:28:01.570835895"
 }
-}
+}
