Commit b158336

Merge branch 'dev' into fix-de-tutorial
2 parents fa1add7 + 0b70ad7 commit b158336

62 files changed, with 5097 additions and 791 deletions.

.github/workflows/ci.yml

Lines changed: 0 additions & 1 deletion
@@ -51,7 +51,6 @@ jobs:
       matrix:
         NXF_VER:
           - "24.04.2"
-          - "latest-everything"
         nf_test_files: ["${{ fromJson(needs.nf-test-changes.outputs.nf_test_files) }}"]
         profile:
           - "docker"

CHANGELOG.md

Lines changed: 34 additions & 1 deletion
@@ -3,19 +3,52 @@
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-# 3.18.0dev - xxxx-xx-xx
+# 3.19.0dev - xxxx-xx-xx
+
+### Credits
+
+### Enhancements & fixes
+
+- [PR #1480](https://github.com/nf-core/rnaseq/pull/1480) - Bump version after release 3.18.0
+- [PR #1482](https://github.com/nf-core/rnaseq/pull/1482) - Update trimgalore module for save_unpaired fix
+- [PR #1486](https://github.com/nf-core/rnaseq/pull/1486) - Bump STAR build for multiprocessing fix
+- [PR #1490](https://github.com/nf-core/rnaseq/pull/1490) - Make genomic FASTA input optional
+
+# 3.18.0 - 2024-12-19
 
 ### Credits
 
 Special thanks to the following for their contributions to the release:
 
 - [Caitlin Winkler](https://github.com/oligomyeggo)
+- [Jonathan Manning](https://github.com/pinin4fjords)
+- [Lorenzo Sola](https://github.com/LorenzoS96)
+- [Maxime Garcia](https://github.com/maxulysse)
 - [Siddhartha Bagaria](https://github.com/siddharthab)
 
 ### Enhancements & fixes
 
 - [PR #1369](https://github.com/nf-core/rnaseq/pull/1369) - Add umicollapse as an alternative to umi-tools
 - [PR #1461](https://github.com/nf-core/rnaseq/pull/1461) - Add FASTQ linting during preprocessing
+- [PR #1463](https://github.com/nf-core/rnaseq/pull/1463) - Move channel operations outside of the onComplete() block
+- [PR #1467](https://github.com/nf-core/rnaseq/pull/1467) - Add test suite for UMI handling functionality
+- [PR #1466](https://github.com/nf-core/rnaseq/pull/1466) - Factor out UMI handling
+- [PR #1470](https://github.com/nf-core/rnaseq/pull/1470) - Update subworkflow to account for fix to bad argument handling
+- [PR #1469](https://github.com/nf-core/rnaseq/pull/1469) - Minor docs fix
+- [PR #1459](https://github.com/nf-core/rnaseq/pull/1466) - Remove reference to unused "skip_sample_count" value in email templates
+- [PR #1471](https://github.com/nf-core/rnaseq/pull/1471) - Fix prepare_genome subworkflow for sortmerna
+- [PR #1473](https://github.com/nf-core/rnaseq/pull/1473) - Bump STAR modules
+- [PR #1474](https://github.com/nf-core/rnaseq/pull/1474) - Bump versions to 3.18.0
+- [PR #1475](https://github.com/nf-core/rnaseq/pull/1475) - Fix log publishing around umitools/ umicollapse
+- [PR #1447](https://github.com/nf-core/rnaseq/pull/1447) - Add tutorial series for analysing count data
+
+## Parameters
+
+| Old parameter | New parameter         |
+| ------------- | --------------------- |
+|               | `--skip_linting`      |
+|               | `--extra_fqlint_args` |
+|               | `--umi_dedup_tool`    |
 
 ### Software dependencies
 

assets/email_template.html

Lines changed: 0 additions & 19 deletions
@@ -34,25 +34,6 @@ <h4 style="margin-top: 0; color: inherit">nf-core/rnaseq execution completed uns
       <p>The full error message was:</p>
       <pre style="white-space: pre-wrap; overflow: visible; margin-bottom: 0">${errorReport}</pre>
     </div>
-    """ } else if(skip_sample_count > 0) { out << """
-    <div
-      style="
-        color: #856404;
-        background-color: #fff3cd;
-        border-color: #ffeeba;
-        padding: 15px;
-        margin-bottom: 20px;
-        border: 1px solid transparent;
-        border-radius: 4px;
-      "
-    >
-      <h4 style="margin-top: 0; color: inherit">nf-core/rnaseq execution completed with warnings!</h4>
-      <p>
-        The pipeline finished successfully, but samples were skipped. Please check warnings at the top of the MultiQC report.
-      </p>
-      <p></p>
-    </div>
-
     """ } else { out << """
     <div
       style="

assets/email_template.txt

Lines changed: 0 additions & 7 deletions
@@ -17,13 +17,6 @@ The full error message was:
 
 ${errorReport}
 """
-} else if (skip_sample_count > 0) {
-    out << """##################################################
-## nf-core/rnaseq execution completed with warnings ##
-##################################################
-The pipeline finished successfully, but samples were skipped.
-Please check warnings at the top of the MultiQC report.
-"""
 } else {
     out << "## nf-core/rnaseq execution completed successfully! ##"
 }

bin/filter_gtf.py

Lines changed: 8 additions & 7 deletions
@@ -6,7 +6,7 @@
 import argparse
 import re
 import statistics
-from typing import Set
+from typing import Optional, Set
 
 # Create a logger
 logging.basicConfig(format="%(name)s - %(asctime)s %(levelname)s: %(message)s")
@@ -27,14 +27,15 @@ def tab_delimited(file: str) -> float:
     return statistics.median(line.count("\t") for line in data.split("\n"))
 
 
-def filter_gtf(fasta: str, gtf_in: str, filtered_gtf_out: str, skip_transcript_id_check: bool) -> None:
+def filter_gtf(fasta: Optional[str], gtf_in: str, filtered_gtf_out: str, skip_transcript_id_check: bool) -> None:
     """Filter GTF file based on FASTA sequence names."""
     if tab_delimited(gtf_in) != 8:
         raise ValueError("Invalid GTF file: Expected 9 tab-separated columns.")
 
-    seq_names_in_genome = extract_fasta_seq_names(fasta)
-    logger.info(f"Extracted chromosome sequence names from {fasta}")
-    logger.debug("All sequence IDs from FASTA: " + ", ".join(sorted(seq_names_in_genome)))
+    if (fasta is not None):
+        seq_names_in_genome = extract_fasta_seq_names(fasta)
+        logger.info(f"Extracted chromosome sequence names from {fasta}")
+        logger.debug("All sequence IDs from FASTA: " + ", ".join(sorted(seq_names_in_genome)))
 
     seq_names_in_gtf = set()
     try:
@@ -44,7 +45,7 @@ def filter_gtf(fasta: str, gtf_in: str, filtered_gtf_out: str, skip_transcript_i
                 seq_name = line.split("\t")[0]
                 seq_names_in_gtf.add(seq_name)  # Add sequence name to the set
 
-                if seq_name in seq_names_in_genome:
+                if fasta is None or seq_name in seq_names_in_genome:
                     if skip_transcript_id_check or re.search(r'transcript_id "([^"]+)"', line):
                         out.write(line)
                         line_count += 1
@@ -63,7 +64,7 @@ def filter_gtf(fasta: str, gtf_in: str, filtered_gtf_out: str, skip_transcript_i
 if __name__ == "__main__":
     parser = argparse.ArgumentParser(description="Filters a GTF file based on sequence names in a FASTA file.")
     parser.add_argument("--gtf", type=str, required=True, help="GTF file")
-    parser.add_argument("--fasta", type=str, required=True, help="Genome fasta file")
+    parser.add_argument("--fasta", type=str, required=False, help="Genome fasta file")
     parser.add_argument("--prefix", dest="prefix", default="genes", type=str, help="Prefix for output GTF files")
     parser.add_argument(
         "--skip_transcript_id_check", action="store_true", help="Skip checking for transcript IDs in the GTF file"

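The effect of the `filter_gtf` change in `bin/filter_gtf.py` can be illustrated with a small standalone sketch. This is not the script itself: the helper name `keep_gtf_lines`, its in-memory signature, and the `example` records are hypothetical and exist only for illustration. The sketch assumes the same two checks the diff shows, namely a sequence-name test that is bypassed when no FASTA is given, and a `transcript_id` test that can be skipped.

```python
import re
from typing import Iterable, List, Optional, Set

# Same pattern the script uses to detect a transcript_id attribute.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def keep_gtf_lines(
    gtf_lines: Iterable[str],
    seq_names_in_genome: Optional[Set[str]] = None,
    skip_transcript_id_check: bool = False,
) -> List[str]:
    """Return the GTF lines that pass both checks (hypothetical helper).

    Passing None for seq_names_in_genome stands in for "no FASTA supplied"
    and disables the sequence-name filter, mirroring the
    `fasta is None or seq_name in seq_names_in_genome` test in the diff.
    """
    kept = []
    for line in gtf_lines:
        seq_name = line.split("\t")[0]  # first column is the sequence name
        if seq_names_in_genome is None or seq_name in seq_names_in_genome:
            if skip_transcript_id_check or TRANSCRIPT_ID.search(line):
                kept.append(line)
    return kept


# Made-up records: two exons on different sequences and one gene-level line
# that carries no transcript_id attribute.
example = [
    'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    'chrUn\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g2"; transcript_id "t2";\n',
    'chr1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "g1";\n',
]

with_fasta = keep_gtf_lines(example, seq_names_in_genome={"chr1"})
without_fasta = keep_gtf_lines(example)
print(len(with_fasta), len(without_fasta))
```

With a set of genome sequence names, only the `chr1` exon survives. Without one, both exons are kept and only the line lacking a `transcript_id` is dropped, which is the behaviour the `required=False` change to `--fasta` makes reachable.
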
conf/arm.config

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -121,11 +121,11 @@ process {
121121
}
122122

123123
withName: 'STAR_GENOMEGENERATE' {
124-
container = { workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 'https://community-cr-prod.seqera.io/docker/registry/v2/blobs/sha256/a2/a2d5226e4ce3dee8b29154c16a87d282d96c76e75b6678d032643902591586e2/data' : 'community.wave.seqera.io/library/htslib_samtools_star_gawk:1d1b7da208684cac' }
124+
container = { workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 'https://community-cr-prod.seqera.io/docker/registry/v2/blobs/sha256/40/40d803371e50330de0773c7cc50315e2c3b4b41dcf123823adeb0a07d71654c1/data' : 'community.wave.seqera.io/library/htslib_samtools_star_gawk:ae438e9a604351a4' }
125125
}
126126

127127
withName: 'STAR_ALIGN' {
128-
container = { workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 'https://community-cr-prod.seqera.io/docker/registry/v2/blobs/sha256/a2/a2d5226e4ce3dee8b29154c16a87d282d96c76e75b6678d032643902591586e2/data' : 'community.wave.seqera.io/library/htslib_samtools_star_gawk:1d1b7da208684cac' }
128+
container = { workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 'https://community-cr-prod.seqera.io/docker/registry/v2/blobs/sha256/40/40d803371e50330de0773c7cc50315e2c3b4b41dcf123823adeb0a07d71654c1/data' : 'community.wave.seqera.io/library/htslib_samtools_star_gawk:ae438e9a604351a4' }
129129
}
130130

131131
withName: 'TXIMETA_TXIMPORT' {

docs/images/mqc_fastqc_adapter.png

22.9 KB

docs/images/mqc_fastqc_counts.png

33.1 KB

docs/images/mqc_fastqc_quality.png

54.5 KB

docs/output.md

Lines changed: 5 additions & 5 deletions
@@ -74,7 +74,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
 
 If multiple libraries/runs have been provided for the same sample in the input samplesheet (e.g. to increase sequencing depth) then these will be merged at the very beginning of the pipeline in order to have consistent sample naming throughout the pipeline. Please refer to the [usage documentation](https://nf-co.re/rnaseq/usage#samplesheet-input) to see how to specify these samples in the input samplesheet.
 
-# fq lint
+### fq lint
 
 <details markdown="1">
 <summary>Output files</summary>
@@ -120,7 +120,7 @@ If multiple libraries/runs have been provided for the same sample in the input s
 
 </details>
 
-[UMI-tools](https://github.com/CGATOxford/UMI-tools) deduplicates reads based on unique molecular identifiers (UMIs) to address PCR-bias. Firstly, the UMI-tools `extract` command removes the UMI barcode information from the read sequence and adds it to the read name. Secondly, reads are deduplicated based on UMI identifier after mapping as highlighted in the [UMI-tools dedup](#umi-tools-dedup) section.
+[UMI-tools](https://github.com/CGATOxford/UMI-tools) and [UMICollapse](https://github.com/Daniel-Liu-c0deb0t/UMICollapse) deduplicate reads based on unique molecular identifiers (UMIs) to address PCR-bias. Firstly, the UMI-tools `extract` command removes the UMI barcode information from the read sequence and adds it to the read name. Secondly, reads are deduplicated based on UMI identifier after mapping as highlighted in the [UMI dedup](#umi-dedup) section.
 
 To facilitate processing of input data which has the UMI barcode already embedded in the read name from the start, `--skip_umi_extract` can be specified in conjunction with `--with_umi`.
 
@@ -305,7 +305,7 @@ The original BAM files generated by the selected alignment algorithm are further
 
 ![MultiQC - SAMtools mapped reads per contig plot](images/mqc_samtools_idxstats.png)
 
-### UMI-tools dedup
+### UMI dedup
 
 <details markdown="1">
 <summary>Output files</summary>
@@ -314,7 +314,7 @@ The original BAM files generated by the selected alignment algorithm are further
   - `<SAMPLE>.umi_dedup.sorted.bam`: If `--save_umi_intermeds` is specified the UMI deduplicated, coordinate sorted BAM file containing read alignments will be placed in this directory.
   - `<SAMPLE>.umi_dedup.sorted.bam.bai`: If `--save_umi_intermeds` is specified the BAI index file for the UMI deduplicated, coordinate sorted BAM file will be placed in this directory.
   - `<SAMPLE>.umi_dedup.sorted.bam.csi`: If `--save_umi_intermeds --bam_csi_index` is specified the CSI index file for the UMI deduplicated, coordinate sorted BAM file will be placed in this directory.
-- `<ALIGNER>/umitools/`
+- `<ALIGNER>/umitools/` (UMI-tools only)
   - `*_edit_distance.tsv`: Reports the (binned) average edit distance between the UMIs at each position.
   - `*_per_umi.tsv`: UMI-level summary statistics.
   - `*_per_umi_per_position.tsv`: Tabulates the counts for unique combinations of UMI and position.
@@ -323,7 +323,7 @@ The content of the files above is explained in more detail in the [UMI-tools doc
 
 </details>
 
-After extracting the UMI information from the read sequence (see [UMI-tools extract](#umi-tools-extract)), the second step in the removal of UMI barcodes involves deduplicating the reads based on both mapping and UMI barcode information using the UMI-tools `dedup` command. This will generate a filtered BAM file after the removal of PCR duplicates.
+After extracting the UMI information from the read sequence (see [UMI-tools extract](#umi-tools-extract)), the second step in the removal of UMI barcodes involves deduplicating the reads based on both mapping and UMI barcode information. UMI deduplication can be carried out either with [UMI-tools](https://github.com/CGATOxford/UMI-tools) or [UMICollapse](https://github.com/Daniel-Liu-c0deb0t/UMICollapse), set via the `umi_dedup_tool` parameter. The output BAM files are the same, though UMI-tools has some additional outputs, as described above. Either method will generate a filtered BAM file after the removal of PCR duplicates.
 
 ### picard MarkDuplicates
 
