Commit 88e23f4

Merge branch 'dev' into nf-core-template-merge-3.4.1
2 parents 879f712 + 86a011b commit 88e23f4

109 files changed

+8537
-990
lines changed

.github/workflows/awsfulltest.yml

Lines changed: 0 additions & 1 deletion
@@ -24,7 +24,6 @@ jobs:
 
 - name: Launch workflow via Seqera Platform
 uses: seqeralabs/action-tower-launch@v2
-# TODO nf-core: You can customise AWS full pipeline tests as required
 # Add full size test data (but still relatively small datasets for few samples)
 # on the `test_full.config` test runs with only one set of parameters
 with:

CHANGELOG.md

Lines changed: 148 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -3,14 +3,159 @@
33
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
44
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
55

6-
## v2.2.1 - [date]
6+
## [v2.2.1](https://github.com/nf-core/pairgenomealign/releases/tag/2.2.1) "C’est quoi ça?" - [August 5th 2025]
77

8-
Initial release of nf-core/pairgenomealign, created with the [nf-core](https://nf-co.re/) template.
8+
### `Fixed`
9+
10+
- Conforms to nf-core template version 3.3.2, hopefully fixing AWS tests ([#85](https://github.com/nf-core/pairgenomealign/pull/85)) ([#83](https://github.com/nf-core/pairgenomealign/pull/83)).
11+
- Added missing pipeline and subworkflow test snapshots and stabilise line order in some output files ([#84](https://github.com/nf-core/pairgenomealign/pull/84)).
12+
- Update modules to latest version, thereby pulling an important fix for a race condition in `last/mafconvert` ([#87](https://github.com/nf-core/pairgenomealign/pull/87)), ([#88](https://github.com/nf-core/pairgenomealign/pull/88)).
13+
- Report `jq` version used in `MULTIQC_ASSEMBLYSCAN_PLOT_DATA` ([#81](https://github.com/nf-core/pairgenomealign/pull/81)).
14+
- Document module names in tube map ([#74](https://github.com/nf-core/pairgenomealign/pull/74)).
15+
- Add missing modules in tube map ([#68](https://github.com/nf-core/pairgenomealign/pull/68)).
16+
- Materialise output files in tube map ([#75](https://github.com/nf-core/pairgenomealign/pull/75)).
17+
18+
### `Dependencies`
19+
20+
| Dependency | Old version | New version |
21+
| ---------- | ----------- | ----------- |
22+
| `MultiQC` | 1.28 | 1.30 |
23+
24+
## [v2.2.0](https://github.com/nf-core/pairgenomealign/releases/tag/2.2.0) "Chagara ponzu" - [May 29th 2025]
25+
26+
### `Added`
27+
28+
- Support for export to BAM and CRAM formats ([#31](https://github.com/nf-core/pairgenomealign/issues/31)) ([#43](https://github.com/nf-core/pairgenomealign/issues/43)).
29+
- SAM/BAM/CRAM alignment files are sorted and their headers feature all sequences of the _target_ genome.
30+
- Report ungapped percent identity ([#46](https://github.com/nf-core/pairgenomealign/issues/46)).
31+
- Update full-size test genomes to feature more T2T assemblies ([#59](https://github.com/nf-core/pairgenomealign/issues/59)).
32+
- Use a single mulled container for LAST, Samtools and open-fonts, to save ~280 Mb of downloads ([#58](https://github.com/nf-core/pairgenomealign/issues/58)).
33+
- Allow export to multiple formats (comma-separated list) ([#42](https://github.com/nf-core/pairgenomealign/issues/42)).
34+
- Allow skipping of the assembly QC with `--skip_assembly_qc` ([#53](https://github.com/nf-core/pairgenomealign/issues/53)).
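The "ungapped percent identity" reported since v2.2.0 can be illustrated with a small sketch. The changelog does not say how the pipeline computes this value, so the definition below (identical bases divided by the number of alignment columns that contain no gap, times 100) is an assumption, and `ungapped_percent_identity` is a hypothetical helper rather than part of the pipeline or of LAST.

```python
def ungapped_percent_identity(query_row: str, target_row: str) -> float:
    """Percent of identical bases among alignment columns without a gap.

    Assumed definition for illustration only; the pipeline's own
    computation may differ (for example in how it treats masked bases).
    """
    if len(query_row) != len(target_row):
        raise ValueError("aligned rows must have the same length")

    matches = 0
    ungapped_columns = 0
    for q, t in zip(query_row.upper(), target_row.upper()):
        if q == "-" or t == "-":
            continue  # gap columns do not count towards the denominator
        ungapped_columns += 1
        if q == t:
            matches += 1

    if ungapped_columns == 0:
        return 0.0  # no ungapped column: report zero rather than divide by zero
    return 100.0 * matches / ungapped_columns
```

For two aligned rows such as `ACGT-A` and `ACCTTA`, the gap column is skipped, so four matches are counted over five ungapped columns.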
35+
36+
### `Dependencies`
37+
38+
| Dependency | Old version | New version |
39+
| ---------------- | ----------- | ----------- |
40+
| `SAMTOOLS_BGZIP` | | 1.21 |
41+
| `SAMTOOLS_DICT` | | 1.21 |
42+
| `SAMTOOLS_FAIDX` | | 1.21 |
43+
44+
### `Parameters`
45+
46+
| Old parameter | New parameter |
47+
| ------------- | -------------------- |
48+
| | `--skip_assembly_qc` |
49+
50+
### `Fixed`
51+
52+
- Remove noisy tag in the `MULTIQC_ASSEMBLYSCAN_PLOT_DATA` local module ([#64](https://github.com/nf-core/pairgenomealign/issues/64)).
53+
- Restore BED format support ([#56](https://github.com/nf-core/pairgenomealign/issues/56)).
54+
- Document the `multiqc_train.txt` and `multiqc_last_o2o.txt` aggregating alignment statistics ([#52](https://github.com/nf-core/pairgenomealign/issues/52)).
55+
- Point the test configs samplesheets to `nf-core/test-datasets` in order to run the AWS full tests ([#62](https://github.com/nf-core/pairgenomealign/issues/62)).
56+
- Update metro map, now on a white background ([#71](https://github.com/nf-core/pairgenomealign/issues/71)).
57+
- Removed the `last/mafswap` module, which was actually not used.
58+
59+
## [v2.1.0](https://github.com/nf-core/pairgenomealign/releases/tag/2.1.0) "Goya champuru" - [May 16th 2025]
60+
61+
### `Added`
62+
63+
- New `--dotplot_filter` parameter to produce extra alignment plots where small off-diagonal signal is filtered out ([#35](https://github.com/nf-core/pairgenomealign/issues/35)).
64+
- New `--dotplot_width`, `--dotplot_height` and `--dotplot_font_size` parameters to control alignment plot size ([#38](https://github.com/nf-core/pairgenomealign/issues/38)).
65+
66+
### `Fixed`
67+
68+
- In alignment plots, contig names are now written with a nice scalable font instead of being pixellised ([#44](https://github.com/nf-core/pairgenomealign/issues/44)).
69+
- Conforms to nf-core template version 3.2.1 ([#54](https://github.com/nf-core/pairgenomealign/pull/54)).
70+
- Removed some old linting exceptions.
71+
- Removed the `gfastats` module, which was actually not used.
72+
- Make sure the subworkflows collect all module versions.
73+
- Fix plot IDs for compatibility with MultiQC 1.28.
74+
75+
### `Parameters`
76+
77+
| Old parameter | New parameter |
78+
| ------------- | --------------------- |
79+
| | `--dotplot_filter` |
80+
| | `--dotplot_font_size` |
81+
| | `--dotplot_height` |
82+
| | `--dotplot_width` |
83+
84+
### `Dependencies`
85+
86+
| Dependency | Old version | New version |
87+
| ---------- | ----------- | ----------- |
88+
| `LAST` | 1608 | 1611 |
89+
| `MultiQC` | 1.27 | 1.28 |
90+
91+
## [v2.0.0](https://github.com/nf-core/pairgenomealign/releases/tag/2.0.0) "Naga imo" - [February 5th, 2025]
92+
93+
### `Breaking changes`
94+
95+
- The LAST software was updated and it has new defaults for some of its
96+
parameters. The alignments run with this pipeline will not be identical to
97+
the ones from older versions.
998

1099
### `Added`
11100

101+
- The `alignment/lastdb` directory is not output anymore. It consumed space,
102+
is not usually needed for downstream analysis, and can be re-computed
103+
identically if needed.
104+
- The _many-to-one_ alignment file is not output anymore by default, to save
105+
space. To keep this file, you can run the pipeline in `many-to-many` mode
106+
with the `--m2m` parameter.
107+
- The `--seed` parameter allows for all the existing values in the `lastdb`
108+
program.
109+
- Errors caused by absence of alignments at training or plotting steps
110+
are now ignored.
111+
- New parameter `--export_aln_to` that creates additional files containing
112+
the alignments in a different format such as Axt, Chain, GFF or SAM.
113+
12114
### `Fixed`
13115

116+
- Incorrect detection of regions with 10 or more `N`s was corrected ([#18](https://github.com/nf-core/pairgenomealign/issues/18)).
117+
- The `--lastal_params` now works as intended instead of being ignored ([#22](https://github.com/nf-core/pairgenomealign/issues/22)).
118+
- The _workflow summary_ is now properly sorted at the end of the MultiQC report ([#32](https://github.com/nf-core/pairgenomealign/issues/32)).
119+
- Conforms to nf-core template version 3.2.0 ([#40](https://github.com/nf-core/pairgenomealign/pull/40)).
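To make the fix for issue #18 concrete, here is a minimal sketch of what "regions with 10 or more `N`s" means. It is not the pipeline's implementation (which this diff does not show); `find_n_runs` is a hypothetical helper, and treating lowercase `n` as equivalent to `N` is an assumption.

```python
import re

# A run of at least 10 consecutive N characters (case-insensitive by assumption).
N_RUN = re.compile(r"[Nn]{10,}")


def find_n_runs(sequence: str) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) intervals of runs of >= 10 Ns."""
    return [(match.start(), match.end()) for match in N_RUN.finditer(sequence)]
```

A run of nine `N`s is ignored, while a run of ten or more is reported as a single interval, which is the boundary case an off-by-one error would get wrong.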
120+
121+
### `Parameters`
122+
123+
| Old parameter | New parameter |
124+
| ------------- | ----------------- |
125+
| | `--export_aln_to` |
126+
14127
### `Dependencies`
15128

16-
### `Deprecated`
129+
| Dependency | Old version | New version |
130+
| ---------- | ----------- | ----------- |
131+
| `LAST` | 1542 | 1608 |
132+
| `MultiQC` | 1.25.1 | 1.27 |
133+
134+
## [v1.1.1](https://github.com/nf-core/pairgenomealign/releases/tag/1.1.1) "Kani nabe" - [December 17th, 2024]
135+
136+
### `Broken`
137+
138+
- In retrospect it was found that this version (only) is not compatible with
139+
Nextflow 25.04 or higher. Please use `v1.1.0` instead if you need the same
140+
functionality and software version numbers.
141+
142+
### `Fixed`
143+
144+
- This release brings the pipeline to the standards of Nextflow 24.10.1 and
145+
nf-core 3.1.0.
146+
147+
## [v1.1.0](https://github.com/nf-core/pairgenomealign/releases/tag/1.1.0) "Nattou maki" - [September 27th, 2024]
148+
149+
### `Added`
150+
151+
- Added a new `softmask` parameter, to optionally keep original softmasking.
152+
153+
### `Parameters`
154+
155+
| Old parameter | New parameter |
156+
| ------------- | ------------- |
157+
| | `--softmask` |
158+
159+
## [v1.0.0](https://github.com/nf-core/pairgenomealign/releases/tag/1.0.0) "Sweet potato" - [August 27th, 2024]
160+
161+
Initial release of nf-core/pairgenomealign, created with the [nf-core](https://nf-co.re/) template.

CITATIONS.md

Lines changed: 19 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -8,15 +8,31 @@
88

99
> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.
1010
11+
## Pipeline design
12+
13+
> Charles Plessy, Michael J. Mansfield, Aleksandra Bliznina, Aki Masunaga, Charlotte West, Yongkai Tan, Andrew W. Liu, Jan Grašič, María Sara del Río Pisula, Gaspar Sánchez-Serna, Marc Fabrega-Torrus, Alfonso Ferrández-Roldán, Vittoria Roncalli, Pavla Navratilova, Eric M. Thompson, Takeshi Onuma, Hiroki Nishida, Cristian Cañestro, Nicholas M. Luscombe. Extreme genome scrambling in marine planktonic Oikopleura dioica cryptic species. Genome Res. 2024. 34: 426-440; doi: [10.1101/gr.278295.123](https://doi.org/10.1101/gr.278295.123). PubMed ID: [38621828](https://pubmed.ncbi.nlm.nih.gov/38621828/)
14+
1115
## Pipeline tools
1216

13-
- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
17+
- [LAST](https://gitlab.com/mcfrith/last/)
18+
19+
> Kiełbasa SM, Wan R, Sato K, Horton P, Frith MC. Adaptive seeds tame genomic sequence comparison. Genome Res. 2011 21(3):487-93. doi: 10.1101/gr.113985.110. PubMed PMID: 21209072 (This describes the main algorithms used by LAST.)
20+
21+
> Frith MC, Noé L. Improved search heuristics find 20,000 new alignments between human and mouse genomes. doi: 10.1093/nar/gku104 PubMed PMID: 24493737 (This describes sensitive DNA seeding (MAM8 and MAM4).)
22+
23+
> Frith MC, Kawaguchi R. Split-alignment of genomes finds orthologies more accurately. Genome Biology. 2015 16:106. doi: 10.1186/s13059-015-0670-9 PubMed PMID: 25994148 (Describes the split alignment algorithm, and its application to whole genome alignment.)
24+
25+
> Hamada M, Ono Y, Asai K, Frith MC. Training alignment parameters for arbitrary sequencers with LAST-TRAIN. Bioinformatics. 2017 33(6):926-928. doi: 10.1093/bioinformatics/btw742 PubMed PMID: 28039163 (Describes last-train.)
26+
27+
> Frith MC, Shaw J, Spouge JL. How to optimally sample a sequence for rapid analysis. doi: 10.1093/bioinformatics/btad057 PubMed PMID: 36702468 (Describes the lastdb -u RY sparsity options.)
28+
29+
- [SAMtools](https://pubmed.ncbi.nlm.nih.gov/19505943/)
1430

15-
> Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].
31+
> Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009 Aug 15;25(16):2078-9. doi: 10.1093/bioinformatics/btp352. Epub 2009 Jun 8. PubMed PMID: 19505943; PubMed Central PMCID: PMC2723002.
1632
1733
- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)
1834

19-
> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
35+
> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
2036
2137
## Software packaging/containerisation tools
2238

README.md

Lines changed: 33 additions & 25 deletions
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@
77

88
[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://github.com/codespaces/new/nf-core/pairgenomealign)
99
[![GitHub Actions CI Status](https://github.com/nf-core/pairgenomealign/actions/workflows/nf-test.yml/badge.svg)](https://github.com/nf-core/pairgenomealign/actions/workflows/nf-test.yml)
10-
[![GitHub Actions Linting Status](https://github.com/nf-core/pairgenomealign/actions/workflows/linting.yml/badge.svg)](https://github.com/nf-core/pairgenomealign/actions/workflows/linting.yml)[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?labelColor=000000&logo=Amazon%20AWS)](https://nf-co.re/pairgenomealign/results)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.XXXXXXX-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.XXXXXXX)
10+
[![GitHub Actions Linting Status](https://github.com/nf-core/pairgenomealign/actions/workflows/linting.yml/badge.svg)](https://github.com/nf-core/pairgenomealign/actions/workflows/linting.yml)[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?labelColor=000000&logo=Amazon%20AWS)](https://nf-co.re/pairgenomealign/results)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.13910535-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.13910535)
1111
[![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com)
1212

1313
[![Nextflow](https://img.shields.io/badge/version-%E2%89%A525.04.0-green?style=flat&logo=nextflow&logoColor=white&color=%230DC09D&link=https%3A%2F%2Fnextflow.io)](https://www.nextflow.io/)
@@ -21,46 +21,45 @@
2121

2222
## Introduction
2323

24-
**nf-core/pairgenomealign** is a bioinformatics pipeline that ...
24+
**nf-core/pairgenomealign** is a bioinformatics pipeline that aligns one or more _query_ genomes to a _target_ genome, and plots pairwise representations.
2525

26-
<!-- TODO nf-core:
27-
Complete this sentence with a 2-3 sentence summary of what types of data the pipeline ingests, a brief overview of the
28-
major pipeline sections and the types of output it produces. You're giving an overview to someone new
29-
to nf-core here, in 15-20 seconds. For an example, see https://github.com/nf-core/rnaseq/blob/master/README.md#introduction
30-
-->
26+
![Tubemap workflow summary](docs/images/pairgenomealign-tubemap.png "Tubemap workflow summary")
3127

32-
<!-- TODO nf-core: Include a figure that guides the user through the major workflow steps. Many nf-core
33-
workflows use the "tube map" design for that. See https://nf-co.re/docs/guidelines/graphic_design/workflow_diagrams#examples for examples. -->
34-
<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->1. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))2. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))
28+
The main steps of the pipeline are:
29+
30+
1. Genome QC ([`assembly-scan`](https://github.com/rpetit3/assembly-scan)).
31+
2. Genome indexing ([`lastdb`](https://gitlab.com/mcfrith/last/-/blob/main/doc/lastdb.rst)).
32+
3. Genome pairwise alignments ([`lastal`](https://gitlab.com/mcfrith/last/-/blob/main/doc/lastal.rst)).
33+
4. Alignment plotting ([`last-dotplot`](https://gitlab.com/mcfrith/last/-/blob/main/doc/last-dotplot.rst)).
34+
5. Alignment export to various formats with [`maf-convert`](https://gitlab.com/mcfrith/last/-/blob/main/doc/maf-convert.rst), plus [`Samtools`](https://www.htslib.org/) for SAM/BAM/CRAM.
35+
36+
The pipeline can generate four kinds of outputs, called _many-to-many_, _many-to-one_, _one-to-many_ and _one-to-one_, depending on whether sequences of one genome are allowed to match the other genome multiple times or not.
37+
38+
These alignments are output in [MAF](https://genome.ucsc.edu/FAQ/FAQformat.html#format5) format, and optional line plot representations are output in PNG format.
3539

3640
## Usage
3741

3842
> [!NOTE]
3943
> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
4044
41-
<!-- TODO nf-core: Describe the minimum required steps to execute the pipeline, e.g. how to prepare samplesheets.
42-
Explain what rows and columns represent. For instance (please edit as appropriate):
43-
4445
First, prepare a samplesheet with your input data that looks as follows:
4546

4647
`samplesheet.csv`:
4748

4849
```csv
49-
sample,fastq_1,fastq_2
50-
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
50+
sample,fasta
51+
query_1,path-to-query-genome-file-one.fasta
52+
query_2,path-to-query-genome-file-two.fasta
5153
```
5254

53-
Each row represents a fastq file (single-end) or a pair of fastq files (paired end).
54-
55-
-->
55+
Each row represents a FASTA file; the samplesheet can contain multiple rows to accommodate multiple query genomes in FASTA format.
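For many query genomes, a samplesheet in the `sample,fasta` layout shown above can be generated with a short script. This is a minimal sketch, not a tool shipped with the pipeline: `write_samplesheet` is a hypothetical helper, the accepted file suffixes are an assumption, and deriving the sample name from the file name is only one possible convention.

```python
import csv
from pathlib import Path

# Suffixes treated as FASTA files (an assumption; adjust to your data).
FASTA_SUFFIXES = (".fa", ".fasta", ".fna")


def write_samplesheet(genome_dir, out_csv):
    """Write a `sample,fasta` CSV listing the FASTA files found in genome_dir.

    Returns the number of query genomes written.
    """
    rows = []
    for path in sorted(Path(genome_dir).iterdir()):
        for suffix in FASTA_SUFFIXES:
            if path.name.endswith(suffix):
                sample = path.name[: -len(suffix)]  # file name without suffix
                rows.append((sample, str(path.resolve())))
                break

    with open(out_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sample", "fasta"])  # header used in the README example
        writer.writerows(rows)
    return len(rows)
```

Calling `write_samplesheet("genomes", "samplesheet.csv")` would then produce a file that can be passed to `--input`, with the target genome supplied separately through `--target`.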
5656

5757
Now, you can run the pipeline using:
5858

59-
<!-- TODO nf-core: update the following command to include all required parameters for a minimal example -->
60-
6159
```bash
6260
nextflow run nf-core/pairgenomealign \
6361
-profile <docker/singularity/.../institute> \
62+
--target sequencefile.fa \
6463
--input samplesheet.csv \
6564
--outdir <OUTDIR>
6665
```
@@ -78,11 +77,15 @@ For more details about the output files and reports, please refer to the
7877

7978
## Credits
8079

81-
nf-core/pairgenomealign was originally written by charles-plessy.
80+
`nf-core/pairgenomealign` was originally written by [charles-plessy](https://github.com/charles-plessy); the original versions are available at <https://github.com/oist/plessy_pairwiseGenomeComparison>.
8281

8382
We thank the following people for their extensive assistance in the development of this pipeline:
8483

85-
<!-- TODO nf-core: If applicable, make list of people who have also contributed -->
84+
- [Mahdi Mohammed](https://github.com/U13bs1125) ported the original pipeline to _nf-core_ template 2.14.x.
85+
- [Martin Frith](https://github.com/mcfrith/), the author of LAST, gave us extensive feedback and advice.
86+
- [Michael Mansfield](https://github.com/mjmansfi) tested the pipeline and provided critical comments.
87+
- [Aleksandra Bliznina](https://github.com/aleksandrabliznina) contributed to the creation of the initial `last/*` modules.
88+
- [Jiashun Miao](https://github.com/miaojiashun) and [Huyen Pham](https://github.com/ngochuyenpham) tested the pipeline on vertebrate genomes.
8689

8790
## Contributions and Support
8891

@@ -92,10 +95,15 @@ For further information or help, don't hesitate to get in touch on the [Slack `#
9295

9396
## Citations
9497

95-
<!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. -->
96-
<!-- If you use nf-core/pairgenomealign for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) -->
98+
If you use this pipeline, please cite:
99+
100+
> **Extreme genome scrambling in marine planktonic Oikopleura dioica cryptic species.**
101+
> Charles Plessy, Michael J. Mansfield, Aleksandra Bliznina, Aki Masunaga, Charlotte West, Yongkai Tan, Andrew W. Liu, Jan Grašič, María Sara del Río Pisula, Gaspar Sánchez-Serna, Marc Fabrega-Torrus, Alfonso Ferrández-Roldán, Vittoria Roncalli, Pavla Navratilova, Eric M. Thompson, Takeshi Onuma, Hiroki Nishida, Cristian Cañestro, Nicholas M. Luscombe.
102+
> _Genome Res._ 2024. 34: 426-440; doi: [10.1101/gr.278295.123](https://doi.org/10.1101/gr.278295.123). PubMed ID: [38621828](https://pubmed.ncbi.nlm.nih.gov/38621828/)
103+
104+
[OIST research news article](https://www.oist.jp/news-center/news/2024/4/25/oikopleura-who-species-identity-crisis-genome-community)
97105

98-
<!-- TODO nf-core: Add bibliography of tools and data used in your pipeline -->
106+
And also please cite the [LAST papers](https://gitlab.com/mcfrith/last/-/blob/main/doc/last-papers.rst).
99107

100108
An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.
101109

assets/methods_description_template.yml

Lines changed: 0 additions & 1 deletion
@@ -3,7 +3,6 @@ description: "Suggested text and references to use when describing pipeline usag
 section_name: "nf-core/pairgenomealign Methods Description"
 section_href: "https://github.com/nf-core/pairgenomealign"
 plot_type: "html"
-## TODO nf-core: Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
 ## You inject any metadata in the Nextflow '${workflow}' object
 data: |
 <h4>Methods</h4>
