Feature/pathofact2 subwf by Ales-ibt · Pull Request #141 · EBI-Metagenomics/nf-modules

Ales-ibt · 2026-02-03T12:41:24Z

This is the final piece that brings all the PathoFact modules together into a single subworkflow 🎉
I’ve attached a diagram to help illustrate the overall logic:
pathofact2_subwf.pdf

The InterProScan (IPS) TSV is an optional input. It’s particularly relevant for MGnify, since generating this file is already part of our analysis pipelines. When no IPS file is provided, CDD annotations are generated instead.
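The sketch below shows one way the optional IPS input could be routed per sample. It assumes proteins and IPS TSVs arrive as separate channels keyed by `meta`. The channel names, the `hasIpsTsv` helper, and the plain strings standing in for files are all hypothetical and not taken from the PR. It illustrates the join/branch pattern only, not the subworkflow's actual code.

```nextflow
// Hypothetical helper: a sample can reuse InterProScan output only if a TSV was joined in
def hasIpsTsv(tsv) {
    return tsv != null
}

workflow {
    // Hypothetical inputs: every sample has proteins, only sampleA has an IPS TSV
    ch_proteins = Channel.of(
        [ [id: 'sampleA'], 'sampleA.faa' ],
        [ [id: 'sampleB'], 'sampleB.faa' ]
    )
    ch_ips = Channel.of(
        [ [id: 'sampleA'], 'sampleA_ips.tsv' ]
    )

    // remainder: true keeps unmatched samples and fills the missing TSV slot with null
    ch_routed = ch_proteins
        .join(ch_ips, remainder: true)
        .branch { meta, faa, tsv ->
            with_ips: hasIpsTsv(tsv)
            needs_cdd: true
        }

    // Samples with a TSV reuse it; the rest would go to the CDD annotation step
    ch_routed.with_ips.map { meta, faa, tsv -> "reuse IPS TSV for ${meta.id}" }.view()
    ch_routed.needs_cdd.map { meta, faa, tsv -> "run CDD annotation for ${meta.id}" }.view()
}
```

With `remainder: true`, a completely empty IPS channel sends every sample down the CDD branch.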

For this PR, I’d need a review approval from @ortisjulia or @lfdelzam, just to make sure we’re all aligned.

Note: linting is still failing for subworkflows that depend on nf-core modules — this is expected for now.

Thanks a lot for reviewing! 🙌

vagkaratzas

Minor suggestions; I don't see anything inherently wrong with the subworkflow. If possible, I would just add another nf-test with multiple input samples (even copies of the same input data with different meta) to make sure the joins and mixes towards the end of the subworkflow work properly.
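A multi-sample test like this could start from a single test file reused under different `meta` maps. The sketch below assumes that setup. `copiesWithMeta`, `test.faa`, and the two stand-in branches are hypothetical placeholders for the real inputs and tool outputs.

```nextflow
// Hypothetical helper: reuse one test file under several meta maps,
// so a single dataset can exercise multi-sample joins
def copiesWithMeta(String path, List<String> ids) {
    return ids.collect { id -> [ [id: id], path ] }
}

workflow {
    // 'test.faa' is a placeholder path, not a file from the PR
    ch_input = Channel.fromList(copiesWithMeta('test.faa', ['copy_1', 'copy_2']))

    // Two stand-in branches that tag their output with the sample id, as real tools would
    ch_vf  = ch_input.map { meta, faa -> [ meta, "${meta.id}.vf.tsv" ] }
    ch_tox = ch_input.map { meta, faa -> [ meta, "${meta.id}.tox.tsv" ] }

    // join matches on meta, so each copy should receive only its own results
    ch_input
        .join(ch_vf)
        .join(ch_tox)
        .map { meta, faa, vf, tox -> "${meta.id}: ${vf}, ${tox}" }
        .view()
}
```

If both copies shared an identical `meta` map, they would also share a single join key, and the outputs could no longer be attributed to a sample. Distinct ids avoid that.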

subworkflows/ebi-metagenomics/pathofact2/main.nf

vagkaratzas · 2026-02-03T16:18:55Z

+    PATHOFACT2_INTEGRATOR(ch_for_integrator)
+
+    emit:
+    gff  =  PATHOFACT2_INTEGRATOR.out.gff         // channel: tuple( val(meta), path(gff) )


Make sure that PATHOFACT2_INTEGRATOR always runs and produces a GFF, even if empty. Otherwise you will need to initialize an empty ch_gff at the top, assign PATHOFACT2_INTEGRATOR.out.gff to it, and use that when emitting.

I will add a test with no predictions to see how it goes. The integrator won't generate an output if there are no predictions; @mberacochea suggested to me on another PR that modules shouldn't generate empty files.

Then you might need to initialize that ch_gff channel instead of using PATHOFACT2_INTEGRATOR.out.gff directly in the emit block.

Yeah, IMO empty files flood the file system and make it hard to figure out whether a tool worked or not.
You can do PATHOFACT2_INTEGRATOR.out.gff.ifEmpty([]), I think.

I added a couple of lines to handle the empty output.
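The sketch below shows the behaviour discussed in this thread, assuming the integrator's GFF is declared as an optional output. `MOCK_INTEGRATOR` and its inputs are hypothetical stand-ins for `PATHOFACT2_INTEGRATOR`, not the module's real definition.

```nextflow
// Hypothetical stand-in for PATHOFACT2_INTEGRATOR: it writes a GFF only when
// there are predictions, so the output is declared optional
process MOCK_INTEGRATOR {
    input:
    tuple val(meta), val(n_predictions)

    output:
    tuple val(meta), path("${meta.id}.gff"), optional: true, emit: gff

    script:
    """
    if [ ${n_predictions} -gt 0 ]; then
        echo '##gff-version 3' > ${meta.id}.gff
    fi
    """
}

workflow {
    ch_samples = Channel.of(
        [ [id: 'with_hits'], 3 ],
        [ [id: 'no_hits'], 0 ]
    )

    // Initialising first keeps ch_gff defined even if the call below were made conditional
    ch_gff = Channel.empty()
    MOCK_INTEGRATOR(ch_samples)
    ch_gff = MOCK_INTEGRATOR.out.gff

    // Only 'with_hits' is emitted; 'no_hits' produces no item rather than an empty file
    ch_gff.map { meta, gff -> "GFF emitted for ${meta.id}" }.view()

    // ifEmpty([]) substitutes [] only when NO sample emitted a GFF
    ch_gff.ifEmpty([]).map { it -> it ? 'at least one GFF' : 'no GFF at all' }.view()
}
```

Two points are worth keeping in mind. First, `ifEmpty([])` acts on the whole channel, not per sample, so it does not fire when only some samples lack predictions. Second, pre-initialising `ch_gff` with `Channel.empty()` mainly matters when the process call itself sits inside a conditional, because referencing `.out` of a process that was never invoked is an error.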

subworkflows/ebi-metagenomics/pathofact2/main.nf

subworkflows/ebi-metagenomics/pathofact2/tests/main.nf.test

Co-authored-by: Evangelos Karatzas <32259775+vagkaratzas@users.noreply.github.com>

mberacochea

I gave it a quick read; it looks pretty nice @Ales-ibt 🎖️

subworkflows/ebi-metagenomics/pathofact2/main.nf

mberacochea · 2026-02-04T11:59:48Z

Ales-ibt · 2026-02-05T10:01:26Z

From @lfdelzam:

I saw the PathoFact2 subworkflow diagram and it looks good, but it's important to remember that the most important aspect of PathoFact2, more than predicting virulence factors (VFs) and toxin-related proteins (not just toxins), is the ability to see everything in context. This means including BGCs (biosynthetic gene clusters), MGEs (mobile genetic elements), signal peptides, and ARGs (antimicrobial resistance genes) alongside the VF and TOX predictions. Since you're putting everything in a .gff file, these features should also be included there. A gene can therefore be a VF, TOX, or ARG, be part of a BGC, be located on a plasmid, carry a signal peptide, or any combination of these (e.g. VF-TOX, VF-ARG, TOX-BGC).
Here is a file, Genomes_for_PathoFact2_Test_Nextflow.tar 1.gz, containing the genomes we used to test PathoFact2; some are known pathogens, others are known non-pathogens. They show why context matters: many non-pathogens are known to carry virulence factors or toxins, so a simple prediction without the genomic context is not very informative.
It would be interesting to see the output you get with these genomes so we can compare it with our output. Does that sound good?
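One way to represent these label combinations is to pool per-gene hits from each predictor and collapse them into a single tag. The sketch below assumes each source yields `[gene_id, label]` pairs. The gene ids, the `PLASMID` label, and `combineLabels` are hypothetical. It is not how the PR or the MAP actually builds its GFF.

```nextflow
// Hypothetical helper: collapse all labels a gene received into one sorted,
// de-duplicated tag such as 'TOX-VF' (sorting keeps the result order-independent)
def combineLabels(List labels) {
    return labels.toSorted().unique().join('-')
}

workflow {
    // Hypothetical per-source hits as [gene_id, label]; real inputs would be parsed from tool outputs
    ch_vf      = Channel.of(['gene_1', 'VF'], ['gene_2', 'VF'])
    ch_tox     = Channel.of(['gene_1', 'TOX'])
    ch_arg     = Channel.of(['gene_2', 'ARG'], ['gene_3', 'ARG'])
    ch_plasmid = Channel.of(['gene_2', 'PLASMID'])

    // mix pools the sources; groupTuple gathers every label seen for a gene
    ch_vf.mix(ch_tox, ch_arg, ch_plasmid)
        .groupTuple()
        .map { gene, labels -> [gene, combineLabels(labels)] }
        .view()
}
```

Sorting makes the tag independent of the arrival order from `mix`, so a VF plus TOX gene comes out as `TOX-VF` rather than `VF-TOX`.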

The output of this subworkflow is a GFF that will be integrated with the MGE prediction output in the mobilome annotation pipeline (MAP). I believe we already have SignalP predictions from InterProScan; if not, I can add the module to the MAP. In MGnify, BGC prediction is handled by a separate pipeline, mettannotator, but I still have a pending task to create a BGC annotator subworkflow that builds a consensus annotation from three predictors: GECCO, antiSMASH, and SanntiS (the latter developed within the group). As for ARGs, the subworkflow is ready; it just needs to be added to the MAP.

Once I have the subwfs set up in the MAP, I'll generate the outputs with the genomes you shared and send you the results for comparison.

subworkflows/ebi-metagenomics/pathofact2/main.nf

mberacochea

I left two tiny comments on formatting.

Looks good to me 🥇

subworkflows/ebi-metagenomics/pathofact2/meta.yml

Co-authored-by: Martín Beracochea <mbc@ebi.ac.uk>

Ales-ibt added 2 commits February 3, 2026 09:22

Fix subworkflow tests and pre-commit

bd17d60

Debugging when no ips file

b1495cb

Ales-ibt requested review from lfdelzam, mberacochea, ortisjulia and vagkaratzas February 3, 2026 12:41

vagkaratzas approved these changes Feb 3, 2026

Ales-ibt and others added 5 commits February 4, 2026 09:46

Update subworkflows/ebi-metagenomics/pathofact2/main.nf

b254ba3

Co-authored-by: Evangelos Karatzas <32259775+vagkaratzas@users.noreply.github.com>

Update subworkflows/ebi-metagenomics/pathofact2/main.nf

a61d0dd

Co-authored-by: Evangelos Karatzas <32259775+vagkaratzas@users.noreply.github.com>

Update subworkflows/ebi-metagenomics/pathofact2/tests/main.nf.test

f2e0ae8

Co-authored-by: Evangelos Karatzas <32259775+vagkaratzas@users.noreply.github.com>

Update subworkflows/ebi-metagenomics/pathofact2/tests/main.nf.test

3820f82

Co-authored-by: Evangelos Karatzas <32259775+vagkaratzas@users.noreply.github.com>

Adding negative test

f0dde08

mberacochea requested changes Feb 4, 2026

Move hardcoded values to params and improve empty outputs handling

3946ea0

Ales-ibt requested a review from mberacochea February 4, 2026 13:09

mberacochea reviewed Feb 5, 2026

subworkflows/ebi-metagenomics/pathofact2/main.nf (outdated, resolved)

mberacochea approved these changes Feb 5, 2026

subworkflows/ebi-metagenomics/pathofact2/meta.yml (outdated, resolved)

Ales-ibt and others added 2 commits February 5, 2026 12:36

Update subworkflows/ebi-metagenomics/pathofact2/meta.yml

bc46a38

Co-authored-by: Martín Beracochea <mbc@ebi.ac.uk>

Update subworkflows/ebi-metagenomics/pathofact2/main.nf

a49e612

Co-authored-by: Martín Beracochea <mbc@ebi.ac.uk>

Ales-ibt merged commit b3e8397 into main Feb 5, 2026
7 checks passed

Ales-ibt deleted the feature/pathofact2_subwf branch February 5, 2026 14:06
