Feature/pathofact2 subwf #141

Merged
Ales-ibt merged 10 commits into main from feature/pathofact2_subwf
Feb 5, 2026

Conversation

@Ales-ibt Ales-ibt commented Feb 3, 2026

This is the final piece that brings all the PathoFact modules together into a single subworkflow 🎉
I’ve attached a diagram to help illustrate the overall logic:
pathofact2_subwf.pdf

The InterProScan (IPS) TSV is an optional input. It’s particularly relevant for MGnify, since generating this file is already part of our analysis pipelines. When no IPS file is provided, CDD annotations are generated instead.
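
As a rough illustration of the optional-input logic described above, here is a minimal Nextflow sketch that routes each sample depending on whether an IPS TSV came with it. The tuple layout, the sample ids, the file names, and the use of `[]` to mark a missing IPS file are all assumptions made for the example; the subworkflow in this PR may be wired differently.

```nextflow
workflow {
    // Hypothetical inputs: [ meta, proteins, optional IPS TSV ].
    // Plain strings stand in for real paths; [] marks a missing IPS file.
    ch_input = Channel.of(
        [ [id: 'sample1'], 'sample1.faa', 'sample1_ips.tsv' ],
        [ [id: 'sample2'], 'sample2.faa', [] ]
    )

    // Route each sample by whether an IPS TSV is present
    ch_split = ch_input.branch { meta, faa, ips ->
        with_ips: ips ? true : false
        without_ips: true
    }

    // Samples with an IPS TSV would reuse it; the rest would go to CDD annotation
    ch_split.with_ips.view    { meta, faa, ips -> "${meta.id}: reuse IPS annotations from ${ips}" }
    ch_split.without_ips.view { meta, faa, ips -> "${meta.id}: no IPS file, generate CDD annotations" }
}
```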

For this PR, I’d need a review approval from @ortisjulia or @lfdelzam, just to make sure we’re all aligned.

Note: linting is still failing for subworkflows that depend on nf-core modules — this is expected for now.

Thanks a lot for reviewing! 🙌

@vagkaratzas vagkaratzas left a comment

Minor suggestions; I do not see anything inherently wrong with the subworkflow. If possible, I would just create an additional nf-test with multiple input samples (even copies of the same input data with different meta) to make sure the joins and mixes towards the end of the subworkflow work properly.
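
To make the concern about joins concrete, here is a small sketch (not taken from the PR) of two channels keyed on `meta` being joined when the samples arrive in a different order. The channel names, sample ids, and file names are hypothetical, and plain strings stand in for real files.

```nextflow
workflow {
    // Two hypothetical per-sample result channels, deliberately in different order
    ch_vf = Channel.of(
        [ [id: 'sample1'], 'sample1_vf.tsv' ],
        [ [id: 'sample2'], 'sample2_vf.tsv' ]
    )
    ch_tox = Channel.of(
        [ [id: 'sample2'], 'sample2_tox.tsv' ],
        [ [id: 'sample1'], 'sample1_tox.tsv' ]
    )

    // join matches on the first tuple element (the meta map), so each
    // emitted tuple should pair files that belong to the same sample
    ch_vf.join(ch_tox).view { meta, vf, tox -> "${meta.id}: ${vf} + ${tox}" }
}
```

With a single input sample a mismatched key cannot show up, which is why a multi-sample test is the more informative check.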

PATHOFACT2_INTEGRATOR(ch_for_integrator)

emit:
gff = PATHOFACT2_INTEGRATOR.out.gff // channel: tuple( val(meta), path(gff) )
Contributor

Make sure that PATHOFACT2_INTEGRATOR will always run and produce a GFF, even an empty one. Otherwise you will need to initialize an empty ch_gff at the top, assign it the result of PATHOFACT2_INTEGRATOR.out.gff, and use that when emitting.

Contributor Author

I will add a test with no predictions to see how it goes. The integrator won't generate an output if there are no predictions. @mberacochea suggested to me on another PR that modules shouldn't generate empty files.

Contributor

Then you might need to initialize that ch_gff channel, instead of using PATHOFACT2_INTEGRATOR.out.gff directly in emit.
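
A minimal sketch of the pattern suggested here, assuming a toy process in place of the real integrator. `TOY_INTEGRATOR`, its inputs, and the sample ids are hypothetical; only the `ch_gff` initialization and the `mix` are the point.

```nextflow
// Hypothetical stand-in for PATHOFACT2_INTEGRATOR: it writes a GFF only
// when there is at least one prediction, so its output is optional
process TOY_INTEGRATOR {
    input:
    tuple val(meta), val(n_predictions)

    output:
    tuple val(meta), path("*.gff"), optional: true, emit: gff

    script:
    """
    if [ ${n_predictions} -gt 0 ]; then
        echo "##gff-version 3" > ${meta.id}.gff
    fi
    """
}

workflow {
    ch_for_integrator = Channel.of(
        [ [id: 'with_hits'], 3 ],
        [ [id: 'no_hits'],   0 ]
    )

    // Start from an empty channel, then mix in whatever the process emits
    ch_gff = Channel.empty()
    TOY_INTEGRATOR(ch_for_integrator)
    ch_gff = ch_gff.mix(TOY_INTEGRATOR.out.gff)

    // In a subworkflow, ch_gff would be the channel listed under emit:
    ch_gff.view { meta, gff -> "${meta.id}: ${gff.name}" }
}
```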

Member

Yeah, empty files bloat the file system IMO and make it hard to figure out if a tool worked or not.
You can do PATHOFACT2_INTEGRATOR.out.gff.ifEmpty([]) I think
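
For reference, a tiny sketch of what the `ifEmpty` operator does on a channel that never emitted anything. The empty channel below only stands in for an integrator output with no predictions; it is not code from this PR.

```nextflow
workflow {
    // Stand-in for an integrator output channel that emitted nothing
    ch_gff = Channel.empty()

    // ifEmpty emits the given default value when the source channel is empty;
    // otherwise the original items pass through unchanged
    ch_gff.ifEmpty([]).view { "gff channel item: ${it}" }
}
```

Note that the default `[]` does not have the `tuple( val(meta), path(gff) )` shape, so any downstream consumer would need to tolerate it.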

Contributor Author

I added a couple of lines to handle the empty output.

Ales-ibt and others added 5 commits February 4, 2026 09:46
Co-authored-by: Evangelos Karatzas <32259775+vagkaratzas@users.noreply.github.com>
Co-authored-by: Evangelos Karatzas <32259775+vagkaratzas@users.noreply.github.com>
Co-authored-by: Evangelos Karatzas <32259775+vagkaratzas@users.noreply.github.com>
Co-authored-by: Evangelos Karatzas <32259775+vagkaratzas@users.noreply.github.com>

@mberacochea mberacochea left a comment

I gave it a quick read... it looks pretty nice @Ales-ibt 🎖️

@Ales-ibt Ales-ibt requested a review from mberacochea February 4, 2026 13:09

Ales-ibt commented Feb 5, 2026

From @lfdelzam:

I saw the PathoFact2 subworkflow diagram and it looks good, but it's important to remember that the most important aspect of PathoFact2, more than predicting Virulence Factors and Toxin-related proteins (not just Toxins), is the ability to see everything in context. This means including BGCs (Biosynthetic Gene Clusters), MGEs (Mobile Genetic Elements), signal peptides, and ARGs (Antimicrobial Resistance Genes) alongside VF and TOX predictions. Since you're putting everything in a .gff file, these features should also be included. A gene can therefore be a VF, a TOX, an ARG, part of a BGC, located on a plasmid, or carry a signal peptide, alone or in combination (VF-TOX, VF-ARG, TOX-BGC).
Here is a file, Genomes_for_PathoFact2_Test_Nextflow.tar 1.gz, containing genomes we used to test PathoFact2; some are known pathogens, others are known non-pathogens. They show the importance of putting everything in context, since many non-pathogens are known to carry virulence factors or toxins. A simple prediction without the genomic context is therefore not very informative.
It would be interesting to see the output you get with these genomes so we can compare it with our output. Does that sound good?

The output of this subworkflow is a GFF that will be integrated with the MGE prediction output in the mobilome annotation pipeline (MAP). I believe we already get SignalP predictions from InterProScan, but if not, I can add the module to the MAP. We handle BGC prediction in MGnify in another pipeline called mettannotator, but I still have a pending task to create a BGC annotator subworkflow that generates a consensus annotation from three predictors: GECCO, antiSMASH, and SanntiS (the latter developed within the group). As for ARGs, the subworkflow is ready; it just needs to be added to the MAP.

Once I have the subwfs set up in the MAP, I'll generate the outputs with the genomes you shared and send you the results for comparison.

@mberacochea mberacochea left a comment

I left two tiny formatting comments.

Looks good to me 🥇

Ales-ibt and others added 2 commits February 5, 2026 12:36
Co-authored-by: Martín Beracochea <mbc@ebi.ac.uk>
Co-authored-by: Martín Beracochea <mbc@ebi.ac.uk>
@Ales-ibt Ales-ibt merged commit b3e8397 into main Feb 5, 2026
7 checks passed
@Ales-ibt Ales-ibt deleted the feature/pathofact2_subwf branch February 5, 2026 14:06