Conversation
vagkaratzas
left a comment
There was a problem hiding this comment.
Minor suggestions --do not see anything inherently wrong with the subworkflow. If possible, I would just create an additional nf-test with multiple input samples (even copies of the same input data with different meta) to make sure the joins and mixes towards the end of the subworkflow work properly.
| PATHOFACT2_INTEGRATOR(ch_for_integrator) | ||
|
|
||
| emit: | ||
| gff = PATHOFACT2_INTEGRATOR.out.gff // channel: tuple( val(meta), path(gff) ) |
There was a problem hiding this comment.
make sure that PATHOFACT2_INTEGRATOR will always run/produce a gff, even if empty. Else you will need to initialize an empty ch_gff at the top, assign the result of PATHOFACT2_INTEGRATOR.out.gff, and use that while emiting
There was a problem hiding this comment.
I will add a test with no prediction to see how it goes. The integrator won't generate an output if there's no prediction. @mberacochea suggested me on another PR that modules should't generate empty files
There was a problem hiding this comment.
Then you might need to initialize that ch_gff channel, instead of using directly PATHOFACT2_INTEGRATOR.out.gff during emit.
There was a problem hiding this comment.
yeah, empty files float the file system IMO and make it hard to figure out if a tool worked or not.
You can do PATHOFACT2_INTEGRATOR.out.gff.ifEmpty([]) I think
There was a problem hiding this comment.
I added a copule of lines to handle the empty output
Co-authored-by: Evangelos Karatzas <32259775+vagkaratzas@users.noreply.github.com>
Co-authored-by: Evangelos Karatzas <32259775+vagkaratzas@users.noreply.github.com>
Co-authored-by: Evangelos Karatzas <32259775+vagkaratzas@users.noreply.github.com>
Co-authored-by: Evangelos Karatzas <32259775+vagkaratzas@users.noreply.github.com>
mberacochea
left a comment
There was a problem hiding this comment.
I gave it a quick read.. it looks pretty nice @Ales-ibt 🎖️
| PATHOFACT2_INTEGRATOR(ch_for_integrator) | ||
|
|
||
| emit: | ||
| gff = PATHOFACT2_INTEGRATOR.out.gff // channel: tuple( val(meta), path(gff) ) |
There was a problem hiding this comment.
yeah, empty files float the file system IMO and make it hard to figure out if a tool worked or not.
You can do PATHOFACT2_INTEGRATOR.out.gff.ifEmpty([]) I think
|
From @lfdelzam:
The output of this subworkflow is a GFF that will be integrated with the MGE prediction output in the mobilome annotation pipeline (MAP). I believe we have signal P prediction with InterProscan, but if not, I can add the module to the MAP. We handle BGC prediction in MGnify in another pipeline called mettannotator, but I still have a pending task to create a BGC annotator subworkflow that generates a consensus annotation from three predictors: Gecco, Antismash, and Sanntis (the latter developed within the group). As for ARG, the subworkflow is ready; it just needs to be added to the MAP. Once I have the subwfs set up in the MAP, I'll generate the outputs with the genomes you shared and send you the results for comparison. |
mberacochea
left a comment
There was a problem hiding this comment.
I left two tiny comments... formatting.
Looks good to me 🥇
Co-authored-by: Martín Beracochea <mbc@ebi.ac.uk>
Co-authored-by: Martín Beracochea <mbc@ebi.ac.uk>
This is the final piece that brings all the PathoFact modules together into a single subworkflow 🎉
I’ve attached a diagram to help illustrate the overall logic:
pathofact2_subwf.pdf
The InterProScan (IPS) TSV is an optional input. It’s particularly relevant for MGnify, since generating this file is already part of our analysis pipelines. When no IPS file is provided, CDD annotations are generated instead.
For this PR, I’d need a review approval from @ortisjulia or @lfdelzam, just to make sure we’re all aligned.
Note: linting is still failing for subworkflows that depend on nf-core modules — this is expected for now.
Thanks a lot for reviewing! 🙌