This ADR proposes interface modules - abstract module definitions that
declare input/output contracts without implementation. Concrete modules
can implement these interfaces, and users select the target implementation
at runtime via configuration.
Key features:
- type: interface field in meta.yaml for abstract contracts
- implements: field for declaring interface compliance
- Resolution via modules.interfaces {} block in nextflow.config
- Support for static, parameter-based, and per-sample binding
Related: nextflow-io/schemas#11
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
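To make the three binding modes listed above concrete, here is a minimal Python sketch of how an interface could be resolved to a concrete module. It is not Nextflow syntax and not the ADR's implementation: `InterfaceRegistry`, `register`, `bind`, and `resolve` are hypothetical names, and the `aligner` interface is an assumed example.

```python
from dataclasses import dataclass, field
from typing import Callable, Mapping, Union

# Hypothetical: a binding is either a fixed module name (static) or a
# function that picks a module from pipeline params and the current sample.
Binding = Union[str, Callable[[Mapping, Mapping], str]]


@dataclass
class InterfaceRegistry:
    # interface name -> module names that declare `implements: <interface>`
    implementations: dict = field(default_factory=dict)
    # interface name -> binding chosen in configuration
    bindings: dict = field(default_factory=dict)

    def register(self, interface: str, module: str) -> None:
        self.implementations.setdefault(interface, set()).add(module)

    def bind(self, interface: str, binding: Binding) -> None:
        self.bindings[interface] = binding

    def resolve(self, interface: str, params: Mapping, sample: Mapping) -> str:
        binding = self.bindings[interface]
        module = binding(params, sample) if callable(binding) else binding
        # Reject modules that never declared compliance with the interface.
        if module not in self.implementations.get(interface, set()):
            raise ValueError(f"{module!r} does not implement {interface!r}")
        return module


registry = InterfaceRegistry()
for name in ("nf-core/famsa/align", "nf-core/tcoffee/align"):
    registry.register("aligner", name)

# Static binding: always the same implementation.
registry.bind("aligner", "nf-core/famsa/align")
static_choice = registry.resolve("aligner", {}, {})

# Parameter-based binding: chosen from a pipeline parameter.
registry.bind("aligner", lambda params, sample: params["aligner"])
param_choice = registry.resolve("aligner", {"aligner": "nf-core/tcoffee/align"}, {})

# Per-sample binding: chosen from a field of each sample's metadata.
registry.bind("aligner", lambda params, sample: sample["aligner"])
sample_choice = registry.resolve(
    "aligner", {}, {"id": "s1", "aligner": "nf-core/famsa/align"}
)
```

The point of the sketch is that the pipeline code only ever names the interface; which module runs is decided by the binding, and a module that never declared the interface is rejected at resolution time.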
|
It would be good to explore existing approaches and whether they are "good enough". For example:
|
|
In https://github.com/nf-core/deepmodeloptim our way of doing this is quite hacky; very much looking forward to trying the new way! |
|
@mathysgrapotte can you elaborate? We like to highlight current approaches in the ADR so that their downsides are clear. |
|
@bentsherman our only way currently is to modify the pipeline to get support (for example, add a preprocessing step in Nextflow before deepmodeloptim) OR add a Python function in stimulus-py (the package used by deepmodeloptim) so that it can be integrated. So we actually do not have much of a solution to the problem. I know that nf-core/differentialabundance is moving towards a version 2.0 where they pass tool params in a config with the data to determine which tool is run with which params, and nf-core/multiplesequencealign already uses a similar paradigm. |
|
Hi! To do so in |
|
Hi! I am happy to see that this idea is of interest :) multiplesequencealign by @luisas uses subworkflows that contain all the interchangeable modules; the input channels carry the tool that must be run and are branched (see example). This allows running multiple tools in parallel. I have been working on a way to automate adding additional tools. As part of this, I have a POC for multiplesequencealign, which uses a cleaner approach. For example, this is the same subworkflow I shared before, but in a cleaner format. Regarding differentialabundance, which @suzannejin is contributing to, they are working on coding the pipeline with subworkflows in a similar approach, allowing the selection of parameters through a config file. Suzanne can provide links to the relevant PRs and discussions :) I will be happy to expand on any of this if needed! Overall, we found a good solution using nf-core as a base, but it would be really nice if Nextflow provided a way of running several modules that belong to an interface without the need to update the pipeline code, detect the relevant nf-core modules, and make them available at run time. |
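As a rough illustration of the branching pattern described in this comment, the Python sketch below routes each item to a per-tool branch based on the tool name it carries. It is not the multiplesequencealign code: `branch_by_tool`, `KNOWN_TOOLS`, and the file names are hypothetical.

```python
from collections import defaultdict


def branch_by_tool(items, known_tools):
    """Group (tool, payload) pairs into one branch per tool."""
    branches = defaultdict(list)
    for tool, payload in items:
        # Every supported tool has to be listed up front; this is the
        # part that requires a pipeline change when a new tool is added.
        if tool not in known_tools:
            raise KeyError(f"no branch for tool {tool!r}")
        branches[tool].append(payload)
    return dict(branches)


KNOWN_TOOLS = {"famsa", "tcoffee"}
items = [
    ("famsa", "seqs_a.fasta"),
    ("tcoffee", "seqs_a.fasta"),
    ("famsa", "seqs_b.fasta"),
]
branches = branch_by_tool(items, KNOWN_TOOLS)
```

Each branch can then be processed independently, which is what allows several tools to run in parallel. The hardcoded `KNOWN_TOOLS` set stands in for the limitation raised in this comment: supporting another tool means editing the pipeline.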
|
Thanks everyone for the use cases. Interface modules are just one possible solution, which is why I'm trying to focus on the use case and not get too attached to any particular solution. With #6650 we are planning to add a command for running a module directly:
# local module
nextflow module run ./modules/nf-core/tcoffee/align --fasta ... --tree ...
# remote module (download and run on-the-fly)
nextflow module run nf-core/tcoffee/align --fasta ... --tree ...
It can print the process outputs as JSON, allowing you to retrieve them programmatically. So I think you could build a benchmarking or optimization pipeline by wrapping this command in a generic process that can run arbitrary modules with arbitrary inputs:
process MODULE_RUN {
    input:
    tuple val(module), val(params)

    output:
    tuple val(module), path('output.json')

    script:
    """
    cat << EOF > params.json
    $params
    EOF
    nextflow -q module run $module -params-file params.json > output.json
    """
}
workflow {
    ch_modules = channel.of(
        tuple('nf-core/clustalo/align', params.aligners['clustalo']),
        tuple('nf-core/famsa/align', params.aligners['famsa']),
        tuple('nf-core/kalign/align', params.aligners['kalign']),
        tuple('nf-core/learnmsa/align', params.aligners['learnmsa']),
        tuple('nf-core/magus/align', params.aligners['magus']),
        tuple('nf-core/muscle5/super5', params.aligners['muscle5']),
        tuple('nf-core/tcoffee/align', params.aligners['tcoffee']),
    )
    ch_results = MODULE_RUN( ch_modules )
    // ...
}
This is a minimal example, but hopefully it's clear how you could parameterize each module for benchmarking / optimization.
There is also the idea of detecting and running all modules with a common interface, such as all "aligner" modules. For this you would need to query the module registry (we would need to provide some way to query for modules that satisfy an input/output spec), and then you could pass the results directly to this generic process.
That could be where the "module interface" fits in, as a way to define an input/output spec as an entity that other modules can implement. It would primarily be a registry concept. Anyway, I know it's a rough sketch, but let me know if this seems like a viable solution. |
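The registry query mentioned above could look roughly like the following Python sketch. Everything here is an assumption for illustration: the `Spec` type, the `satisfies` rule (a module matches if it needs no inputs beyond the interface's and produces at least the interface's outputs), and the `example/...` module names and their inputs/outputs are all hypothetical, not real registry entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    name: str
    inputs: frozenset   # names of required inputs
    outputs: frozenset  # names of declared outputs


def satisfies(module: Spec, interface: Spec) -> bool:
    # Assumed rule: the module can be called with the interface's inputs
    # and yields every output the interface promises.
    return module.inputs <= interface.inputs and interface.outputs <= module.outputs


def query(registry, interface):
    """Return the names of all registered modules matching the interface."""
    return sorted(m.name for m in registry if satisfies(m, interface))


aligner = Spec("aligner", frozenset({"fasta", "tree"}), frozenset({"alignment"}))

# Hypothetical registry contents.
registry = [
    Spec("example/aligner_a", frozenset({"fasta", "tree"}), frozenset({"alignment"})),
    Spec("example/aligner_b", frozenset({"fasta"}), frozenset({"alignment", "log"})),
    Spec("example/tree_builder", frozenset({"fasta"}), frozenset({"tree"})),
    Spec("example/aligner_c", frozenset({"fasta", "template"}), frozenset({"alignment"})),
]

matches = query(registry, aligner)
```

Under these assumptions `matches` contains `example/aligner_a` and `example/aligner_b`: the tree builder lacks the `alignment` output, and `aligner_c` requires an input the interface does not provide. A list like this is what would be fed, as module names, into a generic runner such as `MODULE_RUN`.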
|
Hey @bentsherman, thanks a lot for the work!
I think this is very cool, especially for alternative modules with optional input/output (see example subworkflow, with modules 1 and 2). In this example subworkflow, with this approach, one can freely group various module sets based on input/output overlap. However, the differences in optional input/output between these modules often arose arbitrarily, because they were developed by different people, etc. The advantage of having a defined interface is that it will guide developers to provide modules under standardized interfaces. |
|
Indeed. The central point of this feature is to make it possible to define an interface declaration and register it in a repository (the Nextflow registry). This will allow pulling any compatible module without hardcoding it in the pipeline code. |
Summary
This ADR proposes interface modules - abstract module definitions that declare input/output contracts without implementation. Concrete modules can implement these interfaces, and users select the target implementation at runtime via configuration.
This enables:
Key Design Decisions
- meta.yaml fields:
  - type: interface in meta.yaml marks a module as an abstract contract
  - implements: <interface> declares that a module complies with an interface
- modules.interfaces {} block in nextflow.config
Example
Related
🤖 Generated with Claude Code