
ADR: interface modules #6737

Open
pditommaso wants to merge 1 commit into master from interface-modules-adr

Conversation

@pditommaso
Member

Summary

This ADR proposes interface modules - abstract module definitions that declare input/output contracts without implementation. Concrete modules can implement these interfaces, and users select the target implementation at runtime via configuration.

This enables:

  • Tool benchmarking (run same data through multiple tools)
  • User customization (swap implementations without modifying workflow code)
  • LLM-assisted pipeline construction (semantic discovery of interchangeable tools)

Key Design Decisions

  • Minimal syntax impact: No new DSL keywords; uses meta.yaml fields
  • type: interface in meta.yaml marks a module as an abstract contract
  • implements: <interface> declares that a module complies with an interface
  • Resolution via config: modules.interfaces {} block in nextflow.config
  • Three resolution strategies: static, parameter-based, and per-sample closure

Example

# Interface module meta.yaml
name: nf-core/msa-alignment
type: interface
input: [...]
output: [...]

# Implementation module meta.yaml
name: nf-core/clustalo-align
implements: nf-core/msa-alignment@>=1.0.0

// nextflow.config
modules {
    interfaces {
        'nf-core/msa-alignment' = params.aligner
    }
}
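The config example above shows only parameter-based binding. As an illustrative sketch of how the three proposed resolution strategies could look side by side (the syntax follows the ADR proposal and is not implemented today; in practice an interface would get exactly one binding, and `meta.n_seqs` is a hypothetical meta field used here for illustration):

```groovy
// nextflow.config (illustrative sketch; pick one binding per interface)
modules {
    interfaces {
        // 1. Static: pin a single implementation
        'nf-core/msa-alignment' = 'nf-core/clustalo-align'

        // 2. Parameter-based: selected at launch time, e.g. --aligner famsa
        'nf-core/msa-alignment' = params.aligner

        // 3. Per-sample closure: chosen per input, e.g. by sequence count
        'nf-core/msa-alignment' = { meta ->
            meta.n_seqs > 1000 ? 'nf-core/famsa-align' : 'nf-core/tcoffee-align'
        }
    }
}
```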

Related

  • nextflow-io/schemas#11
🤖 Generated with Claude Code

This ADR proposes interface modules - abstract module definitions that
declare input/output contracts without implementation. Concrete modules
can implement these interfaces, and users select the target implementation
at runtime via configuration.

Key features:
- type: interface field in meta.yaml for abstract contracts
- implements: field for declaring interface compliance
- Resolution via modules.interfaces {} block in nextflow.config
- Support for static, parameter-based, and per-sample binding

Related: nextflow-io/schemas#11

Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
@netlify

netlify bot commented Jan 20, 2026

Deploy Preview for nextflow-docs-staging canceled.

Name: nextflow-docs-staging
🔨 Latest commit: 403eac3
🔍 Latest deploy log: https://app.netlify.com/projects/nextflow-docs-staging/deploys/696f66e67674080008658680

@bentsherman
Member

It would be good to explore existing approaches and whether they are "good enough". For example:

  1. An agentic workflow that runs each module and compares results, given a benchmarking spec
  2. A Nextflow pipeline that wraps nextflow module run as a process (similar to https://github.com/nf-core/deepmodeloptim) (@mathysgrapotte)

@bentsherman bentsherman changed the title Add ADR for interface modules ADR: interface modules Jan 28, 2026
@mathysgrapotte

In https://github.com/nf-core/deepmodeloptim our way of doing this is quite hacky; very much looking forward to trying the new way!

@bentsherman
Member

@mathysgrapotte can you elaborate? We like to highlight current approaches in the ADR so that their downsides are clear.

@mathysgrapotte

@bentsherman currently our only option is to modify the pipeline to add support (for example, adding a preprocessing step in Nextflow before deepmodeloptim), OR to add a Python function to stimulus-py (the package used by deepmodeloptim) so that the tool can be integrated. So we do not really have a solution to the problem.

I know that nf-core/differentialabundance is moving towards a version 2.0 where they pass tool params in a config along with the data, to determine which tool is run with which params, and nf-core/multiplesequencealign already uses a similar paradigm.

@suzannejin

Hi!
Beyond benchmarking tools of the same class (interface), different classes can be connected to form a workflow, making it possible to run/benchmark multiple versions of the workflow (with different combinations of tools) in parallel.

To do so in nf-core/differentialabundance, we basically made it possible to parse multiple params and made them available as part of the channel meta, to be used dynamically within the pipeline, i.e. [[id:..., params:...], data]
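A minimal sketch of that pattern (process and field names are illustrative, not taken from the actual pipeline): the tool selection travels with the data in the meta map, and the channel is branched on it.

```groovy
workflow {
    // each element carries its tool choice and arguments in the meta map:
    // [[id: 'sample1', tool: 'clustalo', args: '--full'], seqs.fasta]
    ch_input
        .branch { meta, data ->
            clustalo: meta.tool == 'clustalo'
            famsa:    meta.tool == 'famsa'
        }
        .set { ch_tools }

    CLUSTALO_ALIGN(ch_tools.clustalo)   // runs only on the clustalo branch
    FAMSA_ALIGN(ch_tools.famsa)         // runs only on the famsa branch
}
```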

@mirpedrol

Hi! I am happy to see that this idea is of interest :)
I have been working on something similar using nf-core subworkflows and can provide some of the current examples we are working on in the lab.

multiplesequencealign by @luisas uses subworkflows containing all the interchangeable modules; the input channels carry the tool that must be run and are branched accordingly (see example). This allows running multiple tools in parallel.
As input, it uses the typical samplesheet for samples and a toolsheet to select all the tools and arguments to run.

I have been working on a way to automate adding additional tools. As part of this, I have a POC for multiplesequencealign which uses a cleaner approach. For example, this is the same subworkflow I linked before, but in a cleaner format.
The idea is to have a definition of a "class" or "interface" in the modules repository, which defines the metadata of the modules (operation, input, and output). It turned out best to also list the modules that belong to the interface in the definition, because we wanted to use the nf-core modules and needed a way to automatically create the subworkflow from it. This is the meta.yml we use to describe one of the interfaces.
The subworkflows are updated by a GitHub Action which detects when a new module is added to the interface description. I have developed a Python package which creates the subworkflows.

Regarding differentialabundance @suzannejin is contributing to, they are working on a way to code the pipeline using subworkflows in a similar approach, and allowing the selection of parameters through a config file. Suzanne can provide links to the relevant PRs and discussions :)

I will be happy to expand on any of this if needed!

Overall, we found a good solution using nf-core as a base, but it would be really nice if Nextflow provided a way of running several modules that belong to an interface without the need to update the pipeline code, detect the relevant nf-core modules, and make them available at run time.

@bentsherman
Member

Thanks everyone for the use cases.

Interface modules are just one possible solution, which is why I'm trying to focus on the use case and not get too attached to any particular solution.

With #6650 we are planning to add a nextflow module run command, which would allow you to run a module directly, supplying process inputs as params:

# local module
nextflow module run ./modules/nf-core/tcoffee/align --fasta ... --tree ...

# remote module (download and run on-the-fly)
nextflow module run nf-core/tcoffee/align --fasta ... --tree ...

It can print the process outputs as JSON, allowing you to retrieve them programmatically.

So I think you could build a benchmarking or optimization pipeline by wrapping this command in a generic process that can run arbitrary modules with arbitrary inputs:

process MODULE_RUN {
    input:
    // the params map is named `module_params` to avoid clashing with the global `params`
    tuple val(module), val(module_params)

    output:
    tuple val(module), path('output.json')

    script:
    // serialize the map as proper JSON for -params-file
    def json = groovy.json.JsonOutput.toJson(module_params)
    """
    cat << 'EOF' > params.json
    $json
    EOF

    nextflow -q module run $module -params-file params.json > output.json
    """
}

workflow {
    ch_modules = channel.of(
        tuple('nf-core/clustalo/align', params.aligners['clustalo']),
        tuple('nf-core/famsa/align', params.aligners['famsa']),
        tuple('nf-core/kalign/align', params.aligners['kalign']),
        tuple('nf-core/learnmsa/align', params.aligners['learnmsa']),
        tuple('nf-core/magus/align', params.aligners['magus']),
        tuple('nf-core/muscle5/super5', params.aligners['muscle5']),
        tuple('nf-core/tcoffee/align', params.aligners['tcoffee']),
    )

    ch_results = MODULE_RUN( ch_modules )

    // ...
}

This is a minimal example, but hopefully it's clear how you could parameterize each module for benchmarking / optimization.

There is also the idea of detecting and running all modules with a common interface, such as all "aligner" modules. For this you would need to query the module registry -- we would need to provide some way to query for modules that satisfy an input/output spec -- and then you could pass the results directly to this MODULE_RUN process.

That could be where the "module interface" fits in, as a way to define an input/output spec as an entity that other modules can implement. It would primarily be a registry concept.
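As a purely hypothetical sketch of how such a registry query could feed the generic process (neither `registryQuery` nor any query command exists today; the name is a placeholder for whatever lookup mechanism materializes):

```groovy
workflow {
    // hypothetical: a registry lookup returns every module implementing
    // the interface, e.g. ['nf-core/clustalo/align', 'nf-core/famsa/align', ...]
    ch_modules = channel.fromList(registryQuery('nf-core/msa-alignment'))
        .map { name -> tuple(name, params.aligners[name.tokenize('/')[1]]) }

    // feed the results straight into the generic wrapper, no hardcoding
    MODULE_RUN(ch_modules)
}
```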

Anyway, I know it's a rough sketch, but let me know if this seems like a viable solution.

@suzannejin

Hey @bentsherman, thanks a lot for the work!

So I think you could build a benchmarking or optimization pipeline by wrapping this command in a generic process that can run arbitrary modules with arbitrary inputs:

I think this is very cool, especially for alternative modules with optional inputs/outputs (see example subworkflow, modules 1 and 2). With this approach, one can freely group various module sets based on input/output overlap.

However, the differences in optional inputs/outputs across these modules often arose arbitrarily, because they were developed by different people, etc. The advantage of having a defined interface is that it will guide developers to provide modules under standardized interfaces.

@pditommaso
Member Author

Indeed. The central point of this feature is to make it possible to define an interface declaration and register it in a repository (the Nextflow registry).

This will allow pulling any compatible module without hardcoding it in the pipeline code.
