Member

@pditommaso pditommaso commented Dec 11, 2025

Summary

Architecture Decision Record (ADR) for the Nextflow Module System, documenting:

  • Remote module inclusion via registry with @scope/name syntax
  • Semantic versioning with dependency resolution
  • Unified Nextflow Registry (rebrand existing plugin registry)
  • First-class CLI support (pull, push, search, run)
  • Module metadata schema (meta.yaml) with JSON Schema validation
  • Structured tool arguments replacing ext.args pattern

Key Files

  • adr/20251114-module-system.md - Main ADR document with full specification
  • adr/module-spec-schema.json - JSON Schema for meta.yaml validation

Updates

Version 2.2 (2026-01-06)

  • Structured tool arguments: Added args property to tools section for type-safe argument configuration
  • New implicit variables: tools.<toolname>.args.<argname> returns formatted flag+value; tools.<toolname>.args returns all args concatenated
  • Deprecation: ext.args, ext.args2, ext.args3 pattern deprecated in favor of structured tool arguments
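
As a rough illustration of the two implicit variables described above, the following Python sketch formats a single argument as flag plus value and concatenates all configured arguments. The spec layout, the helper names `format_arg` and `format_all`, and the rule that a boolean renders as a bare flag are assumptions made for illustration, not the behavior defined by the ADR.

```python
def format_arg(spec, value):
    """Render one argument as 'flag value'; a true boolean renders as the bare flag (assumed)."""
    if isinstance(value, bool):
        return spec["flag"] if value else ""
    return f'{spec["flag"]} {value}'


def format_all(specs, values):
    """Concatenate every configured argument, in the order the spec declares them."""
    parts = (format_arg(specs[name], values[name]) for name in specs if name in values)
    return " ".join(part for part in parts if part)


# Hypothetical args spec for one tool: argument name -> flag metadata.
bwa_args = {
    "batch_size": {"flag": "-K", "type": "integer"},
    "soft_clipping": {"flag": "-Y", "type": "boolean"},
}
configured = {"batch_size": 100000000, "soft_clipping": True}

# Analogue of tools.<toolname>.args.<argname>
single = format_arg(bwa_args["batch_size"], configured["batch_size"])
# Analogue of tools.<toolname>.args
combined = format_all(bwa_args, configured)
```

Here `single` plays the role of one `tools.<toolname>.args.<argname>` lookup and `combined` the role of `tools.<toolname>.args`.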

Version 2.1 (2025-12-11)

  • Unified dependencies: Consolidated components, dependencies, and requires into single requires field
  • New sub-properties: requires.modules and requires.workflows for declaring module dependencies
  • Unified version syntax: [scope/]name[@constraint] format across plugins, modules, and workflows
  • Deprecation: components field deprecated (use requires.modules instead)
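
The unified `[scope/]name[@constraint]` format can be split into its parts with a small parser. This is a minimal Python sketch: the `Requirement` type, the `parse_requirement` helper, and the exact character rules in the regular expression are assumptions, and a leading `@` on the scope (as used in the DSL) is deliberately not accepted here.

```python
import re
from typing import NamedTuple, Optional


class Requirement(NamedTuple):
    scope: Optional[str]
    name: str
    constraint: Optional[str]


# Optional "scope/", mandatory name, optional "@constraint".
_PATTERN = re.compile(
    r"^(?:(?P<scope>[^/@\s]+)/)?(?P<name>[^/@\s]+)(?:@(?P<constraint>\S+))?$"
)


def parse_requirement(text):
    """Split '[scope/]name[@constraint]' into its three parts."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a valid requirement: {text!r}")
    return Requirement(match["scope"], match["name"], match["constraint"])


scoped = parse_requirement("nf-core/bwa-align@1.2.0")
unscoped = parse_requirement("nf-schema@2.0.0")
unpinned = parse_requirement("nf-core/salmon")
```

The same function covers plugins (usually unscoped), modules, and workflows, which is the point of having one syntax for all three.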

Test plan

  • Review ADR document for completeness and clarity
  • Validate JSON Schema against example meta.yaml files
  • Review tool arguments specification for nf-core compatibility
  • Validate design decisions align with ecosystem needs

🤖 Generated with Claude Code

pditommaso and others added 4 commits November 17, 2025 15:09
Introduce comprehensive development constitution documenting core principles
and practices for Nextflow development including modular architecture,
test-driven quality assurance, dataflow programming model, licensing
compliance, DCO requirements, semantic versioning, and Groovy code standards.

The constitution codifies existing best practices from CLAUDE.md and
CONTRIBUTING.md to provide clear governance and quality standards for
the project.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>

netlify bot commented Dec 11, 2025

Deploy Preview for nextflow-docs-staging canceled.

🔨 Latest commit: d86c211
🔍 Latest deploy log: https://app.netlify.com/projects/nextflow-docs-staging/deploys/69774aee6cb59e00087f2983

pditommaso and others added 6 commits December 11, 2025 16:22
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
- Add comprehensive documentation for all module CLI commands
- Add `nextflow module run` command for standalone module execution
- Remove `module update` command to simplify the design
- Use single-dash prefix for Nextflow options, double-dash for module inputs
- Remove @ prefix from scope in CLI commands (keep only in DSL syntax)

Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
- Remove separate `dependencies` and `components` fields
- Expand `requires` to include:
  - `nextflow`: version constraint (unchanged)
  - `plugins`: array with name@constraint syntax
  - `modules`: array of module dependencies
  - `workflows`: array of workflow dependencies
- Unified version constraint syntax: `[scope/]name[@constraint]`
- Mark `components` as deprecated (use requires.modules)
- Update all examples in ADR and schema

Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
- Add `args` property to tools section for type-safe argument configuration
- Define toolArgSpec with flag, type, description, default, enum, required
- Support implicit variable `tools.<toolname>.args.<argname>` returning
  formatted flag+value (e.g., "-K 100000000")
- Support `tools.<toolname>.args` to return all args concatenated
- Document deprecation of ext.args/ext.args2/ext.args3 pattern
- Update ADR with Tool Arguments Configuration section and appendix

[ci skip]

Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
@pditommaso force-pushed the 251117-module-system branch from c956ca7 to 01c777a on January 6, 2026 12:07
@pditommaso marked this pull request as ready for review on January 6, 2026 12:09
@bentsherman
Member

The module spec schema will be defined in the nextflow-io/schemas repo. Here is the PR for it: nextflow-io/schemas#10

Comment on lines +39 to +48
### 1. Remote Module Inclusion

**DSL Syntax**:
```groovy
// Import from registry (scoped module name, detected by @scope prefix)
include { BWA_ALIGN } from '@nf-core/bwa-align'
// Existing file-based includes remain supported
include { MY_PROCESS } from './modules/my-process.nf'
```
Member

This should be out-of-scope for the first iteration. Let's focus on the modules command and module spec generation

Contributor

If I have understood correctly, at parse time @nf-core/bwa-align will be translated to modules/@scope/name@version/main.nf.
If this is not supported in the first iteration, users must write include statements with the full module path, including the version, and keep those paths consistent with the versions declared in nextflow.config.
I think it is also required for modules that import other modules. Without it, such modules must be written using include with explicit paths.
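
The parse-time translation described here can be sketched in Python as follows. The `resolve_include` helper, the `versions` mapping (standing in for the versions declared in nextflow.config), and the error raised for an undeclared module are assumptions; only the `modules/@scope/name@version/main.nf` layout comes from the discussion.

```python
def resolve_include(source, versions, root="modules"):
    """Map an include source to a file path.

    Sources starting with '@' are treated as registry modules and resolved
    against the declared versions; anything else is a file-based include
    and is returned unchanged.
    """
    if not source.startswith("@"):
        return source
    if source not in versions:
        raise KeyError(f"no version declared for {source}")
    scope, name = source.split("/", 1)
    return f"{root}/{scope}/{name}@{versions[source]}/main.nf"


# Hypothetical declared versions, keyed the same way as the include source.
declared_versions = {"@nf-core/bwa-align": "1.2.0"}

remote = resolve_include("@nf-core/bwa-align", declared_versions)
local = resolve_include("./modules/my-process.nf", declared_versions)
```

Without this translation step, the path on the right-hand side of `remote` is what a user would have to write and keep up to date by hand in every include statement.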

Member

Yes, likely module versioning would also be out of scope. The primary motivation initially was to allow remote module execution:

nextflow module run @nf-core/bwa-align

This can be done without remote module inclusion. We can probably still implement module versioning just for installing and executing modules

Member Author

There's no need to delay module versioning; there's already an initial implementation for the registry backend.

Member

I still think remote module inclusion should be excluded for now. This way we don't have to update the language / runtime to deal with remote includes

I have attempted to remove these bits in #6723

The module CLI commands all work the same way. The module run command can use a global cache to download and execute remote modules on-the-fly

Member Author

We have done quite good work distilling the requirements down to core needs and use cases. I don't understand why support for remote modules should be stripped. Without it, this would become just a re-implementation of the existing nf-core module tool. Remote inclusion is an essential part of this feature.

Member

No, it is not essential. It has nothing to do with the original motivation of remote module execution.

For years we said we wouldn't do remote module includes, and now suddenly it's the most important thing we should be building? I have still not heard a good argument for why we need to do it right now.

pditommaso and others added 6 commits January 7, 2026 22:59
Co-authored-by: Jorge Ejarque <jorgee@users.noreply.github.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Co-authored-by: Jorge Ejarque <jorgee@users.noreply.github.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Co-authored-by: Jorge Ejarque <jorgee@users.noreply.github.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Expand deprecation notice to cover all ext.* custom directives

Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
- Use consistent module path format with version: modules/@scope/name@version/
- Fix directory structure example: samtools-view -> samtools/view
- Standardize on 'license' spelling (American English)
- Fix author -> authors (plural array format)

Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Remove @ prefix from scope in API path to match API definition

Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
pditommaso and others added 2 commits January 15, 2026 12:54
Specification for Nextflow module system client implementation based on
ADR 20251114-module-system.md. Covers:

P1 (Core):
- Install and use registry modules via @scope/name syntax
- Run modules directly from CLI without wrapper workflow
- Structured tool arguments replacing ext.args pattern

P2 (Important):
- Module version management and freeze command
- Module integrity protection with checksum validation

P3 (Nice to have):
- Remove module command
- Search and discover modules
- Publish module to registry

Registry backend is out of scope (assumed implemented).

Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
- Remove `freeze` command from CLI
- Remove transitive dependency install behavior
- Remove orphaned transitive dependency removal from `remove` command
- Update rationale and consequences sections
- Simplify dependency resolution flow
- Update ADR to version 2.4

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
@pditommaso
Member Author

Version 2.4 Changes

Updated the ADR to remove transitive dependency resolution from the initial implementation scope. This simplifies the module system by requiring explicit dependency declarations.

Summary of Changes

  • Removed freeze command from CLI - no longer needed without transitive dependency management
  • Simplified install behavior - removed automatic recursive installation of transitive dependencies
  • Updated remove command - removed orphaned transitive dependency cleanup
  • Simplified dependency resolution flow - modules are resolved directly from nextflow.config declarations only
  • Updated rationale and consequences sections to reflect the simplified model

Rationale

Transitive dependency resolution adds significant complexity and will not be implemented in this initial stage. Each module's dependencies must be explicitly declared in nextflow.config, giving users full control and visibility over what modules are installed.

This change also affects the spec at specs/251117-module-system/spec.md which was updated to remove:

  • Transitive dependency acceptance scenarios
  • FR-010, FR-011 (transitive dependency requirements)
  • freeze command from FR-019
  • SC-008 (transitive dependencies success criteria)

Comment on lines +99 to +103
// Module versions (exact versions only, no ranges)
modules {
'@nf-core/salmon' = '1.1.0'
'@nf-core/bwa-align' = '1.2.0'
}
Member

The problem with specifying modules in the config is that the user can override config at runtime

I have altered the ADR to store these in nextflow_spec.json instead in #6723

Member Author

I agree config is not the best place, but isn't the same thing happening for plugins? My take is that we should manage them in the same manner.

Member

Plugins need to be able to be customized at runtime, modules do not. That is the difference

Member Author

You raise a fair point. However, I believe that, compared to other languages, with Nextflow it is desirable to be able to control module versions at config time. Even more so with interface modules: we are moving toward a system in which modules (and therefore versions) can be fully decided at runtime.

This makes me think nextflow.config is the right place to keep them.

Member

Let's not base the module design on other proposed features that are still speculative. The bottom line is that module versions are currently baked into the pipeline definition and we have not committed to changing that in any way.

I don't think I have seen a programming language where code dependencies are mixed with runtime configuration, except perhaps some crazy dotnet stuff. So the idea of hot-swapping modules at runtime needs a lot more scrutiny

**Example**:
```bash
nextflow module install # Install all from config
nextflow module install nf-core/bwa-align # Install specific module (latest)
```
Member

This downloads the files, but presumably doesn't do anything with imports, so they won't be used - is that right? I wonder how much value this brings.

In nf-core we give this option partly because nf-core module install gives a fuzzy search so it's a quick way to find a module that you know exists. Then it prompts you post-install with the include text to copy + paste into your nextflow script.

If this command doesn't have the fuzzy interactive prompt, I'm not sure that specifying any arguments at all really brings much value. Might be easier to just have the install from config (behaviour more like npm install then).

Member

I think if we exclude remote module inclusion for now then there is no need for nextflow module install, because it is intended to be used in a setting where module code is not baked into the repo

But what do you mean about "not doing anything with imports"?


### New Pattern: Structured Tool Arguments

Modules declare available arguments in `meta.yaml` under each tool's `args` property:
Member

I think it would be good to be explicit on this point. I have described this to a few people now and almost everyone has had the same question / fear, that maintainers will be forced to maintain 1:1 parity with the upstream tool.

Suggested change
Modules declare available arguments in `meta.yaml` under each tool's `args` property:
Modules declare available arguments in `meta.yaml` under each tool's `args` property.
This list does _not_ need to be exhaustive. It should include any arguments known to be used by pipelines or that could be expected to be used by users. However, arguments can still be specified in the config even if not defined in this file, so absence does not prevent use.

Member Author

Makes sense

Member Author

Apologies Phil, I removed "However, arguments can still be specified in the config even if not defined in this file, so absence does not prevent use." I believe we need to go beyond ext.args in the config. It's really a bad hack that makes it impossible to have visibility into, or control over, the module parametrisation.

Member

I think Phil's point is that tools.<tool>.args can still be specified in the config, and it can specify args that aren't documented in the module spec. Not sure what is hacky about that. These two things are absolutely necessary if you want anyone to use this feature

Member Author

I think params solve this. Do you agree?

Member

Yes, at least to replace the tools.<tools>.args syntax.

Documenting tool CLI args in the module spec as a hint could still be useful on its own, even if it's not an exhaustive list. But the tool spec can also specify a documentation URL, so maybe users and agents could just use that to find CLI args.

pditommaso and others added 4 commits January 21, 2026 12:23
Co-authored-by: Jorge Ejarque <jorgee@users.noreply.github.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Co-authored-by: Phil Ewels <phil.ewels@seqera.io>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
- Remove plugins, modules, subworkflows from requires block
- Spec now focused on process modules only
- Dependencies managed via nextflow.config

Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
pditommaso and others added 2 commits January 23, 2026 11:33
- Module parameters defined in meta.yaml params section with name, type, description, and example attributes
- Removed tools args property and specification
- Updated schema and examples throughout

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
@pditommaso
Member Author

Version 2.5 Updates (2026-01-23)

Major Changes

Module Parameters - Replaced the structured tool arguments (tools.<toolname>.args) approach with a simpler, more general module parameters system:

  • Parameters are now defined in meta.yaml under a params section
  • Each parameter has: name (required), type (optional), description (optional), example (optional)
  • Parameters use standard Nextflow params syntax in scripts and CLI (--<param_name>)
  • Removes the complexity of tool-specific argument mapping

Example:

params:
  - name: batch_size
    type: integer
    description: "Process INT input bases in each batch"
    example: 100000000

  - name: use_soft_clipping
    type: boolean
    description: "Use soft clipping for supplementary alignments"

Simplified Tools Section - The tools section now only documents tool metadata (description, homepage, license, identifier). The args property has been removed.

Simplified requires Block - Now only contains nextflow version constraint. Removed plugins, modules, and subworkflows sub-properties.

Process Modules Focus - Spec is now focused on process modules only; sub-workflow references removed.

Rationale

The module parameters approach is simpler and more aligned with existing Nextflow patterns:

  • Uses familiar params variable instead of introducing new tools.*.args syntax
  • Parameters are module-level, not tool-level, giving module authors flexibility
  • Cleaner separation between tool documentation and module configuration

```groovy
// Module versions (exact versions only, no ranges)
modules {
'@nf-core/salmon' = '1.1.0'
```
Contributor

The use of @ seems a bit unclear. It is used in config, storage, and in the remote module inclusion, but not in commands and the registry API.

It seems that JCommander does not allow passing a parameter starting with @, because the prefix is reserved for argument-file expansion. When I use it, JCommander treats the string after @ as a path to read.
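
The same convention exists in other argument parsers, which makes the problem easy to reproduce outside Java. The sketch below uses Python's argparse with `fromfile_prefix_chars="@"` purely as an analogy to the behavior reported here; it is not JCommander code, and the parser name and file name are made up.

```python
import argparse
import os
import tempfile

# With this option, any argument starting with "@" is read as a file of arguments.
parser = argparse.ArgumentParser(prog="module-cli", fromfile_prefix_chars="@")
parser.add_argument("module")

with tempfile.TemporaryDirectory() as tmp:
    args_file = os.path.join(tmp, "args.txt")
    with open(args_file, "w") as handle:
        handle.write("nf-core/salmon\n")
    # "@<path>" is replaced by the contents of the file.
    expanded = parser.parse_args(["@" + args_file])

# A scoped module name is mistaken for a path, which does not exist,
# so the parser reports an error and exits.
try:
    parser.parse_args(["@nf-core/salmon"])
    exit_code = None
except SystemExit as exc:
    exit_code = exc.code
```

A CLI that wants to accept `@scope/name` literally would need to disable or change the expansion prefix, which is one argument for dropping `@` from the commands.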

Member

Yeah, I don't think we actually need the @ prefix to signify scope. You could just use nf-core/salmon to refer to a remote module and ./path/to/local/module for local modules

Member Author

However, it's needed in the include syntax to distinguish registry modules from existing ones, e.g. include { FASTQC } from '@nf-core/fastqc'. It should also be used in the local path, ./modules/@nf-core/fastqc.

We can drop it in the CLI commands, since they implicitly refer to "managed" modules.

I'd keep it in the config for consistency with the include syntax.


**Key Behaviors**:
- **Version change**: When the declared version differs from the installed version (and local is unmodified), the local module is automatically replaced with the declared version
- **Local modification**: When the local module content was manually changed (checksum mismatch with `.checksum`), Nextflow warns and does NOT override to prevent accidental loss of local changes
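
The two behaviors above amount to a small decision table, sketched here in Python. The `plan_sync` helper and its return values are hypothetical, and giving the local-modification check priority over the version comparison is an assumption drawn from the "and local is unmodified" condition.

```python
def plan_sync(declared_version, installed_version, recorded_checksum, current_checksum):
    """Decide what to do with an installed module.

    Returns one of:
      "warn"    - local content differs from .checksum; never overwrite
      "replace" - unmodified, but the declared version differs
      "keep"    - unmodified and already at the declared version
    """
    if recorded_checksum != current_checksum:
        return "warn"
    if declared_version != installed_version:
        return "replace"
    return "keep"


upgrade = plan_sync("1.2.0", "1.1.0", "abc", "abc")
edited = plan_sync("1.2.0", "1.1.0", "abc", "def")
current = plan_sync("1.2.0", "1.2.0", "abc", "abc")
```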
Contributor

I think this checksum validation could be tricky. In the registry, the checksum is calculated on the tar.gz bundle file. The local download is the uncompressed content of that bundle in a folder.
To make both checksums comparable, we would need to control how bundles are created so that the checksum applies to a reproducible tar.gz package. I think the order in which files are packed into the tar, and some headers, could change even if the content is not modified.

Contributor (@jorgee, Jan 23, 2026)

I think we cannot rely on the way a user created the tar.gz bundle when publishing the package. It would be better to use the registry checksum only to validate the integrity of the download, and to store in .checksum a hash of the file contents once uncompressed, taken in sorted order.

Member Author

The nextflow command should create the tar, and likely use the same library on both the Nextflow side and the backend side. I think the trick is to have a predictable traversal order and ignore file timestamps.
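
One way to get a checksum that depends only on file paths and contents is to hash the unpacked directory directly, with a sorted traversal and no metadata. This Python sketch follows the content-hash idea raised in this thread rather than a reproducible tar; the `content_checksum` helper, the use of SHA-256, and the length-prefixed framing are assumptions.

```python
import hashlib
from pathlib import Path


def content_checksum(module_dir):
    """Hash relative paths and file bytes in sorted order, ignoring timestamps."""
    root = Path(module_dir)
    digest = hashlib.sha256()
    files = sorted(
        # Skip the stored checksum itself so it does not affect the result.
        (p for p in root.rglob("*") if p.is_file() and p.name != ".checksum"),
        key=lambda p: p.relative_to(root).as_posix(),
    )
    for path in files:
        rel = path.relative_to(root).as_posix().encode("utf-8")
        data = path.read_bytes()
        # Length prefixes keep (path, content) boundaries unambiguous.
        digest.update(len(rel).to_bytes(8, "big"))
        digest.update(rel)
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()
```

Two directories with the same files produce the same value regardless of creation order or modification times, while any content edit changes it. The registry's checksum of the tar.gz bundle can then be used only to verify the download itself.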

3. Validates command-line arguments against the process input schema
4. Validates parameters against the `params` schema in `meta.yaml`
5. Generates an implicit workflow that wires CLI arguments to process inputs
6. Executes the workflow using standard Nextflow runtime
Member

Suggested change
6. Executes the workflow using standard Nextflow runtime
6. Executes the workflow using standard Nextflow runtime
7. Prints JSON of process outputs

Member Author

Good point, though I would not explicitly require the output format. JSON should likely be one output option.

- All commands respect the `registry.url` configuration for custom registries
- Modules are automatically downloaded on `nextflow run` if missing but configured

## Module Structure
Member

Suggested change
## Module Structure
## Module Structure
A module should define exactly one process. It may optionally define an entry workflow and any number of helper functions.

Member Author

An entry workflow to run itself? Could this not conflict with the automatic mapping of process inputs?

Member

My idea was that if the module defines an entry workflow and params block, then the module run command can just use that, no need to generate implicit ones. That way users can customize the standalone module execution if they want

In fact I think this is what it already does, just a matter of saying that it is allowed in a module

Signed-off-by: Ben Sherman <bentshermann@gmail.com>
@bentsherman
Member

Trying to resolve some differences between the module schema in this PR and the one in nextflow-io/schemas#10

I think the only substantive change is adding the topics section to this PR. Everything else is just refactoring the JSON schema definitions for clarity

Overall, the schema in this PR seems to have better descriptions and validation constraints, as well as examples. So I'm happy to absorb those changes in the schemas repo once the ADR is merged.

Co-authored-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
@pditommaso
Member Author

I think the only substantive change is adding the topics section to this PR. Everything else is just refactoring the JSON schema definitions for clarity

Good, let's add it
