Workflow output definitions and schema #4670

@bentsherman

Spun off from #4042 (comment)

Currently, the publishDir directive is used to define "workflow outputs". It is a "push" model in which individual processes are responsible for deciding which of their outputs should be workflow outputs, and how to publish them. It was created in the days of DSL1 when there was no concept of workflows / subworkflows.
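For reference, here is a minimal sketch of the "push" model as it looks today. The process name, output path, and the `params.reads` parameter are hypothetical and only serve as illustration; the point is that the publishing decision sits inside the process rather than in the workflow that calls it.

```nextflow
// Hypothetical example: the process itself decides what gets published and where.
process COUNT_LINES {
    // Publishing is configured here, not in the workflow below.
    publishDir "results/counts", mode: 'copy'

    input:
    path reads

    output:
    path "line_count.txt"

    script:
    """
    wc -l ${reads} > line_count.txt
    """
}

workflow {
    // The workflow only wires channels together; nothing here indicates
    // that line_count.txt will end up in results/counts.
    COUNT_LINES(Channel.fromPath(params.reads))
}
```

Reading the `workflow` block alone gives no hint of what is published, which is the visibility problem described in the next paragraph.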

With DSL2 and the introduction of workflows, publishDir is an unwieldy way to define workflow outputs. For one thing, it is extremely difficult to get a full picture of a workflow's outputs because they are scattered across the process definitions. Additionally, a process could be invoked many different times by different workflows, and each workflow might have a different idea of its own outputs. All in all, it no longer makes sense to define workflow outputs in the process definition.

Instead, it should be possible to specify an output schema for a workflow, which defines the full set of published outputs, as well as metadata, in terms of the processes (and subworkflows) that the workflow invokes.

The two options discussed so far are:

  1. define the workflow outputs as part of the workflow definition (e.g. an output block) which can reference the outputs of subcomponents
  2. define the workflow outputs in a separate file similar to what is done for nf-core modules

See also #4661 for discussion of where to put additional publish options such as mode and saveAs. The publish mode, for example, could become a config option or process directive (e.g. publishOptions.mode) to allow for process-specific configuration, or it could become part of the output schema.
