Flag files for downstream reporting via publishDir #4042

@ewels

New feature

An emerging standard for Nextflow pipelines is a root tower.yml file, used for providing reports to Tower.

A potential alternative is to instead define this metadata as part of publishDir, within the Nextflow config. This has a few advantages:

  • Removes the need for yet-another-config-file in the repository root
  • Keeps configuration of published files in a single location, not spread across multiple files
  • Less Tower-specific, more community friendly

Defined here, Nextflow would know the report status of files during the publish step and could match the patterns against the files actually created. That would allow metadata with precise file paths and report status to be generated in memory or written to some kind of report.

Suggest implementation

My suggestion is to add a new directive: report (int). Non-zero values (or perhaps only values > 0) could indicate that files should be shown within downstream reporting functionality. The integer value itself could then be used as a weighting factor when sorting that list.
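To make the intended semantics concrete, here is a minimal Groovy sketch of how a consumer might interpret these values. It is not Nextflow code: `reportOrder` is a hypothetical helper, the file names are made up, and two details are assumptions the proposal leaves open (only values > 0 count as flagged, and higher weights sort first).

```groovy
// Hypothetical helper, not part of Nextflow.
// `weights` maps a published file name to the value its `report` setting produced.
List<String> reportOrder(Map<String, Integer> weights) {
    // Assumption: only values > 0 flag a file for reporting; null and 0 are skipped.
    def flagged = weights.findAll { name, weight -> weight != null && weight > 0 }
    // Assumption: higher weight sorts first; ties fall back to the file name.
    def sorted = flagged.sort { a, b -> b.value <=> a.value ?: a.key <=> b.key }
    return sorted.collect { it.key }
}

def order = reportOrder([
    'read_distribution.pdf': 10,
    'multiqc_report.html'  : 100,
    'versions.yml'         : null,
    'debug.log'            : 0,
])
assert order == ['multiqc_report.html', 'read_distribution.pdf']
```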

The directive should be paired together with the ability to filter the published files for a given process based on filename / a closure.

Usage scenario

Based on the publishDir config for a process in the nf-core/rnaseq pipeline, syntax / usage could potentially look something like this:

  withName: '.*:BAM_RSEQC:RSEQC_READDISTRIBUTION' {
      publishDir = [
          path: { "${params.outdir}/${params.aligner}/rseqc/read_distribution" },
          mode: params.publish_dir_mode,
          saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
+         report: { filename -> filename ==~ /.*\.pdf/ ? 10 : null }
      ]
  }

Here, any PDF files published by this process would be given a report priority of 10. Because the value is greater than 0, the files would be shown in a report interface, and the value 10 would act as the weighting score when sorting the list of files there.
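As a rough illustration of how such a closure would behave, the sketch below evaluates it in plain Groovy against two made-up file names. The `published` list and the `weights` map are hypothetical; the proposal does not specify how Nextflow would collect the results.

```groovy
// The closure from the example, written with a slashy-string regex.
def report = { String filename -> filename ==~ /.*\.pdf/ ? 10 : null }

// Hypothetical file names standing in for the files a process publishes.
def published = ['read_distribution.pdf', 'read_distribution.txt']

// Map each file name to the value the closure returns for it.
def weights = published.collectEntries { [(it): report(it)] }

assert weights == ['read_distribution.pdf': 10, 'read_distribution.txt': null]
```

Only the PDF receives a weight; the text file maps to null and would not be flagged for reporting.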

The results of this directive then need to be handled somehow. I expect this to be the most contentious part of this suggestion! My suggestion would be a new optional output file, similar to the existing report and trace files. This could potentially tie into future efforts for provenance tracking of published files.
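Purely as a starting point for discussion, such an output file could be a list of records pairing each published path with its report value. The sketch below serialises one record to JSON with Groovy's `JsonOutput`. The field names `path` and `report`, the example path, and the choice of JSON are all assumptions; nothing in this proposal fixes a format.

```groovy
import groovy.json.JsonOutput
import groovy.json.JsonSlurper

// Hypothetical record layout: one entry per published file flagged for reporting.
def records = [
    [path: 'results/aligner/rseqc/read_distribution/sample.pdf', report: 10],
]

// Serialise to JSON text, as it might be written to the optional output file.
def json = JsonOutput.prettyPrint(JsonOutput.toJson(records))

// Round-trip check: a downstream tool can recover the path and the weight.
def parsed = new JsonSlurper().parseText(json)
assert parsed[0].path == 'results/aligner/rseqc/read_distribution/sample.pdf'
assert parsed[0].report == 10
```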
