Custom workflow config merging #28

@victorlin


Background

Pathogen workflows are parameterized by a set of configuration values. The default set of parameters should work out of the box and is typically stored in a YAML file such as phylogenetic/defaults/config.yaml. The parameters can be adjusted by:

  1. Modifying the default configuration file directly
  2. Overlaying additional configuration, either as YAML files via Snakemake's --configfile or --configfiles options, or as individual values via --config

Current status

When running workflows locally, people typically modify the default configuration file directly. Config overlays are more common in automation and among external users.

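However, Snakemake's merging behavior for config overlays does not always align with our needs. Snakemake merges nested mappings recursively, so in the case of subsampling, an overlay that defines its own samples for a build ends up with those samples added alongside the default ones rather than replacing them.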
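To make the subsampling problem concrete, here is a minimal sketch of a recursive merge in the spirit of Snakemake's snakemake.utils.update_config: nested mappings are merged key by key, and any other value (including lists) is replaced. This is an illustration, not Snakemake's actual code, and the subsample layout with per-build samples (and keys like group_by, max_sequences, query) is a hypothetical simplification rather than the exact schema of any pathogen repo.

```python
import copy
from typing import Any


def deep_merge(base: dict[str, Any], overlay: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict with `overlay` merged into `base`.

    Nested dicts are merged recursively; any other value (including lists)
    in `overlay` replaces the corresponding value in `base`.
    """
    merged = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical defaults: two sample definitions for a "global" build.
defaults = {
    "subsample": {
        "global": {
            "samples": {
                "recent": {"group_by": ["region"], "max_sequences": 100},
                "background": {"group_by": ["year"], "max_sequences": 50},
            }
        }
    }
}

# A hypothetical external overlay that intends to use a single "focal" sample.
overlay = {
    "subsample": {
        "global": {
            "samples": {
                "focal": {"query": "country == 'USA'", "max_sequences": 200},
            }
        }
    }
}

merged = deep_merge(defaults, overlay)
print(sorted(merged["subsample"]["global"]["samples"]))
# → ['background', 'focal', 'recent']  (the default samples are still present)
```

The overlay author most likely wanted only the "focal" sample, but a uniform recursive merge cannot express "replace this whole mapping".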

Proposal

  • What? Custom config merging code.

  • Where? At the start of the Snakemake workflow.

  • How?

    Options described by @tsibley in a comment on rsv#106 ("augur subsample: merging of external & default configs"):

    • What we're doing now: pick a single consistent merge approach that's Good Enough and apply it globally, even though it causes us problems like this one and others.
    • Customize the merge approach used depending on the config context, e.g. when merging subsampling configs, it is often more useful to override all sample definitions for a build wholesale. This is better for the "typical" use cases, but worse for the "unexpected" cases, and harder to explain and document to boot.
    • Support explicitly denoting in the overlaying config what should happen at each merge level, with some default behaviour based on either approach above. This is more complicated, but provides an escape hatch when the default merging behaviour hurts more than it helps.
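A minimal sketch of how the second and third options could be combined in custom merging code, assuming configs have already been loaded into plain dicts (e.g. by a YAML parser). The _merge directive key, the REPLACE_PATHS table, and the config layout are all hypothetical illustrations, not an existing Snakemake or Augur feature. Context-dependent defaults (option 2) decide the behavior at each level unless the overlay sets the directive explicitly (option 3).

```python
import copy
from typing import Any

# Hypothetical key an overlay may set on any mapping: "merge" or "replace".
DIRECTIVE_KEY = "_merge"

# Hypothetical context defaults (option 2): paths whose mappings are replaced
# wholesale instead of merged. "*" matches any single key (e.g. a build name).
REPLACE_PATHS = [("subsample", "*", "samples")]


def _matches(path: tuple, pattern: tuple) -> bool:
    return len(path) == len(pattern) and all(
        p == "*" or p == k for k, p in zip(path, pattern)
    )


def merge_config(base: Any, overlay: Any, path: tuple = ()) -> Any:
    """Merge `overlay` into a copy of `base`, honoring explicit directives
    in the overlay first and context defaults second."""
    # Non-mapping overlay values always replace the base value.
    if not isinstance(overlay, dict):
        return copy.deepcopy(overlay)

    overlay = dict(overlay)  # shallow copy so popping the directive is safe
    directive = overlay.pop(DIRECTIVE_KEY, None)
    if directive is None:
        replace = any(_matches(path, p) for p in REPLACE_PATHS)
        directive = "replace" if replace else "merge"
    if directive not in ("merge", "replace"):
        where = "/".join(map(str, path)) or "<root>"
        raise ValueError(f"unknown {DIRECTIVE_KEY} value {directive!r} at {where}")

    # Start from the base only when merging into an existing mapping.
    keep_base = directive == "merge" and isinstance(base, dict)
    result = copy.deepcopy(base) if keep_base else {}
    for key, value in overlay.items():
        result[key] = merge_config(result.get(key), value, path + (key,))
    return result


# Hypothetical defaults for illustration only.
defaults = {
    "subsample": {
        "global": {
            "samples": {
                "recent": {"max_sequences": 100},
                "background": {"max_sequences": 50},
            }
        }
    },
    "refine": {"coalescent": "opt", "date_inference": "marginal"},
}

# Option 2: a build's samples are replaced wholesale by default.
overlay_a = {"subsample": {"global": {"samples": {"focal": {"max_sequences": 200}}}}}
# Option 3: the overlay explicitly asks to merge into the default samples.
overlay_b = {"subsample": {"global": {"samples": {
    "_merge": "merge", "focal": {"max_sequences": 200}}}}}
# Other sections keep the usual recursive merge.
overlay_c = {"refine": {"coalescent": "const"}}

print(sorted(merge_config(defaults, overlay_a)["subsample"]["global"]["samples"]))
# → ['focal']
print(sorted(merge_config(defaults, overlay_b)["subsample"]["global"]["samples"]))
# → ['background', 'focal', 'recent']
print(merge_config(defaults, overlay_c)["refine"])
# → {'coalescent': 'const', 'date_inference': 'marginal'}
```

Keeping the directive inside the overlaying mapping scopes the escape hatch to the level it affects; the cost is one more concept for users to learn and for us to document.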

Additional notes

  • It would be good to solve this before propagating augur subsample to more pathogen repos, so that those repos can avoid the workaround of using custom_subsample in place of subsample.
  • A config schema would improve error handling, but these efforts can happen independently.
  • Config pre-processing can still be done later.
