
Allow users to define custom nucleotide coordinate offsets for clades for custom references #210

@huddlej


Context

We allow users to define their own reference FASTA and annotation GFF in their build configurations. However, clade and subclade definitions use nucleotide positions from a specific reference sequence. When the coordinates of the user's reference sequence differ from those of the reference used to define clades and subclades, users need a way to offset the clade coordinates for their custom reference. Since the workflow downloads clades from GitHub, we should not encourage users to locally modify clade definition files that will likely be overwritten by subsequent runs of the workflow.

The Nextclade workflow deals with this same issue by introducing a custom rule to offset the clade coordinates using a configuration-based offset value (e.g., the offset for H3N2 HA reference A/Darwin/6/2021).

This issue arose through discussion of #208.

Description

We should allow users to define custom nucleotide offsets for their custom references, so clade and subclade definitions work as expected for their builds.

Possible solution

One solution would be to copy the offset_clades rule from the Nextclade workflow into the core workflow, so that we always generate the offset clades files, even when the user defines no offset. We could allow users to define a build-level offset per reference sequence. For example, following the pattern I've recommended for defining segment-specific parameters, we could support an optional build-level reference_offset field that looks like this:

reference:
  ha: "nextclade/dataset_config/h3n2/ha/EPI1857216/reference.fasta"
  na: "nextclade/dataset_config/h3n2/na/EPI1857215/reference.fasta"
reference_offset:
  ha: -17

Since the default offset would be 0, users would only need to define nonzero offsets in this way.
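To make the proposal concrete, here is a minimal Python sketch of how the lookup and the offset could work. It is not the Nextclade workflow's actual offset_clades rule. The function names `get_reference_offset` and `offset_clades` are hypothetical, the clade rows are made-up examples, and the sketch assumes that clade definitions use `clade`, `gene`, `site`, and `alt` columns, that nucleotide rows are marked with `gene == "nuc"`, and that the offset is added to the site (the sign convention would need to match whatever the workflow adopts).

```python
def get_reference_offset(build_config, segment):
    """Return the configured offset for a segment, defaulting to 0.

    `build_config` is a dict parsed from the build configuration
    (hypothetical helper, not part of the existing workflow).
    """
    return build_config.get("reference_offset", {}).get(segment, 0)


def offset_clades(rows, offset):
    """Return new clade definition rows with nucleotide sites shifted.

    Assumes each row is a dict with "clade", "gene", "site", and "alt"
    keys. Only rows with gene == "nuc" are shifted; all other rows
    (for example, amino acid positions) are copied unchanged.
    """
    shifted = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        if row["gene"] == "nuc":
            new_row["site"] = str(int(row["site"]) + offset)
        shifted.append(new_row)
    return shifted


if __name__ == "__main__":
    # Build configuration mirroring the YAML above, as a plain dict.
    build_config = {"reference_offset": {"ha": -17}}

    # Made-up clade definitions for illustration only.
    clades = [
        {"clade": "example", "gene": "nuc", "site": "100", "alt": "A"},
        {"clade": "example", "gene": "HA1", "site": "50", "alt": "K"},
    ]

    for segment in ("ha", "na"):
        offset = get_reference_offset(build_config, segment)
        print(segment, offset, offset_clades(clades, offset))
```

Because a missing `reference_offset` entry resolves to 0, the same code path runs for every build, and builds without a custom reference get clade files identical to the downloaded ones.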

Metadata

Labels: enhancement (New feature or request)
