htsget: spec should indicate (dictate?) whether .vcf.gz is preferred (mandatory?) over plaintext .vcf? #837

@brainstorm

As a result of this discussion, the htsget-rs team at @umccr gave it some thought and decided to favor indexed, compressed variant formats with htsget (.vcf.gz{.tbi}) over plaintext .vcf, for at least a couple of reasons:

  1. Serving plaintext VCF sets a bad precedent w.r.t. the use of compression in production. It should be a given that assets are stored compressed at rest: it just makes economic sense at scale.
  2. Supporting plaintext VCF unnecessarily complicates the resolution logic on htsget server(s).
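To illustrate the second point, here is a rough sketch of how ID resolution might differ between the two policies. It is not how htsget-rs actually resolves IDs: the function names, the `stored_keys` argument and the `<id>.vcf.gz` / `<id>.vcf.gz.tbi` naming convention are all assumptions made for the example. The one external fact it relies on is that tabix indexes require bgzip-compressed input, so a plaintext `.vcf` has no `.tbi` to go with it.

```python
from typing import Iterable, Optional, Tuple


def resolve_compressed_only(query_id: str) -> Tuple[str, str]:
    """Hypothetical resolver when only bgzipped, indexed VCF is served.

    The mapping is purely mechanical: one data key, one index key.
    """
    data_key = f"{query_id}.vcf.gz"
    return data_key, f"{data_key}.tbi"


def resolve_with_plaintext(
    query_id: str, stored_keys: Iterable[str]
) -> Tuple[str, Optional[str]]:
    """Hypothetical resolver when plaintext VCF is also allowed.

    The server now has to probe storage and handle extra branches.
    """
    keys = set(stored_keys)
    compressed = f"{query_id}.vcf.gz"
    index = f"{compressed}.tbi"

    # Prefer the compressed object when both variants exist.
    if compressed in keys:
        return compressed, (index if index in keys else None)

    # Fall back to plaintext. There is no tabix index for it, so the
    # caller cannot map a genomic region to byte ranges via an index.
    plain = f"{query_id}.vcf"
    if plain in keys:
        return plain, None

    raise KeyError(f"no VCF object found for id {query_id!r}")
```

With the compressed-only policy, `resolve_compressed_only("sample1")` needs no storage lookup at all, whereas the permissive variant has to decide on precedence, handle missing indexes, and define what a region query means for an unindexed file.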

I'm sure there are more pros, cons, opinions, corner cases, workarounds and preferences on this topic; thoughts?

EDIT: Tangentially related issue on how to handle ID resolvers and the requested format's file extensions (or lack thereof): umccr/htsget-rs#127

/cc @jrobinso @mmalenic @ohofmann @andrewpatto
