Documented VCZ writer API #412

@jeromekelleher

We should be able to expose a fairly simple writer API for straightforward cases where the size of the data is known ahead of time and we can consume the data "variant by variant". Something like

# Proposed API sketch: SequentialWriter and Variant do not exist yet
from bio2zarr import vcz

ts = ...  # a tskit tree sequence, for example

with vcz.SequentialWriter(num_variants=ts.num_sites, num_samples=ts.num_samples, ploidy=1) as writer:
    for tsk_var in ts.variants():
        vcz_var = vcz.Variant(alleles=...,
                              genotypes=tsk_var.genotypes.reshape(..., 2))

        writer.write(
            alleles=...,
            position=tsk_var.site.position,
            genotypes=tsk_var.genotypes.reshape(..., 2),
            # etc.
        )

I think this would be useful because there are many cases where we just want to do something fairly simple to an existing dataset that isn't massive.

This could be used in conjunction with sgkit-dev/vcztools#112 to really simplify the process of iteratively generating datasets.
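
To make the intended semantics a bit more concrete, here is a rough, purely illustrative sketch of how such a writer might behave. Everything in it is an assumption: the class is a toy in-memory stand-in using plain Python lists (a real version would presumably write into Zarr arrays), and the name `SequentialWriter`, its parameters, and the keyword-only `write` signature are hypothetical, not an existing bio2zarr API. The point is only that known sizes allow preallocation, variants are appended in order, and the context manager can check on exit that exactly `num_variants` were written.

```python
# Hypothetical sketch: none of these names exist in bio2zarr today.
class SequentialWriter:
    """Toy in-memory stand-in for the proposed sequential writer."""

    def __init__(self, num_variants, num_samples, ploidy):
        self.num_variants = num_variants
        self.num_samples = num_samples
        self.ploidy = ploidy
        # Sizes are known up front, so storage can be preallocated.
        self.position = [None] * num_variants
        self.alleles = [None] * num_variants
        self.genotypes = [None] * num_variants
        self._next = 0  # index of the next variant to be written

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Only check completeness if the body finished without error.
        if exc_type is None and self._next != self.num_variants:
            raise ValueError(
                f"expected {self.num_variants} variants, got {self._next}"
            )
        return False

    def write(self, *, position, alleles, genotypes):
        if self._next >= self.num_variants:
            raise ValueError("more variants written than declared")
        if len(genotypes) != self.num_samples:
            raise ValueError("genotypes must have one row per sample")
        if any(len(row) != self.ploidy for row in genotypes):
            raise ValueError("each genotype row must have `ploidy` calls")
        self.position[self._next] = position
        self.alleles[self._next] = tuple(alleles)
        self.genotypes[self._next] = [list(row) for row in genotypes]
        self._next += 1


# Made-up example data: (position, alleles, genotypes per sample)
variants = [
    (100, ("A", "T"), [[0, 1], [1, 1]]),
    (250, ("G", "C"), [[0, 0], [0, 1]]),
]
with SequentialWriter(num_variants=2, num_samples=2, ploidy=2) as writer:
    for position, alleles, genotypes in variants:
        writer.write(position=position, alleles=alleles, genotypes=genotypes)
```

Validating shapes in `write` and the variant count in `__exit__` is one possible design choice; it would surface mismatches between the declared and actual data early rather than leaving a partially filled dataset behind.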

Labels: enhancement
