Open
Labels
enhancement: New feature or request
Description
We should be able to expose a fairly simple writer API for straightforward cases where the size of the data is known ahead of time and we can consume the data "variant by variant". Something like:
    from bio2zarr import vcz

    ts = ...  # a tskit tree sequence, for example
    with vcz.SequentialWriter(num_variants=ts.num_sites, num_samples=ts.num_samples, ploidy=1) as writer:
        for tsk_var in ts.variants():
            vcz_var = vcz.Variant(alleles=...,
                                  genotypes=tsk_var.genotypes.reshape(..., 2))
            writer.write(
                alleles=...,
                position=tsk_var.site.position,
                genotypes=tsk_var.genotypes.reshape(..., 2),
                # etc.
            )
I think this would be useful because there are a lot of cases where we just want to do something fairly simple to an existing dataset that isn't massive.
This could be used in conjunction with sgkit-dev/vcztools#112 to really simplify the process of iteratively generating datasets.
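To make the intended semantics concrete, here is a rough, purely illustrative sketch of the "known size, one variant at a time" pattern. The `SequentialWriter` class below is a hypothetical in-memory stand-in backed by NumPy arrays; it is not existing bio2zarr API, and the field names, dtypes, and error handling are assumptions. A real implementation would presumably write Zarr chunks rather than hold everything in memory.

```python
import numpy as np


class SequentialWriter:
    """Hypothetical stand-in for the proposed writer (not bio2zarr API).

    Preallocates fixed-size arrays from the declared dimensions and
    fills them one variant at a time, in order.
    """

    def __init__(self, num_variants, num_samples, ploidy):
        self.num_variants = num_variants
        self.position = np.zeros(num_variants, dtype=np.int64)
        # -1 marks "not yet written" (assumed sentinel for this sketch)
        self.genotypes = np.full(
            (num_variants, num_samples, ploidy), -1, dtype=np.int8
        )
        self.alleles = [None] * num_variants
        self._next = 0  # index of the next variant to be written

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Only check completeness when the block exited without an error
        if exc_type is None and self._next != self.num_variants:
            raise ValueError(
                f"expected {self.num_variants} variants, got {self._next}"
            )
        return False

    def write(self, *, alleles, position, genotypes):
        if self._next >= self.num_variants:
            raise ValueError("more variants written than declared")
        j = self._next
        self.alleles[j] = tuple(alleles)
        self.position[j] = position
        # Accept flat or already-shaped input; store as (num_samples, ploidy)
        self.genotypes[j] = np.asarray(genotypes).reshape(
            self.genotypes.shape[1:]
        )
        self._next += 1


# Usage with toy data: two variants, two diploid samples
with SequentialWriter(num_variants=2, num_samples=2, ploidy=2) as writer:
    writer.write(alleles=("A", "T"), position=10, genotypes=[0, 1, 1, 1])
    writer.write(alleles=("G", "C"), position=25, genotypes=[[0, 0], [1, 0]])
```

Declaring the dimensions up front is what lets the arrays be allocated once, and checking the count on exit catches the case where fewer variants were written than promised.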