-
Notifications
You must be signed in to change notification settings - Fork 10
Finalise tskit #395
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Finalise tskit #395
Conversation
d467261
to
c4cdd47
Compare
I think this is nearly ready to go. There's some details about how we deal with coordinate systems between tskit and VCZ which I'm going to punt on, but I think we should be able to handle the vast majority of inputs well with this. I've added an initial framework for the Python API, just documenting a single function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks, good. One nit in the docs.
One concern I have is that a key argument to convert
is a tskit VcfModelMapping
(or something that duck-types to a VCFModelMapping
) that is currently undocumented in tskit. We should add docs in tskit and make its constructor part of the public API.
bio2zarr/tskit.py
Outdated
contig_id=None, | ||
isolated_as_missing=False, | ||
): | ||
import tskit | ||
|
||
self._path = ts_path | ||
self.ts = tskit.load(ts_path) | ||
self._path = None # Not sure what we're using this for? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To answer the comment question: The base class defines Source.path
which is used when we process out-of-core to put the path in metadata. This isn't used for tskit (at least for now) so I guess TskitFormat.path
could just raise NotImplemented
and the path not be stored.
899f244
to
d941091
Compare
d941091
to
bf317ff
Compare
Can tsinfer.infer(tsinfer.VariantData(returned_handle, ancestral_state="ancestral_allele")) |
No, that's out of scope because would greatly complicate the semantics (should the returned object be a Zarr hierarchy, or an Xarray dataset? Should it be in read-only mode?). Bio2zarr just deals with the on-disk formats, not providing python APIs for working with them. Bio2zarr will always create an on-disk representation, so returning a handle would just be abstracting the details of where the temporary files are kept. I think tsinfer should probably write a wrapper which creates a temporary directory and does the conversion so that you can use it like this:
That'll take care of a lot of the messiness. |
Ah, OK. I had gathered that bio2zarr might also create in-memory representations, which I use quite a lot. But yes, a wrapper would be fine, I guess, assuming temporary directories etc are available. |
Closes #232
Fixes #377