Add `.to_icechunk()` method to `ManifestGroup` #591
        return cls(arrays=manifestarrays, attributes=attributes)

    def to_icechunk(
The point of adding this on `ManifestGroup` instead of `ManifestStore` initially is to make it easier to compose when we deal with virtual datatrees later.
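The composition argument can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in rather than the VirtualiZarr or Icechunk API: `SketchGroup` plays the role of a `ManifestGroup`, a plain `dict` plays the role of the store, and `to_store` and `write_tree` are names made up for this example. The assumption is only that a tree can be treated as a mapping from group path to group.

```python
from dataclasses import dataclass, field


@dataclass
class SketchGroup:
    """Hypothetical stand-in for a ManifestGroup: named arrays plus attributes."""

    arrays: dict
    attributes: dict = field(default_factory=dict)

    def to_store(self, store: dict, path: str = "") -> None:
        # Write this group's attributes and arrays under `path` in a dict-backed store.
        prefix = f"{path}/" if path else ""
        store[f"{prefix}attrs"] = dict(self.attributes)
        for name, array in self.arrays.items():
            store[f"{prefix}{name}"] = array


def write_tree(groups: dict, store: dict) -> None:
    # The tree is modelled as {group path: group}; writing the whole tree
    # is then just one group-level call per node.
    for path, group in groups.items():
        group.to_store(store, path)


store = {}
write_tree(
    {
        "": SketchGroup({"time": [0, 1]}, {"title": "root"}),
        "child": SketchGroup({"temp": [280.0, 281.5]}),
    },
    store,
)
print(sorted(store))
```

Because the write logic lives on the group, the tree-level function needs no knowledge of how a single group is serialized.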
    @classmethod
    def from_virtual_dataset(
        cls,
        vds: xr.Dataset,
    ) -> "ManifestGroup":
        """
        Create a new ManifestGroup from a virtual dataset object.

        The virtual dataset should contain only virtual variables, i.e. those backed by ManifestArrays.

        Parameters
        ----------
        vds: xr.Dataset
            Virtual dataset, containing only virtual variables.
        """

        for name, var in vds.variables.items():
            if not isinstance(var.data, ManifestArray):
                raise TypeError(
                    f"Cannot convert a dataset containing a loadable variable directly to a ManifestGroup, but found variable {name} has type {type(var.data)}"
                )

        manifestarrays = {name: var.data for name, var in vds.variables.items()}

        attributes = vds.attrs
        # TODO test this is correct
        attributes["dimension_names"] = " ".join(list(vds.dims))
        # TODO test this constructor round-trips coordinates
        attributes["coordinates"] = " ".join(list(vds.coords))

        return cls(arrays=manifestarrays, attributes=attributes)
Could this functionality be moved to a different PR to constrain the scope of the ManifestGroup.to_icechunk feature?
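The two `TODO` comments in the quoted constructor suggest the attribute handling is not yet covered by tests. As a rough sketch of what such a check might look like, the snippet below mirrors the validation and attribute steps using only the standard library. `FakeManifestArray` and `build_group_inputs` are hypothetical names, not part of VirtualiZarr. The sketch also copies the attributes before adding keys, on the assumption that the dataset's `attrs` is an ordinary dict that would otherwise be modified in place by the assignments.

```python
class FakeManifestArray:
    """Hypothetical placeholder standing in for a ManifestArray."""


def build_group_inputs(variables: dict, attrs: dict, dims: list, coords: list):
    # Reject any variable that is not backed by a (stand-in) manifest array.
    for name, data in variables.items():
        if not isinstance(data, FakeManifestArray):
            raise TypeError(
                f"variable {name} has type {type(data)}, expected a manifest array"
            )
    # Copy before adding keys so the caller's attrs dict is left untouched.
    attributes = dict(attrs)
    attributes["dimension_names"] = " ".join(dims)
    attributes["coordinates"] = " ".join(coords)
    return dict(variables), attributes


original_attrs = {"title": "demo"}
arrays, attributes = build_group_inputs(
    {"temp": FakeManifestArray()}, original_attrs, ["time", "x"], ["time"]
)
print(attributes)
print(original_attrs)
```

Whether the real constructor should copy the attributes is a design decision for the PR; the sketch only shows one way to make the behaviour explicit and testable.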
This (WIP) PR adds a `to_icechunk()` method directly on `ManifestGroup`, and refactors the `.to_icechunk()` virtual dataset accessor method to go via the new `ManifestGroup` method.

The point is [...] `ManifestStore` to virtual dataset, by adding the ability to go back from a virtual dataset to a `ManifestStore`.

After this PR we should do the same for `.to_kerchunk()`; at that point we could actually make xarray an optional dependency if we want to.

docs/releases.rst
api.rst
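The refactor described in this PR, where the dataset accessor goes via the group method, follows a simple delegation pattern. The sketch below shows that shape under stated assumptions: `DemoGroup`, `DemoAccessor`, the dict-based "dataset" and the dict-based store are all hypothetical stand-ins, and the real method signatures in the PR may differ.

```python
class DemoGroup:
    """Hypothetical stand-in for ManifestGroup."""

    def __init__(self, arrays: dict, attributes: dict):
        self.arrays = arrays
        self.attributes = attributes

    @classmethod
    def from_virtual_dataset(cls, vds: dict) -> "DemoGroup":
        # The stand-in "dataset" is a dict with "variables" and "attrs" keys.
        return cls(arrays=dict(vds["variables"]), attributes=dict(vds["attrs"]))

    def to_icechunk(self, store: dict) -> None:
        # All write logic lives here; the store is a plain dict in this sketch.
        store["attributes"] = self.attributes
        store["arrays"] = self.arrays


class DemoAccessor:
    """Hypothetical stand-in for the virtual dataset accessor."""

    def __init__(self, vds: dict):
        self._vds = vds

    def to_icechunk(self, store: dict) -> None:
        # The accessor holds no write logic of its own: convert, then delegate.
        DemoGroup.from_virtual_dataset(self._vds).to_icechunk(store)


store = {}
vds = {"variables": {"temp": "manifest-for-temp"}, "attrs": {"title": "demo"}}
DemoAccessor(vds).to_icechunk(store)
print(store)
```

With this layout, code that already holds a group can write it without constructing a dataset first, which is consistent with the PR's note about eventually making xarray optional.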