Skip to content

Conversation

TomNicholas
Copy link
Member

@TomNicholas TomNicholas commented May 11, 2025

This (WIP) PR adds a to_icechunk() method directly on ManifestGroup, and refactors the .to_icechunk() virtual dataset accessor method to go via the new ManifestGroup method.

The point is

  1. To further modularize the code by providing the dual of going from ManifestStore to virtual dataset, by adding the ability to go back from a virtual dataset to a ManifestStore,
  2. To therefore allow users to create and persist virtual references for a single file without ever using xarray, i.e. enable Make xarray an optional dependency? #521.

After this PR we should do the same for .to_kerchunk() - at that point we could actually make xarray an optional dependency if we want to.

  • Closes first step of Make xarray an optional dependency? #521
  • Tests added
  • Tests passing
  • Full type hint coverage
  • Changes are documented in docs/releases.rst
  • New functions/methods are listed in api.rst
  • New functionality has documentation


return cls(arrays=manifestarrays, attributes=attributes)

def to_icechunk(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of adding this on ManifestGroup instead of ManifestStore initially is to make it easier to compose to deal with virtual datatrees later.

@maxrjones maxrjones added this to the v2.0.1 milestone Jul 13, 2025
Comment on lines +139 to +169
@classmethod
def from_virtual_dataset(
cls,
vds: xr.Dataset,
) -> "ManifestGroup":
"""
Create a new ManifestGroup from a virtual dataset object.
The virtual dataset should contain only virtual variables, i.e. those backed by ManifestArrays.
Parameters
----------
vds: xr.Dataset
Virtual dataset, containing only virtual variables.
"""

for name, var in vds.variables.items():
if not isinstance(var.data, ManifestArray):
raise TypeError(
f"Cannot convert a dataset containing a loadable variable directly to a ManifestGroup, but found variable {name} has type {type(var.data)}"
)

manifestarrays = {name: var.data for name, var in vds.variables.items()}

attributes = vds.attrs
# TODO test this is correct
attributes["dimension_names"] = " ".join(list(vds.dims))
# TODO test this constructor round-trips coordinates
attributes["coordinates"] = " ".join(list(vds.coords))

return cls(arrays=manifestarrays, attributes=attributes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this functionality be moved to a different PR to constrain the scope of the ManifestGroup.to_icechunk feature?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Icechunk 🧊 Relates to Icechunk library / spec internals
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants