Skip to content

Conversation

ilan-gold
Copy link

@ilan-gold ilan-gold commented Aug 28, 2025

Some potential TODOs:

  1. More complex h5 objects from currently parsed file formats that fit the nested paradigm and would otherwise have not worked
  2. Along the lines of 1., maybe bring in anndata as an optional dep to test the behavior of this more deeply I think this raises separate issues because anndata has semantics around object dtype that would need to be handled by a parser, I think
  3. More pathological examples and/or other places where this feature might not work. I'm not too familiar with this library/space so would be very appreciative of some investigative directions! Here are a few potential ones:
  • open_virtual_dataset seems to return an xarray.Dataset but I think in the case where things are nested, it should return a DataTree maybe? or error out? construct_virtual_dataset and thus ManifestStore.to_virtual_dataset suffer the same issue

  • Can the non-hdf5 formats handle nested structures, like netCDF or fits?

  • Closes hdf parser limited to level-0 h5py.Dataset #664

  • Tests added

  • Tests passing

  • Full type hint coverage

  • Changes are documented in docs/releases.rst

  • New functions/methods are listed in api.rst

  • New functionality has documentation

@TomNicholas
Copy link
Member

open_virtual_dataset seems to return an xarray.Dataset but I think in the case where things are nested, it should return a DataTree maybe? or error out? construct_virtual_dataset and thus ManifestStore.to_virtual_dataset suffer the same issue

Sorry I think I should have linked to #84. What we want is to add open_virtual_datatree and ManifestStore.to_virtual_datatree, and have .to_virtual_dataset/tree implemented on a ManifestStore class which supports nested groups (that's what this PR adds, which is great).

Can the non-hdf5 formats handle nested structures, like netCDF or fits?

NetCDF4 is HDF5 in a trenchcoat 🙂 NetCDF3 cannot handle groups. TIFF can. FITS IDK.

@ilan-gold
Copy link
Author

ilan-gold commented Sep 7, 2025

TIFF can

Good point, but I don't see any tiff tests at the moment. I'd be happy to add them if you want. But then I would need to edit the Tiff parser so perhaps this PR should be limited to the store + hdf5

Sorry I think I should have linked to #84.

Hmm, it seems adding a check for subgroups caused tests to fail. It looks like opening a group with a subgroup actually does work but just creates a dataset object at the root group if present, ignoring the other subkeys. Judging by the tests, this is intentional. So I can remove the check then that I added here. I guess the vibes well with having a new API in open_virtual_datatree

EDIT: done!

Copy link

codecov bot commented Sep 7, 2025

Codecov Report

❌ Patch coverage is 94.11765% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 87.78%. Comparing base (c1db925) to head (fc0e999).

Files with missing lines Patch % Lines
virtualizarr/manifests/store.py 92.30% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #790      +/-   ##
==========================================
+ Coverage   87.75%   87.78%   +0.03%     
==========================================
  Files          35       35              
  Lines        1886     1891       +5     
==========================================
+ Hits         1655     1660       +5     
  Misses        231      231              
Files with missing lines Coverage Δ
virtualizarr/manifests/group.py 93.47% <100.00%> (+1.81%) ⬆️
virtualizarr/parsers/hdf/hdf.py 95.41% <100.00%> (+0.07%) ⬆️
virtualizarr/xarray.py 85.43% <ø> (ø)
virtualizarr/manifests/store.py 87.39% <92.30%> (-0.33%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ilan-gold ilan-gold changed the title feat: nested group feat: nested group in ManifestStore + HDFParser Sep 7, 2025
@ilan-gold ilan-gold marked this pull request as ready for review September 7, 2025 17:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

hdf parser limited to level-0 h5py.Dataset

2 participants