Description
Is this issue already tracked somewhere, or is this a new report?
- I've reviewed existing issues and couldn't find a duplicate for this problem.
Have you checked the status of Earthdata services?
- I've executed earthaccess.status() and both CMR and EDL returned 'OK'.
Current Behavior
I am trying to virtualize a collection of HDF5 granules on the MODIS sinusoidal grid. Doing so runs into several problems, one of which appears to be in virtualizarr itself; this ticket covers only the one in earthaccess. (A separate problem, remote_protocol not being passed correctly, appears to be fixed in 0.16.0.)
Aside from that, the collection I'm working with has a "Projection" variable that the DMRPP parser cannot handle. If I try to open it like this:
# 1. some earthaccess search on the VNP43MA4 collection
# 2. Some filtering code to only get one row of granules from the SIN grid
vds = earthaccess.open_virtual_mfdataset(
    row_results,
    group="/HDFEOS/GRIDS/VIIRS_Grid_BRDF/Data_Fields",
    access="indirect",
    concat_dim="XDim",
    loadable_variables=["XDim", "YDim"],
)
I get a long error, but the relevant part is at the end:
  File "/app/venv/lib/python3.13/site-packages/virtualizarr/parsers/dmrpp.py", line 74, in __call__
    manifest_store = parser.parse_dataset(object_store=store, group=self.group)
  File "/app/venv/lib/python3.13/site-packages/virtualizarr/parsers/dmrpp.py", line 181, in parse_dataset
    manifest_group = self._parse_dataset(dataset_element)
  File "/app/venv/lib/python3.13/site-packages/virtualizarr/parsers/dmrpp.py", line 281, in _parse_dataset
    variable = self._parse_variable(var_tag)
  File "/app/venv/lib/python3.13/site-packages/virtualizarr/parsers/dmrpp.py", line 391, in _parse_variable
    dimension_tags = self._find_dimension_tags(var_tag)
  File "/app/venv/lib/python3.13/site-packages/virtualizarr/parsers/dmrpp.py", line 370, in _find_dimension_tags
    dimension_tag = self.find_node_fqn(d.attrib["name"])
                                       ~~~~~~~~^^^^^^^^
KeyError: 'name'
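One possible explanation for the KeyError, which I have not confirmed against the actual DMR++ file: DAP4 allows a variable to declare an anonymous dimension as a Dim element that carries a size attribute instead of a name. The sketch below uses only the standard library and a made-up, simplified DMR fragment. The variable names and sizes are hypothetical, and dim_names is my own helper, not virtualizarr code. It shows how a d.attrib["name"] lookup fails on such an element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified DMR fragment; NOT copied from a real VNP43MA4 granule.
DMR = """
<Dataset xmlns="http://xml.opendap.org/ns/DAP/4.0#">
  <Dimension name="XDim" size="1200"/>
  <Float32 name="SomeDataField">
    <Dim name="/XDim"/>
  </Float32>
  <String name="Projection">
    <Dim size="1"/>
  </String>
</Dataset>
"""
NS = {"dap": "http://xml.opendap.org/ns/DAP/4.0#"}


def dim_names(var_tag):
    """Return the name of every <Dim> child, assuming each one has a name."""
    return [d.attrib["name"] for d in var_tag.findall("dap:Dim", NS)]


root = ET.fromstring(DMR)
outcomes = {}
for var_tag in root:
    if var_tag.tag.endswith("Dimension"):
        continue  # shared dimension declaration, not a variable
    try:
        outcomes[var_tag.attrib["name"]] = dim_names(var_tag)
    except KeyError as err:
        # The anonymous <Dim size="1"/> has no "name" attribute.
        outcomes[var_tag.attrib["name"]] = f"KeyError: {err}"

for name, outcome in outcomes.items():
    print(name, "->", outcome)
```

If that assumption holds, skipping the variable avoids the lookup entirely, which would be consistent with skip_variables making the dataset parse.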
Expected Behavior
I read through the earthaccess.open_virtual_mfdataset function and found that the dataset can be virtualized if the skip_variables kwarg is passed to the DMRPPParser, so it would help if earthaccess allowed that kwarg to be passed through:
import virtualizarr as vz
from obstore.store import HTTPStore
from virtualizarr.parsers import DMRPPParser
from virtualizarr.registry import ObjectStoreRegistry
# Assume domain, tile_urls, and token (the bearer token) come from code that parses the earthaccess results
http_store = HTTPStore.from_url(
    f"https://{domain}",
    client_options={
        "default_headers": {
            "Authorization": f"Bearer {token}",
        },
    },
)
obstore_registry = ObjectStoreRegistry({f"https://{domain}": http_store})
vds = vz.open_virtual_mfdataset(
    urls=tile_urls,
    registry=obstore_registry,
    parser=DMRPPParser(
        group="/HDFEOS/GRIDS/VIIRS_Grid_BRDF/Data_Fields",
        skip_variables=["Projection"],
    ),
    combine="nested",
    concat_dim="XDim",
    parallel="dask",
    loadable_variables=["XDim", "YDim"],
)
This code works with no issues.
Steps To Reproduce
import earthaccess
import re
earthaccess.login()
results = earthaccess.search_data(
    short_name="VNP43MA4",
    temporal="2026-01-27",
)
tile_dict = {}
for res in results:
    url = res.data_links(access="indirect")[0]
    match = re.search(r'\.h(\d{2})v(\d{2})\.', url)
    if match:
        h, v = int(match.group(1)), int(match.group(2))
        tile_dict[(h, v)] = res
row = 2
# this will be unsorted so technically the virtual dataset should be combine "by coords" as well, but it is enough
# to reproduce the issue
row_results = [tile_dict[(h, v)] for h, v in tile_dict.keys() if v == row]
vds = earthaccess.open_virtual_mfdataset(
    row_results,
    group="/HDFEOS/GRIDS/VIIRS_Grid_BRDF/Data_Fields",
    access="indirect",
    concat_dim="XDim",
    loadable_variables=["XDim", "YDim"],
)
Environment
- OS: macOS Tahoe 26.3
- Python: 3.13.9
- earthaccess: 0.16.0
Additional Context
The Projection variable does not have the same fields as the data variables in the group I am virtualizing, so it should be skipped.
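On the ordering caveat in the reproduction script: since the tiles in a row come back unsorted, one way to get a deterministic order before concatenating along XDim is to sort by the horizontal tile index parsed from the file name. This is a minimal sketch using the same hNNvNN regex as above. The helper names and the example URLs are hypothetical and do not point at real granules.

```python
import re

# Same tile pattern as in the reproduction script, e.g. ".h10v02."
TILE_RE = re.compile(r"\.h(\d{2})v(\d{2})\.")


def tile_index(url):
    """Return (h, v) parsed from a granule URL, or None if it has no tile id."""
    match = TILE_RE.search(url)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def urls_in_row(urls, row):
    """Keep only the URLs whose vertical index equals `row`, sorted by h."""
    indexed = [(tile_index(u), u) for u in urls]
    selected = [(hv, u) for hv, u in indexed if hv is not None and hv[1] == row]
    return [u for hv, u in sorted(selected)]


# Hypothetical URLs, for illustration only.
example_urls = [
    "https://example.invalid/VNP43MA4.A2026027.h12v02.002.h5",
    "https://example.invalid/VNP43MA4.A2026027.h10v03.002.h5",
    "https://example.invalid/VNP43MA4.A2026027.h09v02.002.h5",
    "https://example.invalid/VNP43MA4.A2026027.no_tile.h5",
]
row_urls = urls_in_row(example_urls, row=2)
```

Sorting is not needed to reproduce the KeyError; it only matters once the parser succeeds and the tiles are concatenated in list order.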