Description
STAC Assets creation depends on an attribute called `access_urls`, which holds the various endpoints served by THREDDS. At the moment, we get these endpoints by:

- Sending a request to the NcML service -> XML
- Converting the XML response to a dict using `xncml.Dataset.to_cf_dict` -> `attrs`
- Updating `attrs["access_urls"]` with `siphon.catalog.Dataset.access_urls`
These look like this:
```
{'HTTPServer': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/fileServer/birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc',
 'OPENDAP': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc',
 'NCML': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/ncml/birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc',
 'UDDC': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/uddc/birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc',
 'ISO': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/iso/birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc',
 'WCS': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/wcs/birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc',
 'WMS': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/wms/birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc',
 'NetcdfSubset': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/ncss/birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc'}
```
This is done by `THREDDSLoader.extract_metadata`.
I think a cleaner solution would be to take those access URLs from the THREDDS response itself instead of from the siphon implementation.
We can get the THREDDS access points by sending a GET request to the same NcML service, but with two extra query parameters: `requests.get(url, params={"catalog": catalog, "dataset": dataset})`, where `url` is the NcML service URL of the dataset and:

- `catalog` (str): link to the catalog storing the dataset;
- `dataset` (str): relative link to the dataset.
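To see what is actually requested, here is a minimal offline sketch of the resulting URL. It assumes `requests` encodes the `params` dict with the standard library's `urllib.parse.urlencode`, so the query string built here should match the one `requests.get` sends. The `CATALOG` and `DATASET` values, and the `ncml_request_url` helper, are hypothetical placeholders modelled on the PAVICS example, not values taken from a real catalog.

```python
from urllib.parse import urlencode

# NcML service URL of the dataset (taken from the access URLs listed above).
NCML_URL = (
    "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/ncml/"
    "birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc"
)
# Hypothetical values: the exact catalog URL and dataset path are assumptions.
CATALOG = (
    "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/catalog/"
    "birdhouse/testdata/xclim/cmip6/catalog.xml"
)
DATASET = "birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc"


def ncml_request_url(url: str, catalog: str, dataset: str) -> str:
    """Return the URL that requests.get(url, params={...}) is expected to fetch."""
    # urlencode percent-encodes '/' and ':' in the values (quote_plus, safe='').
    query = urlencode({"catalog": catalog, "dataset": dataset})
    # Append to an existing query string if the URL already has one.
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}{query}"


request_url = ncml_request_url(NCML_URL, CATALOG, DATASET)
```

Building the URL without sending it makes it easy to paste into a browser and check which catalog/dataset pair the server resolves.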
With these query parameters, the response includes the following additional group, `THREDDSMetadata`:
```
OrderedDict([('attributes',
              OrderedDict([('id',
                            'birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc'),
                           ('full_name',
                            'cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc')])),
             ('groups',
              OrderedDict([('services',
                            OrderedDict([('attributes',
                                          OrderedDict([('httpserver_service',
                                                        'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/fileServer/birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc'),
                                                       ('opendap_service',
                                                        'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc'),
                                                       ('wcs_service',
                                                        'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/wcs/birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc?service=WCS&version=1.0.0&request=GetCapabilities'),
                                                       ('wms_service',
                                                        'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/wms/birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc?service=WMS&version=1.3.0&request=GetCapabilities'),
                                                       ('nccs_service',
                                                        'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/ncss/birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc/dataset.html')]))])),
                           ('dates',
                            OrderedDict([('attributes', OrderedDict())]))]))])
```
This yields an `id` and a list of services. Note that the keys are not the same as above (e.g. `httpserver_service` instead of `HTTPServer`), which suggests that the siphon implementation may be assigning its own, somewhat arbitrary, names.
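Since both listings describe the same dataset, one way to line them up is by the service path segment that follows `/thredds/` in each URL (`fileServer`, `dodsC`, `wcs`, ...). The sketch below does this with the standard library only. It assumes every URL contains `/thredds/`, as in the PAVICS examples, and uses shortened copies of the two listings above; `service_segment` and `match_services` are hypothetical helpers.

```python
from urllib.parse import urlparse

BASE = "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds"
NC = "birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc"

# access_urls as set from siphon (first listing above).
siphon_urls = {
    "HTTPServer": f"{BASE}/fileServer/{NC}",
    "OPENDAP": f"{BASE}/dodsC/{NC}",
    "NCML": f"{BASE}/ncml/{NC}",
    "UDDC": f"{BASE}/uddc/{NC}",
    "ISO": f"{BASE}/iso/{NC}",
    "WCS": f"{BASE}/wcs/{NC}",
    "WMS": f"{BASE}/wms/{NC}",
    "NetcdfSubset": f"{BASE}/ncss/{NC}",
}
# "services" attributes of the THREDDSMetadata group (second listing above).
thredds_urls = {
    "httpserver_service": f"{BASE}/fileServer/{NC}",
    "opendap_service": f"{BASE}/dodsC/{NC}",
    "wcs_service": f"{BASE}/wcs/{NC}?service=WCS&version=1.0.0&request=GetCapabilities",
    "wms_service": f"{BASE}/wms/{NC}?service=WMS&version=1.3.0&request=GetCapabilities",
    "nccs_service": f"{BASE}/ncss/{NC}/dataset.html",
}


def service_segment(url: str, marker: str = "/thredds/") -> str:
    """Return the path segment right after `marker`, e.g. 'dodsC' for OPeNDAP."""
    path = urlparse(url).path  # any ?service=... query string is ignored
    return path.split(marker, 1)[1].split("/", 1)[0]


def match_services(siphon: dict, thredds: dict) -> dict:
    """Map each siphon key to the THREDDS key with the same service segment, or None."""
    by_segment = {service_segment(url): key for key, url in thredds.items()}
    return {key: by_segment.get(service_segment(url)) for key, url in siphon.items()}


matches = match_services(siphon_urls, thredds_urls)
```

With these listings, `NCML`, `UDDC` and `ISO` map to `None`: the `THREDDSMetadata` group shown above does not list those three services, while the other five line up one to one.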
My feeling is that the function `STAC_item_from_metadata` should rely on the `THREDDSMetadata` group rather than on the siphon `access_urls`, so that it doesn't depend on custom logic hidden in `THREDDSLoader.extract_metadata`.
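A minimal sketch of reading the id and access URLs from the `THREDDSMetadata` group instead of from siphon. It assumes the group has already been converted to the nested `attributes`/`groups` dictionaries shown above; where exactly it sits in the `to_cf_dict` output, and the helper name `thredds_access_urls`, are assumptions. Service names are kept exactly as THREDDS reports them, so no renaming logic is involved.

```python
from collections import OrderedDict
from collections.abc import Mapping
from typing import Dict, Optional, Tuple


def thredds_access_urls(thredds_metadata: Mapping) -> Tuple[Optional[str], Dict[str, str]]:
    """Return (dataset id, {service name: url}) from a THREDDSMetadata group.

    Keys are kept exactly as reported by THREDDS (e.g. 'opendap_service').
    Missing entries yield None for the id and an empty dict for the services.
    """
    dataset_id = thredds_metadata.get("attributes", {}).get("id")
    services = (
        thredds_metadata.get("groups", {})
        .get("services", {})
        .get("attributes", {})
    )
    return dataset_id, dict(services)


# Shortened copy of the THREDDSMetadata group shown above.
BASE = "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds"
NC = "birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc"
example_group = OrderedDict(
    attributes=OrderedDict(
        id=NC, full_name="cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc"
    ),
    groups=OrderedDict(
        services=OrderedDict(
            attributes=OrderedDict(
                httpserver_service=f"{BASE}/fileServer/{NC}",
                opendap_service=f"{BASE}/dodsC/{NC}",
            )
        ),
        dates=OrderedDict(attributes=OrderedDict()),
    ),
)

dataset_id, access_urls = thredds_access_urls(example_group)
```

The returned dictionary could then be used directly when building STAC assets, in place of `attrs["access_urls"]`.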
The other bit of additional logic that we should get rid of is `attrs["attributes"] = numpy_to_python_datatypes(attrs["attributes"])`. I think this could be implemented in `to_cf_dict`, and I'm willing to make an xncml release with this change if you agree with the changes proposed here.
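For reference, a recursive conversion of this kind could look like the sketch below. It is not the existing `numpy_to_python_datatypes`, whose implementation isn't shown here; `to_python_types` is a hypothetical name. It relies on NumPy scalars and arrays both exposing `.tolist()`, which returns native Python scalars or (nested) lists, so the sketch itself does not need to import NumPy.

```python
from collections.abc import Mapping
from typing import Any


def to_python_types(value: Any) -> Any:
    """Recursively replace objects exposing .tolist() with native Python values.

    Mappings become plain dicts (insertion order kept), lists and tuples keep
    their container type, and everything else is returned unchanged.
    """
    if isinstance(value, Mapping):
        return {key: to_python_types(item) for key, item in value.items()}
    if isinstance(value, list):
        return [to_python_types(item) for item in value]
    if isinstance(value, tuple):
        return tuple(to_python_types(item) for item in value)
    tolist = getattr(value, "tolist", None)
    if callable(tolist):
        # NumPy scalars return a Python scalar; NumPy arrays return a (nested) list.
        return tolist()
    return value
```

For example, `to_python_types({"scale_factor": np.float32(0.5)})` gives `{"scale_factor": 0.5}` with a plain Python `float`. Doing this inside `to_cf_dict` would mean every caller gets JSON-serializable attributes without an extra step.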