
Clarify mechanism to get asset links #34

@huard

STAC Asset creation depends on an attribute called access_urls, which holds the various endpoints served by THREDDS. At the moment, we get these endpoints by:

  1. Sending a request to the NcML service -> xml
  2. Converting the xml response to a dict using xncml.Dataset.to_cf_dict -> attrs
  3. Updating attrs["access_urls"] with siphon.catalog.Dataset.access_urls
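As a rough illustration of the three steps, here is a stdlib-only stand-in. It does not use xncml: the XML parsing is a simplified substitute for xncml.Dataset.to_cf_dict that only collects global attributes, and the function name attrs_from_ncml is hypothetical. The access_urls argument plays the role of siphon.catalog.Dataset.access_urls.

```python
import xml.etree.ElementTree as ET

# Namespace used by NcML 2.2 documents.
NCML_NS = "{http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2}"


def attrs_from_ncml(ncml_xml: str, access_urls: dict) -> dict:
    """Hypothetical stand-in for steps 2 and 3 (not xncml's implementation)."""
    root = ET.fromstring(ncml_xml)
    # Step 2 (simplified): keep only global <attribute> elements, i.e. direct
    # children of the root <netcdf> element; variables and groups are ignored.
    attributes = {
        el.get("name"): el.get("value")
        for el in root.findall(f"{NCML_NS}attribute")
    }
    attrs = {"attributes": attributes}
    # Step 3: attach the endpoint dictionary obtained separately (from siphon today).
    attrs["access_urls"] = dict(access_urls)
    return attrs
```

Step 1 would supply ncml_xml, e.g. urllib.request.urlopen(ncml_url).read().decode(). Attribute values stay strings here, whereas the real to_cf_dict may return typed values.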

The resulting access_urls dict looks like this:

{'HTTPServer': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/fileServer/birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc',
 'OPENDAP': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc',
 'NCML': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/ncml/birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc',
 'UDDC': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/uddc/birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc',
 'ISO': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/iso/birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc',
 'WCS': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/wcs/birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc',
 'WMS': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/wms/birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc',
 'NetcdfSubset': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/ncss/birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc'}

This is done by THREDDSLoader.extract_metadata.
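All eight URLs share the same server root and dataset path and differ only in the service segment. The sketch below shows how such a dictionary can be assembled from a service-name to path-segment table. It illustrates the pattern only and is not siphon's code; the SERVICE_BASES table is inferred from the example URLs above.

```python
# Hypothetical table inferred from the example URLs: service name -> path segment.
SERVICE_BASES = {
    "HTTPServer": "fileServer",
    "OPENDAP": "dodsC",
    "NCML": "ncml",
    "UDDC": "uddc",
    "ISO": "iso",
    "WCS": "wcs",
    "WMS": "wms",
    "NetcdfSubset": "ncss",
}


def build_access_urls(server_root: str, url_path: str) -> dict:
    """Join server root, per-service segment and dataset path into one URL per service."""
    root = server_root.rstrip("/")
    path = url_path.lstrip("/")
    return {name: f"{root}/{segment}/{path}" for name, segment in SERVICE_BASES.items()}
```

The dictionary keys are whatever names the table uses; nothing in the dataset response itself fixes them.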

I think a cleaner solution would be to rely on the THREDDS response itself for those access URLs instead of the siphon implementation.

We can get the THREDDS access points by sending a GET request to the same NcML service, but with additional query parameters:
requests.get(url, params={"catalog": catalog, "dataset": dataset}), where

    catalog : str
        Link to the catalog storing the dataset.
    dataset : str
        Relative link to the dataset.
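A standard-library approximation of that requests.get call is sketched below, mainly to show what the final request URL looks like. The helper names are hypothetical, and the exact catalog and dataset values THREDDS expects should be checked against a live server.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def ncml_request_url(ncml_url: str, catalog: str, dataset: str) -> str:
    """Append the catalog/dataset query parameters to an NcML service URL."""
    return f"{ncml_url}?{urlencode({'catalog': catalog, 'dataset': dataset})}"


def fetch_ncml_with_thredds_metadata(ncml_url: str, catalog: str, dataset: str) -> str:
    """Roughly requests.get(url, params=...).text, using only the standard library."""
    with urlopen(ncml_request_url(ncml_url, catalog, dataset)) as response:
        return response.read().decode("utf-8")
```

urlencode percent-encodes both values, so slashes in the catalog link and the relative dataset path become %2F in the query string.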

With this modified request URL, the response includes an additional group, THREDDSMetadata:

OrderedDict([('attributes',
              OrderedDict([('id',
                            'birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc'),
                           ('full_name',
                            'cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc')])),
             ('groups',
              OrderedDict([('services',
                            OrderedDict([('attributes',
                                          OrderedDict([('httpserver_service',
                                                        'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/fileServer/birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc'),
                                                       ('opendap_service',
                                                        'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc'),
                                                       ('wcs_service',
                                                        'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/wcs/birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc?service=WCS&version=1.0.0&request=GetCapabilities'),
                                                       ('wms_service',
                                                        'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/wms/birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc?service=WMS&version=1.3.0&request=GetCapabilities'),
                                                       ('nccs_service',
                                                        'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/ncss/birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc/dataset.html')]))])),
                           ('dates',
                            OrderedDict([('attributes', OrderedDict())]))]))])
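Reading the services out of this group only requires walking two levels of nesting. A minimal sketch, assuming the group is passed in exactly as shown above; the helper name is hypothetical, and stripping the _service suffix is a choice made here, not something THREDDS does.

```python
from collections import OrderedDict

SUFFIX = "_service"


def services_from_thredds_metadata(group: dict) -> OrderedDict:
    """Map service name (e.g. 'opendap') to URL from a THREDDSMetadata group dict."""
    # Missing levels yield an empty mapping instead of raising KeyError.
    services = group.get("groups", {}).get("services", {}).get("attributes", {})
    return OrderedDict(
        (key[: -len(SUFFIX)] if key.endswith(SUFFIX) else key, url)
        for key, url in services.items()
    )
```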

This yields an id and a list of services (note that the keys differ from those above, underscoring that the siphon implementation may be assigning names arbitrarily).
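To make the mismatch concrete, the two key sets can be compared after a simple normalization (lower-casing and dropping _service, an arbitrary choice made only for this comparison):

```python
# Keys exactly as they appear in the two outputs shown above.
SIPHON_KEYS = {"HTTPServer", "OPENDAP", "NCML", "UDDC", "ISO", "WCS", "WMS", "NetcdfSubset"}
THREDDS_KEYS = {"httpserver_service", "opendap_service", "wcs_service", "wms_service", "nccs_service"}


def normalize(key: str) -> str:
    """Lower-case a key and drop a trailing '_service' so both schemes are comparable."""
    key = key.lower()
    return key[: -len("_service")] if key.endswith("_service") else key


siphon = {normalize(k) for k in SIPHON_KEYS}
thredds = {normalize(k) for k in THREDDS_KEYS}
only_siphon = sorted(siphon - thredds)
only_thredds = sorted(thredds - siphon)
```

Even after normalization, four siphon entries (iso, ncml, netcdfsubset, uddc) have no counterpart, and the subset service is netcdfsubset on one side and nccs on the other. The WCS and WMS URLs also differ: the THREDDS versions carry a GetCapabilities query string.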

My feeling is that the function STAC_item_from_metadata should rely on the THREDDSMetadata services rather than on the siphon access_urls, so that it doesn't depend on custom logic hidden in THREDDSLoader.extract_metadata.
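A hypothetical sketch of what that could look like: the item id and assets are read straight from the NcML-derived dict. It assumes to_cf_dict places the group under attrs["groups"]["THREDDSMetadata"]; the function name and the asset fields (title, a single "data" role) are placeholders, not the project's actual schema.

```python
SUFFIX = "_service"


def stac_item_stub(attrs: dict) -> dict:
    """Build a minimal STAC-like item (id + assets) from the THREDDSMetadata group."""
    # Assumption: the NcML THREDDSMetadata group ends up under attrs["groups"].
    meta = attrs["groups"]["THREDDSMetadata"]
    services = meta["groups"]["services"]["attributes"]
    assets = {}
    for key, url in services.items():
        name = key[: -len(SUFFIX)] if key.endswith(SUFFIX) else key
        # Placeholder asset fields; the real asset schema may differ.
        assets[name] = {"href": url, "title": name, "roles": ["data"]}
    return {"id": meta["attributes"]["id"], "assets": assets}
```

No siphon call or loader-specific key appears here; everything comes from the server response.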

The other bit of additional logic that we should get rid of is attrs["attributes"] = numpy_to_python_datatypes(attrs["attributes"]). I think this could be implemented in to_cf_dict itself, and I'm willing to make an xncml release with this change if you agree with the changes proposed here.
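For reference, such a conversion amounts to a recursive walk that replaces numpy values with built-in Python types. A hypothetical sketch (to_python_types is not the existing numpy_to_python_datatypes helper, whose exact behaviour may differ); it avoids importing numpy by relying on numpy scalars and arrays exposing .tolist():

```python
def to_python_types(value):
    """Recursively convert numpy scalars/arrays (anything with .tolist()) to built-ins."""
    if isinstance(value, dict):
        return {k: to_python_types(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        # Tuples become lists so the result is JSON-friendly.
        return [to_python_types(v) for v in value]
    if hasattr(value, "tolist"):
        # numpy.ndarray -> nested lists, numpy scalar -> Python scalar.
        return value.tolist()
    return value
```

With numpy installed, to_python_types({"valid_max": np.float32(100.0)}) returns {"valid_max": 100.0}.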
