Using datatrees to represent datasets #809

@abkfenris

Is your feature request related to a problem? Please describe.

Few forecasts are stored as init x lead, so either users have to do additional dataset wrangling or data providers are asked to store an additional copy of the data.

Describe the solution you'd like

Datatree is working to create a tree-like data structure for Xarray. Datatrees can correspond to NetCDF groups or other hierarchies of datasets.

One of the ways datatrees can be used is to collect related but non-alignable datasets. I think this property could make datatree useful here, since datasets can be stored within a tree the same way they are structured on disk. A datatree accessor can then be used to aggregate and reshape the underlying datasets for access and analysis.

I've started exploring using datatrees for forecasts in xarray_fmrc. I've initially modeled it off of THREDDS forecast model run collections, but I think it could support other forecast presentations like climpred's init x lead dataset structure.

I'm mainly coming at this with my data provider hat on, so input from researchers would be really welcome (most of my forecast users are fishermen, sailors, surfers, and other folks on the water and around the waterfront, not scientists). There is a discussion going on the Pangeo Discourse.

Describe alternatives you've considered

Lots of individual datasets in ERDDAP, making users assemble things themselves.

Additional context

Relevant datatree links:
