NetCDF - Zarr Users Guide #76
This is intriguing - but I'm not quite seeing how this fits in yet - can you give a concrete example of a problem which is solved with the NZUG that is not solved without it? I'm asking because the content in the NetCDF Users Guide (NUG) seems mostly like the sort of content that already exists for zarr, or just isn't relevant to it.
I think this is a great idea and should be formalized as a Zarr Convention. Thanks @dblodgett-usgs for writing it up!
Seems like https://github.com/zarr-conventions/CF has a lot of overlap with this proposal -- are they different in some way?
Hi Zarr Developers -- I hope this message finds you safe and thriving!
As many of you are aware, the topic of NetCDF (specifically the widely adopted CF convention for NetCDF) has been a hot one in relation to Zarr since its inception. Yet in all the time that Zarr has been able to store data from NetCDF-CF sources, there has always been a degree of ambiguity in the finer points of encoding / round-tripping data.
I've personally dipped in and out of this topic over the years and have recently come back to it in earnest. After a fair bit of rumination, I think I've realized something that could unstick the topic.
In short -- we need a NetCDF User's Guide for Zarr: a document that defines the structure upon which CF is built, and that is itself defined by the community that stewards the Zarr format.
Background: The NetCDF User's Guide, referred to simply as the "NUG" in the cf-convention, is the formative document providing the underlying structural reference upon which CF is built. The NUG is also simply a user's manual that Unidata put out when it first published NetCDF (the API, the format, etc.). As a result, it has content that pertains to much more than the NetCDF file format itself. The details of the file format in the NUG have been stable, advancing over time as NetCDF3 transitioned to 4 and features were added. The core of the NetCDF data model is defined in the NUG and has been the same since its inception.
So what would a "NZUG" look like and who would own/steward it? I'd like to imagine a world where it would look like a zarr-developers repository modeled on the zarr-spec and the zarr-convention template. It would break some of the rules of zarr-conventions because it's structural, but it would be wholly defined on top of the Zarr V3 spec. It could, like the NUG, include both normative and informative content where the normative stuff would be strictly versioned and controlled and the informative stuff could evolve according to practice and ongoing need.
Scope, you ask? Read the front matter of a mockup spec I pasted below the fold.
I have mocked up what the core of such a thing would look like (with extensive back and forth with Claude), and pasted some of the front matter below the fold on this discussion opener to give folks a little more fidelity on what I'm suggesting.
I'm very much in spitball / exploratory mode here. A discussion lit off on the cf-conventions discussion board over the last several days and this is one of the things I've been toying with related to that.
Good idea? Bad idea? What on earth are you talking about Dave? How do folks feel about the concept?
Cheers!!
Dave
p.s. my host organization (DOI-USGS) has very strict limits on creating new projects under my personal user account or in our organization's collection of open source projects. As a result, sharing this whole project will require a little hacking or someone to make a repo for me to drop the content into. I've formatted it as a zarr-convention and could contribute it to zarr-experiments or some other throwaway place if there's a good place to do it.
Description
NZUG-1.0 (NetCDF Zarr Users Guide) is a Zarr convention that defines a structural interoperability layer for array-oriented scientific data on top of the Zarr v3 specification. It serves the same structural role for the Zarr ecosystem that the NetCDF Users Guide (NUG) serves for netCDF: it defines a shared vocabulary of structural concepts, a typed data model, naming rules, and a small set of reserved attributes that domain conventions can be written against without modification.
The primary design goal is to allow existing domain conventions — in particular the CF Metadata Conventions — to be applied to Zarr v3 datasets by replacing the NUG as the underlying structural reference, with no changes required to the domain convention itself.
NZUG-1.0 is not a geospatial convention, not a CF extension, and not a replacement for the Zarr v3 specification. It is the structural layer between the format specification and domain conventions.
Motivation
Separation of Concerns
NZUG-1.0 is explicitly scoped to the structural interoperability layer. The following are out of scope and left to conventions that build on NZUG:
This scoping is intentional and load-bearing. The geospatial and Earth science communities are actively debating CRS encoding, axis order, coordinate role identification, and related questions. NZUG does not enter those debates. It provides the structural floor on which they can be settled independently.
Relationship to the Zarr v3 Specification
NZUG-1.0 adds normative constraints and definitions above the Zarr v3 specification. It does not modify or contradict the Zarr v3 spec. Every valid NZUG-1.0 dataset is a valid Zarr v3 dataset. Implementations that support Zarr v3 can read NZUG-1.0 datasets; they will simply not enforce the additional constraints defined here.
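To make the layering concrete, here is a rough sketch in plain Python (standard library only). The base check covers only the keys the Zarr v3 core spec requires in an array's `zarr.json`. The stricter check is purely hypothetical: the rule "every dimension must be named" is an assumption chosen for illustration and is not taken from the NZUG draft, and the function names and the example array are invented for this sketch.

```python
import json

# Keys the Zarr v3 core spec requires in an array metadata document (zarr.json).
ZARR_V3_ARRAY_REQUIRED = {
    "zarr_format", "node_type", "shape", "data_type",
    "chunk_grid", "chunk_key_encoding", "fill_value", "codecs",
}


def is_zarr_v3_array(meta: dict) -> bool:
    """Base-layer check: only what the Zarr v3 core spec asks of an array."""
    return (
        ZARR_V3_ARRAY_REQUIRED.issubset(meta)
        and meta["zarr_format"] == 3
        and meta["node_type"] == "array"
    )


def passes_hypothetical_structural_layer(meta: dict) -> bool:
    """Illustrative stricter check (NOT from NZUG): every dimension has a name."""
    if not is_zarr_v3_array(meta):
        return False
    names = meta.get("dimension_names")
    return (
        names is not None
        and len(names) == len(meta["shape"])
        and all(isinstance(n, str) and n for n in names)
    )


# Hypothetical array metadata with CF-style attributes stored under "attributes".
temperature = {
    "zarr_format": 3,
    "node_type": "array",
    "shape": [365, 180, 360],
    "data_type": "float32",
    "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": [1, 180, 360]}},
    "chunk_key_encoding": {"name": "default", "configuration": {"separator": "/"}},
    "fill_value": "NaN",
    "codecs": [{"name": "bytes", "configuration": {"endian": "little"}}],
    "dimension_names": ["time", "lat", "lon"],
    "attributes": {"standard_name": "air_temperature", "units": "K"},
}

# Same array without dimension names: still valid at the base layer.
unnamed = {k: v for k, v in temperature.items() if k != "dimension_names"}

print(json.dumps(temperature["attributes"]))
print(is_zarr_v3_array(temperature), passes_hypothetical_structural_layer(temperature))
print(is_zarr_v3_array(unnamed), passes_hypothetical_structural_layer(unnamed))
```

Anything that passes the stricter check also passes the base check, but not the other way around, which is the relationship described above: a plain Zarr v3 reader accepts both arrays and simply never runs the extra check.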
Relationship to Domain Conventions
A domain convention is a document that specifies scientific semantics for datasets built on a structural interoperability layer. The CF Metadata Conventions are the primary domain convention this document is designed to support. NZUG-1.0 provides to a CF-Zarr profile exactly what the NUG provides to CF-netCDF.