We want to support adoption of virtual store technology by ESDIS with a report covering:
- Why ESDIS would want to use this technology (home page)
  - What do virtual stores enable?
  - A vision for a future where all datasets are accessible through a single entry point
  - How does it help data users?
  - How does it help data providers, and NASA overall, save money?
- Technical Overview
- Application of Virtual Store Technology to NASA Data
  - Mature
    a. Consistently gridded data aggregated into collection-level chunk manifests (PO.DAAC)
  - Developing
    b. Displacement data (ASF)
    c. Ongoing data (new granules arriving over time): appending via Icechunk or overwriting kerchunk JSON
  - What will not be supported
    a. Different grids and compression schemes, L2/orbital swath data
- Mature
  - Known limitations:
    - Icechunk is very Python- and Rust-centric.
    - Structural decisions about data formatting, chunking, and chunk manifests made early on affect performance for different use cases, and it is still not possible to optimize for all use cases simultaneously. Ideally, chunk manifests could be aggregated dynamically depending on the use case. For NISAR, for example, the current chunk manifest design assumes users will work with individual frames, so it optimizes for loading one chunk manifest per frame; as a result, loading data across frames is not easy.
  - Governance decisions that need to be made:
    - Standards for where to put the metadata: at the collection level and at the frame level
  - Established best practices:
    - Typical use case patterns should be considered when designing the files, chunks, and aggregated chunk manifests. For example, NISAR chunk manifests are organized by frame because time series analysis is the typical use case.
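The "overwriting kerchunk JSON" approach to ongoing data can be sketched as follows. This is a minimal illustration, assuming a kerchunk version 1 reference set for a single variable stacked along a leading time axis with one time step per chunk; the variable name `sst`, the `append_granule` helper, the bucket URLs, and the byte offsets are all hypothetical. Appending a granule means adding its chunk references under the next time index, growing the array shape, and serializing the whole reference set again.

```python
import json


def append_granule(refs: dict, var: str, granule_chunks: dict) -> dict:
    """Return a new kerchunk-style (version 1) reference set with one more time step.

    Hypothetical helper. Assumes `var` is stacked along a leading time axis with
    a chunk size of 1, and that `granule_chunks` maps the remaining chunk index
    (e.g. "0.0") to a [url, offset, length] byte-range reference into the new granule.
    """
    new_refs = dict(refs["refs"])  # shallow copy; the input reference set is left untouched
    zarray = json.loads(new_refs[f"{var}/.zarray"])
    t = zarray["shape"][0]  # index of the time step being appended
    for key, ref in granule_chunks.items():
        new_refs[f"{var}/{t}.{key}"] = list(ref)
    zarray["shape"][0] = t + 1  # grow the time axis by one step
    new_refs[f"{var}/.zarray"] = json.dumps(zarray)
    return {"version": 1, "refs": new_refs}


# Hypothetical collection with one granule: an uncompressed 4x4 float32 grid per time step.
zarray_meta = {
    "zarr_format": 2, "shape": [1, 4, 4], "chunks": [1, 4, 4], "dtype": "<f4",
    "compressor": None, "fill_value": None, "filters": None, "order": "C",
}
refs = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "sst/.zarray": json.dumps(zarray_meta),
        "sst/.zattrs": json.dumps({"_ARRAY_DIMENSIONS": ["time", "y", "x"]}),
        "sst/0.0.0": ["s3://example-bucket/granule_0.nc", 4096, 64],
    },
}

updated = append_granule(refs, "sst", {"0.0": ["s3://example-bucket/granule_1.nc", 4096, 64]})

# "Overwriting" the reference file: the entire JSON document is serialized again.
serialized = json.dumps(updated)
```

Because the reference set is one JSON document, every append rewrites the whole file; a real workflow would also need to extend the time coordinate variable, which this sketch omits. Icechunk, by contrast, records changes as versioned commits rather than rewriting a single file.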
This report would also include a link to:
- An onboarding guide for anyone looking to get started with virtualizing data.
- An initial library of virtualization examples. These examples would cover the variety of data types and use case patterns that have already been solved, and would serve as a resource for producers of virtual layers.
A few more important points to be included:
- Adoption of GeoZarr and multiscales (recommendations)
- Chunk manifest protocol is in progress
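To illustrate what aggregating chunk manifests dynamically could mean, here is a minimal sketch in which a chunk manifest is simply a mapping from a chunk index to a byte range (path, offset, length) in an original file. The `ChunkRef` type, the `stack_manifests` helper, and the file names are hypothetical; this is not the API of any particular library and does not represent the chunk manifest protocol that is still in progress. Stacking per-frame manifests along a new leading frame axis only rewrites chunk keys, so it is cheap enough to do on demand when a user needs to load across frames.

```python
from typing import Dict, List, NamedTuple, Tuple


class ChunkRef(NamedTuple):
    """Byte range of one chunk inside an original (e.g. HDF5/netCDF) file."""
    path: str
    offset: int
    length: int


ChunkKey = Tuple[int, ...]
Manifest = Dict[ChunkKey, ChunkRef]


def stack_manifests(frames: List[Manifest]) -> Manifest:
    """Combine per-frame manifests into one, along a new leading frame axis.

    Hypothetical helper. Only chunk keys are rewritten; no data bytes are read
    or copied. Assumes all frames share grid, chunk shape, dtype and codecs.
    """
    combined: Manifest = {}
    for frame_index, manifest in enumerate(frames):
        for key, ref in manifest.items():
            combined[(frame_index, *key)] = ref
    return combined


# Hypothetical frame-level manifests, each with a 1x2 grid of chunks.
frame_a: Manifest = {
    (0, 0): ChunkRef("s3://example-bucket/frame_a.h5", 1000, 500),
    (0, 1): ChunkRef("s3://example-bucket/frame_a.h5", 1500, 480),
}
frame_b: Manifest = {
    (0, 0): ChunkRef("s3://example-bucket/frame_b.h5", 1000, 510),
    (0, 1): ChunkRef("s3://example-bucket/frame_b.h5", 1510, 470),
}

combined = stack_manifests([frame_a, frame_b])
print(combined[(1, 0, 1)])
# ChunkRef(path='s3://example-bucket/frame_b.h5', offset=1510, length=470)
```

Stacking along a new axis is the simplest case; mosaicking spatially adjacent frames would also require offsetting chunk indices and is not shown. Either way, the approach assumes every frame shares the same grid, chunk shape, data type, and compression, an assumption that does not hold for data with different grids or compression schemes such as L2/orbital swath data.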