Open
Description
There has been long-running discussion about improving the icepyx.Read module, especially for working in the cloud. Posts across this and other repos attempt to document the journey so far; the goal of this issue is to document some high-level next steps for making that dream a reality. Please ask in a comment for more details on any particular piece.
Key tasks
(some of these may already have machinery in place)
- based on the product, determine whether the data is point (HDF5) or raster (netCDF)
- determine if the user is working locally or in the cloud
- create an appropriate mechanism for the Read Class (subclasses? submodules?) that allows us to take the data type and location information (along with the input data/granule URLs and wanted variables list) and use them to actually read in data
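As a starting point for discussion, the first two tasks could be sketched roughly as below. Everything here is hypothetical and not part of icepyx: the names `DataKind`, `Location`, `data_kind`, and `data_location` are made up for illustration, the set of raster products is an illustrative assumption rather than a complete list, and the location check only looks at where the data lives (path or URL scheme), not at where the user's code is running.

```python
from enum import Enum
from urllib.parse import urlparse


class DataKind(Enum):
    POINT = "point"    # HDF5 products
    RASTER = "raster"  # netCDF products


class Location(Enum):
    LOCAL = "local"
    CLOUD = "cloud"


# Illustrative assumption: only these products are treated as raster here.
RASTER_PRODUCTS = {"ATL14", "ATL15"}

# Assumption: these URL schemes mean the data is not on the local file system.
REMOTE_SCHEMES = {"s3", "http", "https"}


def data_kind(product: str) -> DataKind:
    """Classify a product short name as point or raster."""
    name = product.strip().upper()
    return DataKind.RASTER if name in RASTER_PRODUCTS else DataKind.POINT


def data_location(source: str) -> Location:
    """Guess from a path or URL whether the data is local or remote."""
    scheme = urlparse(source).scheme.lower()
    return Location.CLOUD if scheme in REMOTE_SCHEMES else Location.LOCAL
```

The two results can then be handed, together with the granule list and wanted variables, to whichever Read mechanism (subclasses or submodules) we settle on.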
Pipelines we hope to implement
- local reads of point data into xarray (current functionality)
- local reads of raster data into xarray (current functionality)
- local reads of point data using h5py into Pandas
- cloud reads of point data using h5coro into Pandas
- cloud reads of raster data into xarray
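One possible way to organise these pipelines is a small registry keyed on data type, location, and output container, so each pipeline can be added independently. This is only a sketch under stated assumptions: `register`, `read`, and the two reader functions are hypothetical names, the reader bodies are placeholders that return a description string instead of touching h5py or h5coro, and only two of the five pipelines are filled in.

```python
from typing import Callable, Dict, List, Tuple

# (data kind, location, output container) -> reader function
ReaderKey = Tuple[str, str, str]
Reader = Callable[[List[str], List[str]], str]

_READERS: Dict[ReaderKey, Reader] = {}


def register(kind: str, location: str, output: str):
    """Decorator that records a reader under its pipeline key."""
    def decorator(func: Reader) -> Reader:
        _READERS[(kind, location, output)] = func
        return func
    return decorator


@register("point", "local", "pandas")
def read_point_local_pandas(sources: List[str], variables: List[str]) -> str:
    # Placeholder: a real reader would open each file with h5py here.
    return f"h5py -> pandas: {len(sources)} file(s), {len(variables)} variable(s)"


@register("point", "cloud", "pandas")
def read_point_cloud_pandas(sources: List[str], variables: List[str]) -> str:
    # Placeholder: a real reader would fetch the variables with h5coro here.
    return f"h5coro -> pandas: {len(sources)} file(s), {len(variables)} variable(s)"


def read(kind: str, location: str, output: str,
         sources: List[str], variables: List[str]) -> str:
    """Look up the pipeline for this combination and run it."""
    try:
        reader = _READERS[(kind, location, output)]
    except KeyError:
        raise NotImplementedError(
            f"no pipeline for {kind}/{location}/{output}"
        ) from None
    return reader(sources, variables)
```

With this layout, a combination that has no pipeline yet (for example cloud raster reads) fails with a clear `NotImplementedError` instead of silently falling back to another reader.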
Things we're dreaming about
- accept ipx.Query objects or earthaccess.results objects (the latter are still a WIP) as input to ipx.Read()
- an option for users to call SlideRule if they need custom data processing in the cloud
- versions of the above pipelines for larger-than-memory datasets and/or into cloud optimized storage options for further processing
Any and all contributions towards these efforts are welcome!