icepyx.Read updates summary #744

@JessicaS11

Discussion about improving the icepyx.Read module, especially for working in the cloud, has been ongoing for a long time. Posts across this and other repos document the various parts of the journey so far; the goal of this issue is to lay out some high-level next steps for making that dream a reality. Please ask in a comment if you would like more details on a particular piece.

Key tasks

(some of these may already have machinery in place)

  • based on the product, determine whether the data are point (HDF5) or raster (netCDF)
  • determine whether the user is working locally or in the cloud
  • create a mechanism within the Read class (subclasses? submodules?) that takes the data type and location, along with the input data/granule URLs and the list of wanted variables, and uses them to read in the data
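
One way the three tasks above could fit together is a small dispatch table keyed on data type and location. This is a minimal sketch under stated assumptions, not existing icepyx API: `RASTER_PRODUCTS`, `PIPELINES`, `data_type`, `location`, and `choose_pipeline` are hypothetical names; treating ATL14/ATL15 as the only netCDF (raster) products is an illustrative assumption; and an `s3://` URL scheme is assumed to signal a cloud read. The pipeline strings simply name the pipelines listed in the next section.

```python
from typing import Optional, Sequence
from urllib.parse import urlparse

# Assumption: an illustrative set of gridded products distributed as netCDF.
# This is not an authoritative list; icepyx may classify products differently.
RASTER_PRODUCTS = {"ATL14", "ATL15"}

# Hypothetical mapping from (data type, location) to the pipelines named in
# this issue. The first entry in each list is treated as the default.
PIPELINES = {
    ("point", "local"): ["xarray", "h5py->pandas"],
    ("raster", "local"): ["xarray"],
    ("point", "cloud"): ["h5coro->pandas"],
    ("raster", "cloud"): ["xarray"],
}


def data_type(product: str) -> str:
    """Classify a product short name as 'raster' (netCDF) or 'point' (HDF5)."""
    return "raster" if product.upper() in RASTER_PRODUCTS else "point"


def location(urls: Sequence[str]) -> str:
    """Assumption: s3:// URLs mean a cloud read; anything else is treated as local."""
    schemes = {urlparse(u).scheme for u in urls}
    if "s3" not in schemes:
        return "local"
    if schemes == {"s3"}:
        return "cloud"
    raise ValueError("mixing s3:// URLs and local paths is not supported")


def choose_pipeline(product: str, urls: Sequence[str], backend: Optional[str] = None) -> str:
    """Pick a reader pipeline from the product's data type and the data location."""
    options = PIPELINES[(data_type(product), location(urls))]
    if backend is None:
        return options[0]
    if backend not in options:
        raise ValueError(f"{backend!r} is not available here; choose from {options}")
    return backend
```

For example, `choose_pipeline("ATL06", ["s3://bucket/file.h5"])` returns `"h5coro->pandas"`. A registry like this keeps the dispatch decision in one place; one subclass per pipeline would be the alternative design raised above.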

Pipelines we hope to implement

  • local reads of point data into xarray (current functionality)
  • local reads of raster data into xarray (current functionality)
  • local reads of point data using h5py into pandas
  • cloud reads of point data using h5coro into pandas
  • cloud reads of raster data into xarray
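
For the local h5py-to-pandas pipeline, the core read step might look like the sketch below: open one HDF5 file, pull a set of 1-D datasets from a single group, and return them as a DataFrame. It assumes all requested variables share one length (as per-beam segment variables do). `read_point_h5` is a hypothetical name, and the group and variable names in the demo (`gt1l/land_ice_segments`, `h_li`, etc.) are written to a synthetic file with made-up values purely for illustration.

```python
import os
import tempfile
from typing import Sequence

import h5py
import numpy as np
import pandas as pd


def read_point_h5(path: str, group: str, variables: Sequence[str]) -> pd.DataFrame:
    """Read 1-D datasets from a single HDF5 group into a pandas DataFrame.

    Assumes every requested variable is a 1-D dataset of the same length.
    """
    with h5py.File(path, "r") as f:
        grp = f[group]
        # dataset[()] reads the whole dataset into a NumPy array
        return pd.DataFrame({v: grp[v][()] for v in variables})


# Demo on a tiny synthetic file with an ATL06-like layout (values are made up).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "synthetic.h5")
    with h5py.File(path, "w") as f:
        seg = f.create_group("gt1l").create_group("land_ice_segments")
        seg["latitude"] = np.array([-75.0, -75.1, -75.2])
        seg["longitude"] = np.array([-100.0, -100.1, -100.2])
        seg["h_li"] = np.array([1200.5, 1201.0, 1199.8])
    df = read_point_h5(path, "gt1l/land_ice_segments", ["latitude", "longitude", "h_li"])

print(df)
```

Reading with `[()]` loads each dataset fully into memory, so this suits the in-memory pipelines above rather than the larger-than-memory variants.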

Things we're dreaming about

  • accept ipx.Query objects or earthaccess.results objects (the latter are still a WIP) as input to ipx.Read()
  • an option for users to call SlideRule if they need custom data processing in the cloud
  • versions of the above pipelines for larger-than-memory datasets and/or that write to cloud-optimized storage formats for further processing
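
Accepting several input types in ipx.Read() could reduce to normalizing whatever the user passes into a flat list of paths or URLs before dispatch. This is a hypothetical sketch: `normalize_sources` is not an icepyx function, and rather than depend on a specific earthaccess or ipx.Query interface it duck-types on a `data_links()` method. earthaccess granule results expose a method by that name, but treat its exact signature and return value as an assumption here; an ipx.Query would need its own adapter, which is not sketched.

```python
import os
from collections.abc import Iterable
from typing import List


def normalize_sources(data_source) -> List[str]:
    """Flatten the inputs a future ipx.Read() might accept into a list of paths/URLs.

    Hypothetical sketch. Handles a single path or URL string, an os.PathLike,
    any object with a callable ``data_links()`` method (duck-typed), or an
    iterable mixing any of these.
    """
    # Check strings first: they are iterable and would otherwise recurse forever.
    if isinstance(data_source, (str, os.PathLike)):
        return [os.fspath(data_source)]
    links = getattr(data_source, "data_links", None)
    if callable(links):
        return [str(u) for u in links()]
    if isinstance(data_source, Iterable):
        out = []
        for item in data_source:
            out.extend(normalize_sources(item))
        return out
    raise TypeError(f"unsupported data source: {type(data_source).__name__}")
```

Normalizing early means the dispatch step only ever sees a list of strings, regardless of how the user found the data.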

Any and all contributions towards these efforts welcome!
