
Add option to cycle Hazard and Centroids with Parquet files #1055

@emanuel-schmid

Description


Since the refactoring of Centroids in Climada 5.0, their coordinates are no longer stored as numpy arrays (lat/lon) but in the geometry column of a GeoDataFrame (Centroids.gdf).

This has the following consequences:

  • the files produced by write_hdf5 are much bigger than they used to be. Here's an example for Centroids in a grid of size 5'760'000:

    | geometry saved as          | file size, uncompressed | file size, compressed |
    |----------------------------|-------------------------|-----------------------|
    | shapely.Point (current)    | 210M                    | 167M                  |
    | wkb, byte array (planned)  | 177M                    | 134M                  |
    | x and y (no geometry)      | 134M                    | 15M                   |
  • when reading HDF5 files with pickled Points, the risk of exceeding memory limits is high. With a memory limit of 4G, I have not been able to read them without the kernel being killed.
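The large gap in the compressed column for "x and y" is plausible for a regular grid: plain float columns of grid coordinates are highly repetitive, which deflate-style compression exploits. The sketch below only illustrates that mechanism with the standard library's zlib on a small synthetic lon/lat grid; it does not reproduce the HDF5 measurements above, and it assumes (without checking write_hdf5) that the compressed files use a deflate-type filter. The helper name grid_xy_bytes is hypothetical.

```python
import array
import zlib


def grid_xy_bytes(n_lon, n_lat):
    """Raw float64 bytes of x and y columns for a regular lon/lat grid (hypothetical helper)."""
    lons = [-180.0 + 360.0 * i / n_lon for i in range(n_lon)]
    lats = [-90.0 + 180.0 * j / n_lat for j in range(n_lat)]
    # Row-major layout: x cycles through all longitudes once per latitude,
    # y repeats the same latitude n_lon times in a row.
    x = array.array("d", [lon for _ in lats for lon in lons])
    y = array.array("d", [lat for lat in lats for _ in lons])
    return x.tobytes() + y.tobytes()


raw = grid_xy_bytes(n_lon=200, n_lat=100)
packed = zlib.compress(raw, 6)  # deflate, compression level 6
print(f"raw: {len(raw)} bytes, deflate-compressed: {len(packed)} bytes")
```

Both columns consist of long repeated byte patterns (a 1600-byte row of longitudes, runs of identical latitudes), so the compressed size ends up at a small fraction of the raw size. Pickled Point objects or WKB blobs interleave per-point headers with the coordinates, which breaks up these runs.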

It was already planned to store the Centroids.gdf geometries in WKB format, as is done for Exposures.gdf.
This would alleviate the problem somewhat: lower memory requirements and smaller files (about -20%).

However, if we converted the geometry column to x/y columns before storing and back after reading, the files would be about 90% smaller (compressed), and reading/writing would be faster.
This only works if the Centroids are really points and not some other type of geometry.
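A minimal sketch of that conversion, assuming geopandas is available and that the coordinates are stored in columns named "lon" and "lat" (both column names and both helper functions are hypothetical, not existing Climada API). It refuses to convert unless every geometry is a Point, which is the restriction mentioned above.

```python
import geopandas as gpd
import pandas as pd


def geometry_to_xy(gdf, x_col="lon", y_col="lat"):
    """Replace the point geometry column by two float columns (hypothetical helper)."""
    geom = gdf.geometry
    # Lossless only if every geometry is a Point; anything else must keep WKB or similar.
    if not (geom.geom_type == "Point").all():
        raise ValueError("x/y storage requires all geometries to be Points")
    df = pd.DataFrame(gdf.drop(columns=geom.name))
    df[x_col] = geom.x.to_numpy()
    df[y_col] = geom.y.to_numpy()
    return df


def xy_to_geometry(df, crs, x_col="lon", y_col="lat"):
    """Rebuild the point geometry column after reading (hypothetical helper)."""
    points = gpd.points_from_xy(df[x_col], df[y_col])
    return gpd.GeoDataFrame(df.drop(columns=[x_col, y_col]), geometry=points, crs=crs)
```

Usage, with a tiny GeoDataFrame standing in for Centroids.gdf:

```python
gdf = gpd.GeoDataFrame(
    {"region_id": [1, 2, 3]},
    geometry=gpd.points_from_xy([8.5, 9.0, 9.5], [47.0, 47.5, 48.0]),
    crs="EPSG:4326",
)
flat = geometry_to_xy(gdf)            # plain DataFrame: region_id, lon, lat
restored = xy_to_geometry(flat, crs=gdf.crs)
```

The CRS is not part of the flat table, so it has to be stored separately (e.g. as file metadata) and passed back in when reading.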
