Open
Labels: accepting pull request, performance
Description
Since the refactoring of `Centroids` in Climada 5.0, their coordinates are stored in the geometry column of a `GeoDataFrame` and no longer as numpy arrays (`lat`/`lon`).
This has the following consequences:
- the files produced by `write_hdf5` are much bigger than they used to be. Here is an example for `Centroids` on a grid of 5'760'000 points:

  | geometry saved as | uncompressed size | compressed size |
  |---|---|---|
  | `shapely.Point` (current) | 210M | 167M |
  | WKB byte array (planned) | 177M | 134M |
  | x and y columns (no geometry) | 134M | 15M |

- when reading HDF5 files that contain pickled `Point`s, the risk of exceeding memory limits is high. With a memory limit of 4 GB, I have not been able to read them without the kernel being killed.
It was already planned to store the `Centroids.gdf` geometries in WKB format, like those in `Exposures.gdf`.
This would alleviate the problem somewhat: lower memory requirements and files about 20% smaller.
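A minimal sketch of what the planned WKB round trip could look like, assuming geopandas >= 0.12 (for `GeoSeries.to_wkb` and `GeoSeries.from_wkb`). The helper names `geometry_to_wkb` and `wkb_to_geometry` and the column name `geometry_wkb` are hypothetical and not part of CLIMADA. Because a plain DataFrame carries no CRS, the CRS has to be passed back in explicitly.

```python
import geopandas as gpd
import pandas as pd

# Hypothetical column name for the encoded geometries (not a CLIMADA name)
WKB_COLUMN = "geometry_wkb"


def geometry_to_wkb(gdf: gpd.GeoDataFrame) -> pd.DataFrame:
    """Return a plain DataFrame whose active geometry column is replaced by WKB bytes."""
    df = pd.DataFrame(gdf.drop(columns=gdf.geometry.name))
    # GeoSeries.to_wkb returns a pandas Series of bytes objects with the same index
    df[WKB_COLUMN] = gdf.geometry.to_wkb()
    return df


def wkb_to_geometry(df: pd.DataFrame, crs=None) -> gpd.GeoDataFrame:
    """Rebuild a GeoDataFrame from the output of geometry_to_wkb.

    WKB bytes do not include the CRS, so it must be supplied by the caller.
    """
    # from_wkb keeps the index of the input Series; rename to the usual column name
    geometry = gpd.GeoSeries.from_wkb(df[WKB_COLUMN], crs=crs).rename("geometry")
    return gpd.GeoDataFrame(df.drop(columns=WKB_COLUMN), geometry=geometry)
```

The geometries still end up as an object column of variable-length byte strings, which is consistent with the modest size reduction reported above.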
However, if we converted the geometry column to `x`/`y` columns before storing, and back after reading, the compressed files would be about 90% smaller, and reading/writing would be faster.
This only works if the `Centroids` really are points and not some other geometry type.
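A minimal sketch of the proposed x/y conversion, assuming geopandas (`GeoSeries.x`/`.y`, `geom_type`, `points_from_xy`). The helper names `points_to_xy` and `xy_to_points` are hypothetical, and the sketch assumes the frame has no existing columns named `x` or `y`. It makes the "points only" condition explicit by raising for any other geometry type.

```python
import geopandas as gpd
import pandas as pd


def points_to_xy(gdf: gpd.GeoDataFrame) -> pd.DataFrame:
    """Replace the active point geometry column by two float columns, x and y."""
    # The conversion is only lossless for Point geometries, so refuse anything else
    if not (gdf.geom_type == "Point").all():
        raise ValueError("x/y storage requires all geometries to be Points")
    df = pd.DataFrame(gdf.drop(columns=gdf.geometry.name))
    df["x"] = gdf.geometry.x.to_numpy()
    df["y"] = gdf.geometry.y.to_numpy()
    return df


def xy_to_points(df: pd.DataFrame, crs=None) -> gpd.GeoDataFrame:
    """Rebuild a point GeoDataFrame from the output of points_to_xy."""
    # points_from_xy returns a GeometryArray carrying the given CRS
    geometry = gpd.points_from_xy(df["x"], df["y"], crs=crs)
    return gpd.GeoDataFrame(df.drop(columns=["x", "y"]), geometry=geometry)
```

The resulting frame holds only numeric columns, so it could be written with pandas, e.g. `df.to_hdf(path, key="centroids", complevel=9, complib="zlib")` (requires PyTables), with the CRS stored separately, for instance as a WKT string.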