Python's multiprocessing support generally makes it hard to share complex objects between processes. So when doing parallel operations, Yirgacheffe closes everything down and re-opens it when it is "passed" from parent to child, and it uses manually managed shared memory segments to return data from the children to the parent.
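A minimal sketch of the child-to-parent pattern described above, using the standard library's `multiprocessing.shared_memory` with NumPy. This is an illustration under assumptions, not Yirgacheffe's actual implementation. `allocate_result`, `child_work` and `collect` are hypothetical names, and the per-band fill in `child_work` is a placeholder for a real calculation.

```python
from multiprocessing import shared_memory

import numpy as np


def allocate_result(shape, dtype=np.float64):
    """Parent side: create a named shared memory segment big enough for the result."""
    nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize
    return shared_memory.SharedMemory(create=True, size=nbytes)


def child_work(segment_name, shape, dtype, row_start, row_end):
    """Child side: attach to the segment by name and fill in a band of rows."""
    shm = shared_memory.SharedMemory(name=segment_name)
    result = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    # Placeholder for the real per-band calculation: write each row's index.
    result[row_start:row_end, :] = np.arange(row_start, row_end)[:, None]
    del result  # release the buffer view before closing the mapping
    shm.close()


def collect(shm, shape, dtype):
    """Parent side: copy the finished result out, then free the segment."""
    view = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    data = view.copy()
    del view
    shm.close()
    shm.unlink()
    return data
```

Only the segment name, shape and dtype cross the process boundary, which is why this works even though the original GDAL objects cannot.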
This only works for parallel_save, however. Ideally Yirgacheffe would let you use a multiprocessing map to load a few common datasets once and then map over specific species. For example, in LIFE we load the elevation map for each species we process, so it ends up sitting in memory many times over. We should load it just once and share that copy.
The problem with this is GDAL: you just can't share GDAL objects across child processes in Python.
For the case where we want to force-load a GeoTIFF entirely into memory, we'd like a version of RasterLayer that doesn't use GDAL under the hood. It would load all the data into a shared memory segment, and read_array operations would read from that segment rather than calling GDAL's ReadAsArray.
(If we do this, we could also stop using GDAL for the regular RasterLayer if that's easy, but that isn't a particular pain point right now.)
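A rough sketch of the shared-memory-backed RasterLayer proposed above, under several assumptions. `SharedMemoryRasterLayer` and `from_array` are hypothetical names. The `read_array(xoffset, yoffset, xsize, ysize)` argument order is assumed to mirror GDAL's `ReadAsArray`. The pixels are assumed to have been read once in the parent (for example with GDAL) before being copied into shared memory. The rest of the RasterLayer interface, such as georeferencing, is not covered.

```python
from multiprocessing import shared_memory

import numpy as np


class SharedMemoryRasterLayer:
    """Hypothetical in-memory raster whose pixels live in a shared memory segment.

    Pickling sends only the segment name, shape and dtype, so a child process
    re-attaches to the same memory instead of copying pixels or touching GDAL.
    """

    def __init__(self, name, shape, dtype_str, owner=False):
        self._shm = shared_memory.SharedMemory(name=name)
        self._data = np.ndarray(shape, dtype=np.dtype(dtype_str), buffer=self._shm.buf)
        self._owner = owner  # only the creating process should unlink the segment

    @classmethod
    def from_array(cls, array):
        """Copy an already-loaded array into a new shared memory segment, once."""
        shm = shared_memory.SharedMemory(create=True, size=array.nbytes)
        np.ndarray(array.shape, dtype=array.dtype, buffer=shm.buf)[:] = array
        layer = cls(shm.name, array.shape, array.dtype.str, owner=True)
        shm.close()  # the layer holds its own handle on the segment
        return layer

    @property
    def shape(self):
        return self._data.shape

    def read_array(self, xoffset, yoffset, xsize, ysize):
        """Return a copy of a pixel window, served from memory rather than GDAL."""
        return self._data[yoffset:yoffset + ysize, xoffset:xoffset + xsize].copy()

    def __reduce__(self):
        # When pickled for a child process, rebuild as a non-owning handle by name.
        return (self.__class__, (self._shm.name, self._data.shape, self._data.dtype.str, False))

    def close(self):
        del self._data  # drop the buffer view before closing the mapping
        self._shm.close()
        if self._owner:
            self._shm.unlink()
```

Because pickling carries only the segment name, passing such a layer to `pool.map` alongside each species would let every child read the same pixels rather than re-opening the GeoTIFF. `read_array` returns a copy so callers cannot accidentally modify the shared data.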