
Allow entirely in-memory loading of rasters that can be shared between multiprocessing processes #33

@mdales

Python's multiprocessing support generally makes it hard to use complex shared objects. So when doing parallel operations, Yirgacheffe closes everything down and re-opens it when it is "passed" from parent to child, and it uses manually managed shared memory segments to return data from the children to the parent.

This only works when you use parallel_save. Ideally Yirgacheffe would let you load a few common datasets once and then use multiprocessing map over specific species. For example, in LIFE we load the elevation map for each species we process, so it sits in memory many times over. We should load it just once and share it.

The problem with this is GDAL - you just can't share GDAL objects in Python across child processes.

For the case where we want to force-load a GeoTIFF entirely into memory, we'd like a version of RasterLayer that doesn't use GDAL under the hood. Instead it would load all the data into a shared memory segment, and read_array operations would access that segment rather than calling GDAL's ReadAsArray.

(If we do this, we could also stop using GDAL for RasterLayer itself if that turns out to be easy, but that's not a particular pain point right now.)
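
To make the idea concrete, here is a rough sketch of a shared-memory-backed layer, using only Python's multiprocessing.shared_memory and NumPy. Nothing here is Yirgacheffe's actual API: the class name SharedArrayLayer, its from_array constructor, the read_array(x, y, width, height) signature, and the mean_above worker are all hypothetical names chosen for illustration. It also assumes the pixel data has already been decoded into a NumPy array, for example by a single GDAL read in the parent. Only the segment name, shape, and dtype get pickled, so passing the layer to a child is cheap and no GDAL object crosses the process boundary.

```python
from multiprocessing import Pool, shared_memory

import numpy as np


class SharedArrayLayer:
    """Hypothetical read-only raster backed by a named shared memory segment."""

    def __init__(self, name, shape, dtype, _shm=None):
        self.name = name
        self.shape = tuple(shape)
        self.dtype = np.dtype(dtype)
        # Attach to the existing segment unless the creator handed one over.
        self._shm = _shm if _shm is not None else shared_memory.SharedMemory(name=name)
        # A view onto the shared buffer; no pixel data is copied here.
        self._data = np.ndarray(self.shape, dtype=self.dtype, buffer=self._shm.buf)

    @classmethod
    def from_array(cls, array):
        """Copy `array` into a new segment. The creator must call unlink() later."""
        array = np.ascontiguousarray(array)
        shm = shared_memory.SharedMemory(create=True, size=array.nbytes)
        layer = cls(shm.name, array.shape, array.dtype, _shm=shm)
        layer._data[:] = array
        return layer

    def __reduce__(self):
        # Pickle only the metadata; the child re-attaches by segment name.
        return (SharedArrayLayer, (self.name, self.shape, self.dtype.str))

    def read_array(self, x, y, width, height):
        # Return a copy so callers never hold a view into the shared buffer.
        return self._data[y:y + height, x:x + width].copy()

    def close(self):
        # Drop the view first, otherwise closing the buffer raises BufferError.
        self._data = None
        self._shm.close()

    def unlink(self):
        # Frees the segment; call once, from the process that created it.
        self._shm.unlink()


def mean_above(args):
    """Example per-task work: mean of all cells above a threshold."""
    layer, threshold = args
    try:
        rows, cols = layer.shape
        window = layer.read_array(0, 0, cols, rows)
        return float(window[window > threshold].mean())
    finally:
        layer.close()


def main():
    # Stand-in for an elevation map that was read from disk once.
    elevation = np.arange(12, dtype=np.float32).reshape(3, 4)
    layer = SharedArrayLayer.from_array(elevation)
    try:
        with Pool(2) as pool:
            print(pool.map(mean_above, [(layer, t) for t in (0, 5, 9)]))
    finally:
        layer.close()
        layer.unlink()


if __name__ == "__main__":
    main()
```

The lifetime of the segment is the awkward part of this design: the parent has to outlive every child that reads from it and is responsible for calling unlink(), which is the same kind of manual management already used for returning data from children.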
