Simplify and improve performance of habitat processing #30

Merged
mdales merged 2 commits into main from mwd-habitat-process
Nov 4, 2025

@mdales mdales commented Nov 4, 2025

The original habitat processing script tried to be clever about processing different habitat classes in parallel, within the limits of how much memory was free, as memory was the biggest resource constraint. However:

  1. In practice it rarely used more than one thread, because memory is so limited and we have to be conservative given the risk of getting it wrong (the Linux OOM killer).
  2. GDAL works best when you can tell it to use as much memory as there is, and by restricting it (or even leaving it at the default of using up to 25% of memory as a cache), we were not getting the most out of it.
  3. To make this work we had to use private access to bits of Yirgacheffe
  4. Performance still wasn't that great.

I spent time looking into where we were not getting performance wins:

  1. In making the filter map, we didn't use parallel_save as we assumed an outer level of parallelism that in practice was very limited.
  2. GDAL warp was being held back by LZW compression, because we were writing its output to a TIFF file.

In this PR:

  1. I've made the main body of the program single-threaded, as we could never leverage that outer parallelism very much.
  2. The map filter now uses as many CPUs as it can using parallel_save, considerably improving performance of that first stage
  3. We tell GDAL it can use as much active memory as there is free
  4. We move to using the GDAL /vsimem system rather than the "mem" driver in Yirgacheffe, as this seems to help GDAL use memory rather than actually caching things out to disk.
  5. Moving to /vsimem also means we can stop using private APIs in Yirgacheffe.
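
Points 3 and 4 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the helper names (`cache_budget_bytes`, `vsimem_path`, `warp_in_memory`), the 10% headroom, the nearest-neighbour resampling, and the choice of an uncompressed GeoTIFF are all hypothetical. The GDAL calls used (`gdal.SetCacheMax`, `gdal.Warp`) are part of the standard Python bindings.

```python
import os


def cache_budget_bytes(free_bytes: int, headroom: float = 0.1) -> int:
    """Bytes to offer GDAL as cache, holding back `headroom` of free memory.

    The 10% default is an assumption for this sketch, not a value from the PR.
    """
    if free_bytes < 0:
        raise ValueError("free_bytes must be non-negative")
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    return int(free_bytes * (1.0 - headroom))


def vsimem_path(label: str) -> str:
    """Build a per-process /vsimem filename so separate runs do not collide."""
    return f"/vsimem/{label}_{os.getpid()}.tif"


def warp_in_memory(source_path: str, pixel_size: float, free_bytes: int):
    """Warp `source_path` into an in-memory GeoTIFF under /vsimem.

    Returns the opened dataset and its /vsimem path. The caller should set
    the dataset to None and call gdal.Unlink(path) when finished with it.
    """
    # Imported lazily so the two helpers above can be used without GDAL.
    from osgeo import gdal

    gdal.UseExceptions()
    # Let GDAL's block cache grow to (almost) all of the free memory.
    gdal.SetCacheMax(cache_budget_bytes(free_bytes))

    destination = vsimem_path("warped")
    dataset = gdal.Warp(
        destination,
        source_path,
        format="GTiff",
        xRes=pixel_size,
        yRes=pixel_size,
        resampleAlg="near",  # hypothetical choice for categorical habitat data
        creationOptions=["COMPRESS=NONE"],  # skip LZW while data stays in RAM
    )
    return dataset, destination
```

The free memory figure is passed in rather than measured here; one way to obtain it is `psutil.virtual_memory().available`, if psutil is installed. Because a `/vsimem` file is addressed by an ordinary path, it can be opened the same way as a file on disk, which fits with point 5 about no longer needing private APIs.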

In general this seems to have doubled performance on my Mac Studio.

@mdales mdales requested a review from shaneweisz November 4, 2025 14:14
@mdales mdales merged commit eb1b2ab into main Nov 4, 2025
1 check passed
@mdales mdales deleted the mwd-habitat-process branch November 4, 2025 16:59