Improve peak memory allocation during polygon construction #200
Conversation
Linux x64 VM, Python 3.14 from conda, numpy 2.3.4, shapely 2.1.2, xarray 2025.10.1

It also passes in whatever environment the GitHub runners use.

Alright, I am pretty convinced that this works across multiple environments! This is ready for review.
It seems Windows uses slightly more memory and we were right on the limit. I am unsure what introduces the variance, or whether there is anything we can do about it.
david-sh-csiro left a comment
Tested locally with MoVE. Triangulation of a 1 km resolution UGRID national dataset took around 37 seconds, which might be slightly slower than previous results. Otherwise works as advertised. Memory usage was stable between 26% and 30% throughout the triangulation.
This PR shouldn't have affected triangulation much. That is a separate process that happens after constructing the polygons. Anything that was possible before should still be possible now, but hopefully we can now open even larger datasets. Constructing the polygons might be marginally slower if you have more polygons in your dataset than the batch size, but not noticeably slower outside of benchmarks.
* origin/main:
  * Update Python version in release automation workflow
  * Bump version of Sphinx tools
  * Bump minimum Python to 3.12, add 3.14 support


When opening particularly large datasets, emsarray could crash with a memory error. The code would attempt to construct a numpy array containing coordinates for every polygon in the dataset, then construct all the polygons in one batch. Because of shapely internals, the polygon coordinate array would be copied, briefly doubling memory usage before the original coordinate array was garbage collected.
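As a rough illustration of the old behaviour (not the actual emsarray code; the array shapes and sizes here are made up), constructing every polygon in a single shapely call keeps two copies of the coordinate data alive at the same time:

```python
import numpy as np
import shapely

# Stand-in for the per-cell coordinates of a large dataset:
# one closed five-point ring per polygon, shape (n, 5, 2).
n = 1_000
unit_square = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]], dtype=float)
origins = np.random.default_rng(0).random((n, 1, 2)) * 100
all_coords = unit_square[np.newaxis, :, :] + origins

# One-shot construction: shapely copies the coordinates into its own
# internal storage, so until `all_coords` is released, peak memory is
# roughly twice the size of the coordinate array.
all_polygons = shapely.polygons(all_coords)
```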
This PR contains a number of improvements to this situation:
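The core change is that polygons are now constructed in fixed-size batches, so only one batch of coordinates is duplicated at any moment. A minimal sketch of that idea (the function name and batch size are illustrative, not the PR's actual implementation):

```python
import numpy as np
import shapely


def polygons_in_batches(all_coords: np.ndarray, batch_size: int = 100_000) -> np.ndarray:
    """Construct shapely polygons from a (n, m, 2) coordinate array,
    one batch at a time, keeping peak memory close to the size of the
    coordinate array instead of double it."""
    out = np.empty(len(all_coords), dtype=object)
    for start in range(0, len(all_coords), batch_size):
        stop = start + batch_size
        # Only this slice of coordinates is copied by shapely at once.
        out[start:stop] = shapely.polygons(all_coords[start:stop])
    return out
```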
The memory figures in this PR are taken from runs on my laptop. We should test on several different systems to make sure these figures are representative across environments.
Tested on:
Please run the following and copy the output to a comment, so we can compare memory usage:
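(The exact snippet from this PR isn't reproduced here. As an illustration, peak memory while building the polygons could be reported with something along these lines; the dataset path is a placeholder, and `dataset.ems.polygons` is assumed to trigger polygon construction.)

```python
import resource

import emsarray

dataset = emsarray.open_dataset("path/to/large_dataset.nc")
polygons = dataset.ems.polygons  # forces the polygons to be constructed

# ru_maxrss is the peak resident set size of this process:
# kibibytes on Linux, bytes on macOS.
peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f"peak RSS: {peak}")
```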
Please also run the full test suite to verify that everything passes. No need to include the output of this: