fix two performance bottlenecks in the gribjump source #877

Merged
sandorkertesz merged 1 commit into develop from perf/gribjump-coords-caching
Jan 23, 2026
@andreas-grafberger andreas-grafberger commented Jan 13, 2026

Fixes two small performance bottlenecks that made retrieval for large time-series unnecessarily slow:

  1. If fetch_coords_from_fdb=True, the reference latitudes and longitudes were read from the reference field again for each individually retrieved field. Originally, I somehow assumed they would be cached together with the GribMetadata object and only noticed this issue recently during some benchmarks. I propose to simply fix this for now by explicitly caching latitudes/longitudes.
  2. If indices instead of ranges were used, they were converted into a list of ranges (by ExtractionRequest.from_indices) again for every single field. As a simple fix, I propose that we convert indices to ranges once, directly in this source. I will follow up in a future PR with a cleaner fix.
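
As a rough illustration of the first fix (a sketch only, not the code in this PR): the reference latitudes/longitudes are read on first access and then reused for every retrieved field. `CachedCoords` and `expensive_read` are hypothetical names, and the coordinate values are made up.

```python
from functools import cached_property


class CachedCoords:
    """Hypothetical helper: reads reference lat/lon once and reuses them."""

    def __init__(self, read_latlon):
        # read_latlon: callable doing the expensive read of the reference field
        self._read_latlon = read_latlon

    @cached_property
    def latlon(self):
        # Evaluated on first access only; later accesses reuse the stored value.
        return self._read_latlon()


reads = []


def expensive_read():
    reads.append(1)  # track how often the reference field is actually read
    return ([50.0, 50.5], [8.0, 8.5])  # made-up latitudes and longitudes


coords = CachedCoords(expensive_read)
# One access per retrieved field; only the first one triggers a read.
per_field = [coords.latlon for _ in range(1000)]
```

With this pattern the cost of the read no longer scales with the number of fields, which is the effect the first point describes.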

I performed a few ad-hoc benchmarks for these two fixes:

  1. Retrieving one month of hydrological reanalysis data shows that the wall-clock time with the same configuration (same fdb, #threads, ~1k random points) goes from ~1 min to ~18 seconds.
  2. When retrieving 4 years of hydrological reanalysis data, ~14% of the time was spent in repeated calls to ExtractionRequest.from_indices. This fix removes that overhead.
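
The second fix can be sketched in the same spirit: convert the indices to ranges once, then pass the same ranges to every per-field request. This is an illustration under assumptions, not the PR's code: `indices_to_ranges` is a hypothetical helper (not the gribjump API), ranges are assumed to be half-open `(start, end)` pairs, and the indices are sorted and de-duplicated first.

```python
def indices_to_ranges(indices):
    """Collapse grid indices into half-open (start, end) ranges.

    Hypothetical helper; assumes half-open ranges and sorts/de-duplicates input.
    """
    ranges = []
    for i in sorted(set(indices)):
        if ranges and ranges[-1][1] == i:
            # Index continues the previous range: extend its end by one.
            ranges[-1] = (ranges[-1][0], i + 1)
        else:
            # Gap before this index: start a new single-element range.
            ranges.append((i, i + 1))
    return ranges


indices = [3, 4, 5, 10, 11, 42]
ranges = indices_to_ranges(indices)  # converted once, up front

# Every per-field request reuses the same precomputed ranges object.
requests = [(field_id, ranges) for field_id in range(4)]
```

The conversion cost is then paid once per retrieval instead of once per field.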

Contributor Declaration

By opening this pull request, I affirm the following:

  • All authors agree to the Contributor License Agreement.
  • The code follows the project's coding standards.
  • I have performed self-review and added comments where needed.
  • I have added or updated tests to verify that my changes are effective and functional.
  • I have run all existing tests and confirmed they pass.


codecov-commenter commented Jan 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.35%. Comparing base (4acd81b) to head (b44dce5).

Additional details and impacted files
@@           Coverage Diff            @@
##           develop     #877   +/-   ##
========================================
  Coverage    85.35%   85.35%           
========================================
  Files          184      184           
  Lines        14562    14562           
  Branches       732      732           
========================================
  Hits         12429    12429           
  Misses        1922     1922           
  Partials       211      211           

@andreas-grafberger andreas-grafberger force-pushed the perf/gribjump-coords-caching branch from 443fbc6 to e3c4c35 on January 14, 2026 14:26
@andreas-grafberger andreas-grafberger changed the title from "perf(gribjump): compute coordinates once instead of per-field" to "perf(gribjump): improve caching in gribjump source" on Jan 14, 2026
@andreas-grafberger andreas-grafberger marked this pull request as ready for review January 14, 2026 15:29
@ChrisspyB ChrisspyB left a comment

seems to make sense

@sandorkertesz
Collaborator

@andreas-grafberger thank you for this improvement. Please synch this PR with develop and it can be merged.

Avoid recomputing per-field data by caching/precomputing:
- Cache reference lat/lon when fetch_coords_from_fdb=True to avoid re-reading
  the reference field's geography per retrieved field.
- Pre-convert index lists to ranges once to avoid repeated calls to ExtractionRequest.from_indices.
@andreas-grafberger andreas-grafberger force-pushed the perf/gribjump-coords-caching branch from eea9c61 to b44dce5 on January 21, 2026 13:11
@andreas-grafberger
Contributor Author

> @andreas-grafberger thank you for this improvement. Please synch this PR with develop and it can be merged.

@sandorkertesz Thank you! I rebased onto develop and the gribjump tests run locally on my machine with the latest commit.

From my side, either merging now or waiting for #878, which enables the gribjump tests in the CI, would be fine. I still ran into a few hurdles there, which will delay it by a couple of days.

@andreas-grafberger andreas-grafberger changed the title from "perf(gribjump): improve caching in gribjump source" to "fix two performance bottlenecks in the gribjump source" on Jan 21, 2026
@sandorkertesz sandorkertesz merged commit 1ece921 into develop Jan 23, 2026
145 of 161 checks passed
@andreas-grafberger andreas-grafberger deleted the perf/gribjump-coords-caching branch January 23, 2026 14:20