fix two performance bottlenecks in the gribjump source by andreas-grafberger · Pull Request #877 · ecmwf/earthkit-data

andreas-grafberger · 2026-01-13T16:36:06Z

Fixes two small performance bottlenecks that made retrieving large time series unnecessarily slow:

If fetch_coords_from_fdb=True, the reference latitudes and longitudes were read from the reference field again for each individually retrieved field. Originally, I somehow assumed they would be cached together with the GribMetadata object and only noticed this issue recently during some benchmarks. I propose to simply fix this for now by explicitly caching latitudes/longitudes.
If indices instead of ranges were used, they were converted into a list of ranges (by ExtractionRequest.from_indices) again for every single field. As a simple fix, I propose converting the indices to ranges once, directly in this source. I will follow up in a future PR with a cleaner fix.
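The first fix amounts to memoizing the reference coordinates so the expensive read happens once rather than once per field. A minimal sketch, assuming a hypothetical `ReferenceCoords` wrapper and a stand-in `fake_read_latlon` callable in place of the actual FDB read. Neither is the earthkit-data API:

```python
from functools import cached_property


def fake_read_latlon():
    # Hypothetical stand-in for an expensive read of the reference field's geography.
    return (50.0, 51.0), (8.0, 9.0)


class ReferenceCoords:
    """Hypothetical wrapper: reads reference lat/lon once, then reuses them."""

    def __init__(self, read_latlon):
        self._read_latlon = read_latlon  # assumed callable returning (lats, lons)
        self.reads = 0  # how many times the expensive read actually ran

    @cached_property
    def latlon(self):
        # Runs on first access only; cached_property stores the result on
        # the instance, so every later access skips this body.
        self.reads += 1
        return self._read_latlon()


def retrieve_fields(n_fields, coords):
    # Each retrieved field is paired with the shared reference coordinates.
    return [(i, coords.latlon) for i in range(n_fields)]


coords = ReferenceCoords(fake_read_latlon)
fields = retrieve_fields(1000, coords)
print(coords.reads)  # 1
```

Caching is only valid because every retrieved field shares the geography of the same reference field; if fields could come from different grids, the cache would need to be keyed by grid.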

I performed a few ad-hoc benchmarks for these two fixes:

Retrieving one month of hydrological reanalysis data with the same configuration (same FDB, same number of threads, ~1k random points): wall-clock time drops from ~1 min to ~18 seconds.
When retrieving 4 years of hydrological reanalysis data, ~14% of the total time was spent in repeated calls to ExtractionRequest.from_indices. The second fix should remove this overhead.
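The second fix hoists the index-to-range conversion out of the per-field loop. The sketch below uses a hypothetical `indices_to_ranges` helper as a stand-in for ExtractionRequest.from_indices (whose real signature and range representation may differ; half-open `(start, end)` tuples are an assumption here) and counts how often it runs for an illustrative list of 365 fields:

```python
from collections import Counter

calls = Counter()  # tracks how often the conversion runs


def indices_to_ranges(indices):
    """Collapse grid-point indices into half-open (start, end) ranges.

    Hypothetical stand-in for ExtractionRequest.from_indices.
    """
    calls["indices_to_ranges"] += 1
    ranges = []
    for i in sorted(set(indices)):
        if ranges and ranges[-1][1] == i:
            ranges[-1] = (ranges[-1][0], i + 1)  # extend the current run
        else:
            ranges.append((i, i + 1))  # start a new run
    return ranges


def requests_per_field(fields, indices):
    # Before: the same indices are converted again for every field.
    return [(field, indices_to_ranges(indices)) for field in fields]


def requests_converted_once(fields, indices):
    # After: convert once and share the ranges across all fields.
    ranges = indices_to_ranges(indices)
    return [(field, ranges) for field in fields]


fields = [f"day-{d}" for d in range(365)]
indices = [5, 1, 2, 3, 7, 8]

calls.clear()
before = requests_per_field(fields, indices)
print(calls["indices_to_ranges"])  # 365

calls.clear()
after = requests_converted_once(fields, indices)
print(calls["indices_to_ranges"])  # 1
```

The ranges depend only on the requested points, not on the field, so computing them once gives identical requests. Sharing one list object across all requests assumes nothing downstream mutates it.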

Contributor Declaration

By opening this pull request, I affirm the following:

All authors agree to the Contributor License Agreement.
The code follows the project's coding standards.
I have performed self-review and added comments where needed.
I have added or updated tests to verify that my changes are effective and functional.
I have run all existing tests and confirmed they pass.

codecov-commenter · 2026-01-13T17:01:10Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.35%. Comparing base (4acd81b) to head (b44dce5).

Additional details and impacted files

@@           Coverage Diff            @@
##           develop     #877   +/-   ##
========================================
  Coverage    85.35%   85.35%           
========================================
  Files          184      184           
  Lines        14562    14562           
  Branches       732      732           
========================================
  Hits         12429    12429           
  Misses        1922     1922           
  Partials       211      211


ChrisspyB

seems to make sense

tests/sources/test_gribjump.py

sandorkertesz · 2026-01-19T10:51:59Z

@andreas-grafberger thank you for this improvement. Please synch this PR with develop and it can be merged.

Avoid recomputing per-field data by caching/precomputing:
- Cache reference lat/lon when fetch_coords_from_fdb=True to avoid re-reading the reference field's geography per retrieved field.
- Pre-convert index lists to ranges once to avoid repeated calls to ExtractionRequest.from_indices.

andreas-grafberger · 2026-01-21T13:21:27Z

@andreas-grafberger thank you for this improvement. Please synch this PR with develop and it can be merged.

@sandorkertesz Thank you! I rebased onto develop and the gribjump tests run locally on my machine with the latest commit.

From my side, either merging now or waiting for #878, which enables the gribjump tests in the CI, would be fine. I have run into a few hurdles there, which will delay #878 by a couple of days.

andreas-grafberger force-pushed the perf/gribjump-coords-caching branch from 443fbc6 to e3c4c35 January 14, 2026 14:26

andreas-grafberger changed the title ~~perf(gribjump): compute coordinates once instead of per-field~~ perf(gribjump): improve caching in gribjump source Jan 14, 2026

andreas-grafberger marked this pull request as ready for review January 14, 2026 15:29

andreas-grafberger requested review from ChrisspyB and sandorkertesz January 14, 2026 15:29

ChrisspyB approved these changes Jan 14, 2026


tests/sources/test_gribjump.py

andreas-grafberger force-pushed the perf/gribjump-coords-caching branch from eea9c61 to b44dce5 January 21, 2026 13:11

andreas-grafberger changed the title ~~perf(gribjump): improve caching in gribjump source~~ fix two performance bottlenecks in the gribjump source Jan 21, 2026

sandorkertesz merged commit 1ece921 into develop Jan 23, 2026
145 of 161 checks passed

andreas-grafberger deleted the perf/gribjump-coords-caching branch January 23, 2026 14:20

sandorkertesz merged 1 commit into develop from perf/gribjump-coords-caching


4 participants
