Commit fc7f702

feature: experimental gribjump source (#689)
* wip: first rough draft of a GribJumpSource
* wip: experimental tests for easier development
* format changes using pre-commit hooks and fix small bug
* add prototype for the GribJumpSource based on SimpleFieldList
* tests: add a few more simple tests and add NO_GRIBJUMP flag for pytest
* tidy: small cleanup, improve variable naming and fix type hints
* use original type in request dictionaries to make .sel more intuitive
* assign grid index to each value in to_xarray
* tidy: add some more error handling and improve tests
* refactor: introduce ExtractionRequest wrapper that combines pygribjump.ExtractionRequest and original fdb request dict
* feat(test): modify (now failing) test to expect latitude and longitude information
* feat: wip: allow reference lat/lons to be loaded from an fdb reference field
* refactor: move hardcoded test fixtures into pytest fixtures
* test: add failing test showing bug with geography for gridded extracts
* docs: add notebook draft with example usage of gribjump source
* tidy: move validation that extract request share the same ranges
* docs: add documentation for gribjump source
* docs: small fixes of markdown syntax
* feat: wip experiment to verify gridspec of reference field
* fix: force flattened array in xarray dataset creation
* refactor: tidy up the metadata enrichment a bit
* refactor: create ExtractionRequestCollection
* refactor: use FDBRetriever to load reference metadata
* docs: add example for masks and indices to notebook
* feat: enforce that masks are 1D boolean arrays
* refactor: simplify by condensing request splitting utilities into one function
* feat: remove verification functionality for now, to be added later
* tidy: small renamings and docstrings
* tidy: comments
* fix: type hint and name
* feat: convert masks to ranges once for significant speedups
* test: add another test for mask_to_ranges
* tidy: small comment/docstring changes
* tidy: make warning about missing validation in docs more explicit
* docs: improve wording of warning
* fix: allow fdb and gribjump to be configured via FDB5_CONFIG and GRIBJUMP_HOME
* chore: update docstring
* feat: pass log context to gribjump
* refactor: simplify gribjump log context
* test: add t_gribjump.grib test data with expver xxxx
* add pygribjump as an optional dependency
* docs: clarify gribjump install instructions and dependency handling
* docs: move warning before parameters section
* docs: clarify parameter description and types
* add pyfdb as a gribjump group dependency and update docs
* last docs and typo fixes
* docs: change notebook to also set FDB_HOME
* docs: reference gribjump example notebook in missing locations
1 parent a312ba5 commit fc7f702

10 files changed: 2095 additions and 12 deletions

docs/examples/gribjump.ipynb

Lines changed: 972 additions & 0 deletions
Large diffs are not rendered by default.

docs/examples/index.rst

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ Data sources
     polytope_feature.ipynb
     s3.ipynb
     wekeo.ipynb
+    gribjump.ipynb

 GRIB
 ++++++

docs/guide/sources.rst

Lines changed: 81 additions & 0 deletions
@@ -66,6 +66,8 @@ We can get data from a given source by using :func:`from_source`:
       - retrieve data from `WEkEO`_ using the WEkEO grammar
     * - :ref:`data-sources-wekeocds`
       - retrieve `CDS <https://cds.climate.copernicus.eu/>`_ data stored on `WEkEO`_ using the `cdsapi`_ grammar
+    * - :ref:`data-sources-gribjump`
+      - retrieve data from the `FDB (Fields DataBase)`_ using the `gribjump`_ library
     * - :ref:`data-sources-zarr`
       - load data from a `Zarr <https://zarr.readthedocs.io/en/stable/>`_ store

@@ -1231,6 +1233,85 @@ wekeocds
       - :ref:`/examples/wekeo.ipynb`


+.. _data-sources-gribjump:
+
+gribjump
+--------
+
+.. py:function:: from_source("gribjump", request, *, ranges=None, mask=None, indices=None, fetch_coords_from_fdb=False, fdb_kwargs=None, **kwargs)
+  :noindex:
+
+  The ``gribjump`` source enables fast retrieval of GRIB message subsets from the `FDB (Fields DataBase)`_ using the `gribjump <https://github.com/ecmwf/gribjump/>`_ library.
+  Both `pygribjump <https://pypi.org/project/pygribjump/>`_ and `pyfdb`_ must be installed. The `pygribjump`_ package uses `findlibs <https://github.com/ecmwf/findlibs>`_ to locate an installation of the `gribjump`_ library.
+  If the library is not available on your system, you can install it via the `gribjumplib <https://pypi.org/project/gribjumplib/>`_ wheel from PyPI.
+  Installing `gribjumplib` from PyPI will also automatically install `fdb5lib <https://pypi.org/project/fdb5lib/>`_ and other dependencies, which may take priority over any existing installations on your system.
+
+  .. warning::
+      ⚠️ This source is **experimental** and may change in future versions without
+      warning. It performs **no validation** that the specified grid indices,
+      masks, or ranges correspond to the fields' actual underlying grids.
+      **Incorrect usage may silently return wrong data points.**
+      The provided ranges or masks might correspond to unexpected points on the
+      grid. This source is also currently **not thread-safe**.
+
+  Exactly one of the parameters ``ranges``, ``mask`` or ``indices`` must be specified.
+
+  :param request: the FDB request as a dictionary. GribJump requires strict value formatting
+    (e.g., hdates as "YYYYMMDD", not "YYYY-MM-DD"). Incorrectly formatted values may result in "DataNotFound" errors.
+  :type request: dict
+  :param ranges: a list of tuples specifying the ranges of 1D grid indices to retrieve in the form
+    [(start1, end1), (start2, end2), ...]. Ranges are end-exclusive: the end index is not included in the range.
+  :type ranges: list[tuple[int, int]], optional
+  :param mask: a 1D boolean mask specifying which grid points to retrieve
+  :type mask: numpy.ndarray, optional
+  :param indices: a 1D array of grid indices to retrieve
+  :type indices: numpy.ndarray, optional
+  :param fetch_coords_from_fdb: if ``True``, loads the first field's metadata from
+    the FDB to extract the coordinates at the specified indices. If ``False``, the
+    coordinates are not loaded and no separate FDB request is made.
+    Default is ``False``. Please note that no validation is performed to
+    ensure that all fields in the requests share the same grid.
+  :type fetch_coords_from_fdb: bool, optional
+  :param fdb_kwargs: only used when ``fetch_coords_from_fdb=True``. A dict of
+    keyword arguments passed to the ``pyfdb.FDB`` constructor. This allows you to
+    specify the FDB configuration, user configuration, etc. If not provided,
+    the default configuration is used. These arguments are only passed to the
+    FDB when fetching coordinates and are not used by GribJump for the
+    extraction itself.
+  :type fdb_kwargs: dict, optional
+
+
+  The following example retrieves a subset from a GRIB message in the FDB using index ranges:
+
+  .. code-block:: python
+
+      import earthkit.data as ekd
+      import numpy as np
+
+      request = {
+          "class": "od",
+          "type": "fc",
+          "stream": "oper",
+          "expver": "0001",
+          "repres": "gg",
+          "levtype": "sfc",
+          "param": "2t",
+          "date": "20250703",
+          "time": 0,
+          "step": list(range(0, 24, 6)),
+          "domain": "g",
+      }
+
+      ranges = [(0, 10), (20, 30)]
+
+      source = ekd.from_source("gribjump", request, ranges=ranges)
+      ds = source.to_xarray()
+
+  Further examples:
+
+      - :ref:`/examples/gribjump.ipynb`
+
+

 .. _data-sources-zarr:
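
The three selectors documented in this hunk (``ranges``, ``mask`` and ``indices``) all describe a set of 1D grid indices in different forms. The following is a minimal NumPy sketch of how they relate under the end-exclusive range convention stated in the parameter description, together with an "exactly one of" check. All helper names are illustrative assumptions: the commit message mentions a ``mask_to_ranges`` utility, but its signature and implementation are not shown in this diff, so this is not the earthkit-data code.

```python
import numpy as np


def mask_to_ranges(mask):
    """Convert a 1D boolean mask into end-exclusive (start, end) ranges.

    Hypothetical helper for illustration, not the earthkit-data implementation.
    """
    mask = np.asarray(mask)
    if mask.ndim != 1 or mask.dtype != bool:
        raise ValueError("mask must be a 1D boolean array")
    # Pad with False so that every run of True has a rising and a falling edge.
    padded = np.concatenate(([False], mask, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    # Edges alternate: first index of a run, then one past its last index.
    return [(int(start), int(end)) for start, end in zip(edges[::2], edges[1::2])]


def ranges_to_indices(ranges):
    """Expand end-exclusive (start, end) ranges into a 1D array of indices."""
    if not ranges:
        return np.array([], dtype=int)
    return np.concatenate([np.arange(start, end) for start, end in ranges])


def indices_to_mask(indices, size):
    """Build a 1D boolean mask of length `size` that is True at `indices`."""
    mask = np.zeros(size, dtype=bool)
    mask[np.asarray(indices, dtype=int)] = True
    return mask


def require_exactly_one(ranges=None, mask=None, indices=None):
    """Return the name of the single selector that was given, else raise."""
    given = [
        name
        for name, value in (("ranges", ranges), ("mask", mask), ("indices", indices))
        if value is not None
    ]
    if len(given) != 1:
        raise ValueError(
            f"exactly one of ranges, mask or indices must be specified, got {given}"
        )
    return given[0]


# The ranges from the documentation example select 20 grid points in total.
example_ranges = [(0, 10), (20, 30)]
example_indices = ranges_to_indices(example_ranges)
print(len(example_indices))  # 20
print(mask_to_ranges(np.array([True, True, False, True])))  # [(0, 2), (3, 4)]
```

Converting a mask to contiguous ranges once, rather than per field, matches the intent of the "convert masks to ranges once" entry in the commit message, although the details there may differ from this sketch.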

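The ``request`` parameter description notes that GribJump expects strictly formatted values, for example dates as "YYYYMMDD" rather than "YYYY-MM-DD". A small normalisation step before building the request can guard against this. The helper below is a hypothetical sketch using only the standard library; ``to_compact_date`` is not part of earthkit-data or GribJump, and which request keys need this treatment is an assumption to check against your own FDB schema.

```python
from datetime import date, datetime


def to_compact_date(value):
    """Return `value` as a compact "YYYYMMDD" string.

    Accepts date/datetime objects, ISO "YYYY-MM-DD" strings,
    and values that are already in "YYYYMMDD" form (str or int).
    Hypothetical helper for illustration only.
    """
    if isinstance(value, (date, datetime)):
        return value.strftime("%Y%m%d")
    text = str(value).replace("-", "")
    # Parsing validates the value; an impossible date raises ValueError.
    return datetime.strptime(text, "%Y%m%d").strftime("%Y%m%d")


request = {
    "class": "od",
    "param": "2t",
    # Normalise before handing the request to the source.
    "date": to_compact_date("2025-07-03"),
}
print(request["date"])  # 20250703
```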
docs/install.rst

Lines changed: 7 additions & 0 deletions
@@ -51,6 +51,7 @@ Alternatively, you can install the following components:
     - covjsonkit: provides access to CoverageJSON data served by the :ref:`data-sources-polytope` source
     - s3: provides access to non-public :ref:`s3 <data-sources-s3>` buckets (new in version *0.11.0*)
     - geotiff: adds GeoTIFF support (new in version *0.11.0*). Please note that this is not included in the ``[all]`` option and has to be invoked separately.
+    - gribjump: provides access to the :ref:`data-sources-gribjump` source
     - zarr: provides access to the :ref:`data-sources-zarr` source (new in version *0.15.0*). Please note that this is not included in the ``[all]`` option and has to be invoked separately.

 E.g. to add :ref:`data-sources-mars` support you can use:
@@ -85,3 +86,9 @@ FDB
 +++++

 For FDB (Fields DataBase) access FDB5 must be installed on the system. See the `FDB documentation <https://fields-database.readthedocs.io/en/latest/>`_ for details.
+
+
+GribJump
+++++++++++++
+
+For FDB access with GribJump, both FDB5 and GribJump must be installed on the system. See the `GribJump project <https://github.com/ecmwf/gribjump>`_ for details.
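
Because the ``gribjump`` extra pulls in two Python packages (``pyfdb`` and ``pygribjump``, per the ``pyproject.toml`` change in this commit), a quick pre-flight check can report which of them is missing before ``from_source("gribjump", ...)`` is called. The sketch below uses only the standard library and assumes the import names match the package names; ``missing_modules`` is a hypothetical helper. It only checks the Python packages and says nothing about whether the native FDB5 and GribJump libraries can be located.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# Assumed import names for the packages listed in the "gribjump" extra.
missing = missing_modules(["pygribjump", "pyfdb"])
if missing:
    print("Missing optional dependencies: " + ", ".join(missing))
    print('They can be installed with: pip install "earthkit-data[gribjump]"')
else:
    print("pygribjump and pyfdb are importable")
```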

pyproject.toml

Lines changed: 2 additions & 1 deletion
@@ -47,7 +47,7 @@ dependencies = [
   "xarray>=0.19",
 ]
 optional-dependencies.all = [
-  "earthkit-data[cds,covjsonkit,ecmwf-opendata,fdb,geo,geopandas,mars,odb,polytope,projection,s3,wekeo]",
+  "earthkit-data[cds,covjsonkit,ecmwf-opendata,fdb,geo,geopandas,gribjump,mars,odb,polytope,projection,s3,wekeo]",
 ]
 optional-dependencies.cds = [ "cdsapi>=0.7.2" ]
 optional-dependencies.ci = [ "numpy" ]
@@ -70,6 +70,7 @@ optional-dependencies.fdb = [ "pyfdb>=0.1" ]
 optional-dependencies.geo = [ "earthkit-geo>=0.2" ]
 optional-dependencies.geopandas = [ "geopandas" ]
 optional-dependencies.geotiff = [ "pyproj", "rasterio", "rioxarray" ]
+optional-dependencies.gribjump = [ "pyfdb>=0.1", "pygribjump" ]
 optional-dependencies.mars = [ "ecmwf-api-client>=1.6.1" ]
 optional-dependencies.odb = [ "pyodc" ]
 optional-dependencies.polytope = [ "polytope-client>=0.7.6" ]
