GEFS GRIBs in S3: Scan GRIB and Fast Referencing Using Index Files with Zarr v2 — 24-Hour Accumulation Plot and Comparison with Dynamical.org GEFS #572
nishadhka started this conversation in Show and tell
Replies: 2 comments 1 reply
-
Thanks for showing what looks like an excellent demonstration use-case! Do you plan on publicizing this anywhere? I'm sure people would be interested to know the resources/time required for the various steps, the storage requirements for the references, and any benchmarks you can do for the final read performance.
1 reply
-
Thanks! I'm considering putting together a Jupyter notebook and a note on benchmarking. I'll share an update once it's ready.
-
Following the earlier [GFS GRIB index kerchunk discussion](#530), we've applied a similar method to process GEFS ensemble GRIBs in S3 with kerchunk, producing efficient references and Zarr output, but using Zarr v2.
Step 1: One-Time Expensive Preprocessing
Script:
run_gefs_preprocessing.py
Purpose: Create Parquet mapping files that describe the GRIB structure
When: Run once per ensemble member to generate the reference templates
Output: Parquet files stored in GCS
Why it's expensive: Scans actual GRIB data to build the index mapping
Reusability: These mapping files can be reused for different forecast dates!
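As a rough illustration of the index mapping built in Step 1: a GRIB `.idx` sidecar lists each message's starting byte offset, so per-variable byte ranges can be derived without downloading the whole file. The sketch below uses made-up `.idx` content and a minimal parser; the actual Parquet columns produced by `run_gefs_preprocessing.py` will differ.

```python
# Hypothetical sketch: derive byte ranges from GRIB .idx lines.
# The sample inventory text and the (offset, length) output format
# are illustrative, not the project's real schema.

SAMPLE_IDX = """\
1:0:d=2024010100:APCP:surface:0-3 hour acc fcst:ENS=+1
2:50000:d=2024010100:TMP:2 m above ground:3 hour fcst:ENS=+1
3:120000:d=2024010100:UGRD:10 m above ground:3 hour fcst:ENS=+1
"""

def parse_idx(text, grib_size):
    """Turn .idx lines into {variable: (offset, length)} byte ranges.

    Each line is msgnum:offset:date:var:level:range:member; a message's
    length is the next message's offset minus its own (the last message
    runs to the end of the GRIB file).
    """
    entries = []
    for line in text.strip().splitlines():
        parts = line.split(":")
        entries.append((parts[3], int(parts[1])))  # (variable, start offset)
    refs = {}
    for i, (var, start) in enumerate(entries):
        end = entries[i + 1][1] if i + 1 < len(entries) else grib_size
        refs[var] = (start, end - start)
    return refs

refs = parse_idx(SAMPLE_IDX, grib_size=200000)
print(refs["APCP"])  # (0, 50000)
print(refs["TMP"])   # (50000, 70000)
```

Byte ranges like these are what make the one-time scan reusable: once stored (here, as Parquet in GCS), later runs can fetch only the ranges they need with HTTP range requests instead of rescanning the GRIB.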
Step 2: Fast Daily Processing
Script:
run_day_gefs_ensemble_full.py
Purpose: Process new forecast dates using the existing Parquet structure plus the new GRIB .idx index files
When: Run daily for each new forecast
How it works: Reads the .idx index files from the current day's S3 forecast and combines them with the reusable Parquet mapping
Step 3: Generate 24-Hour Accumulation Plots
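The daily refresh described in Step 2 above might look like the following: keep the Zarr metadata from the one-time template and swap in today's URL and byte ranges from the fresh `.idx`. All names and the reference layout are illustrative, not the actual `run_day_gefs_ensemble_full.py` API (kerchunk-style references store chunk entries as `[url, offset, length]`).

```python
# Hypothetical sketch of the fast daily step: reuse the expensive template,
# replace only the per-chunk [url, offset, length] entries.

template_refs = {
    ".zgroup": '{"zarr_format": 2}',                          # Zarr v2 metadata, kept as-is
    "APCP/0.0.0": ["s3://bucket/gefs/OLD_DATE/pgrb2.grib2", 0, 48000],
}

def refresh_refs(template, new_url, new_ranges):
    """Swap data-chunk URLs/offsets, leaving metadata keys untouched."""
    out = {}
    for key, val in template.items():
        if isinstance(val, list):            # a [url, offset, length] chunk reference
            var = key.split("/")[0]
            offset, length = new_ranges[var]
            out[key] = [new_url, offset, length]
        else:                                # .zgroup/.zarray/.zattrs stay unchanged
            out[key] = val
    return out

today = refresh_refs(
    template_refs,
    "s3://bucket/gefs/20240101/pgrb2.grib2",
    {"APCP": (0, 50000)},                    # ranges parsed from today's .idx
)
print(today["APCP/0.0.0"])  # ['s3://bucket/gefs/20240101/pgrb2.grib2', 0, 50000]
```

This is why the daily run is cheap: no GRIB bytes are scanned, only the small `.idx` text files are fetched and merged into the template.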
Script:
run_gefs_24h_accumulation.py
Purpose: Plot the 24-hour precipitation accumulation using the Parquet references generated by the grib-index-kerchunk method
Step 4: Compare with Dynamical.org GEFS Zarr
Script:
[test_compare_dynamical_zarr_gefs_24h_accumulation.py](https://github.com/icpac-igad/grib-index-kerchunk/blob/main/gefs/test_compare_dynamical_zarr_gefs_24h_accumulation.py)
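As a minimal sketch of the quantity compared in Steps 3 and 4, assuming eight consecutive 3-hourly accumulated precipitation fields per day (array shapes, the random test data, and the second data source are invented, not values from either pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)
# precip_3h: (step, lat, lon) — eight 3-hourly accumulation intervals, mm
precip_3h = rng.gamma(shape=2.0, scale=1.5, size=(8, 4, 5))

# 24-hour accumulation = sum of the eight consecutive 3-hour intervals
precip_24h = precip_3h.sum(axis=0)
print(precip_24h.shape)  # (4, 5)

# Step 4-style comparison against a second source (here, a perturbed copy
# standing in for the Dynamical.org Zarr field on the same grid)
other = precip_24h + rng.normal(0.0, 0.1, size=precip_24h.shape)
bias = float((precip_24h - other).mean())
rmse = float(np.sqrt(((precip_24h - other) ** 2).mean()))
```

In practice both fields would be regridded or subset to a common grid before computing bias/RMSE; the sketch assumes they already match.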
Supporting Files
These files are hosted under [icpac-igad/grib-index-kerchunk](https://github.com/icpac-igad/grib-index-kerchunk/tree/main/gefs):
gefs_utils.py
run_day_gefs_ensemble_full.py
run_gefs_preprocessing.py
run_gefs_24h_accumulation.py
ea_ghcf_simple.geojson
Notes
Let me know if others have tried similar approaches or have suggestions on improving the pipeline.
Below are some plots comparing GEFS data streamed from the Dynamical.org GEFS Zarr store with data streamed via the grib-index-kerchunk method.
