GEFS GRIBs in S3: Scan GRIB and Fast Referencing Using Index Files with Zarr v2 — 24-Hour Accumulation Plot and Comparison with Dynamical.org GEFS #572
nishadhka started this conversation in Show and tell
Replies: 2 comments 1 reply
-
Thanks for showing what looks like an excellent demonstration use-case! Do you plan on publicizing this anywhere? I'm sure people would be interested to know the resources/time required for the various steps, the storage requirements for the references, and any benchmarks you can do for the final read performance.
1 reply
-
Thanks! I'm considering putting together a Jupyter notebook and a note on benchmarking. I'll share an update once it's ready.
-
Following the earlier [GFS GRIB index kerchunk discussion](#530), we've applied a similar method to process GEFS ensemble GRIBs in S3 with kerchunk, producing efficient references and Zarr output, but using Zarr v2.
Step 1: One-Time Expensive Preprocessing
Script:
run_gefs_preprocessing.py
Purpose: Create Parquet mapping files that describe the GRIB structure
When: Run once per ensemble member to generate the reference templates
Output: Parquet files stored in GCS
Why it's expensive: Scans actual GRIB data to build the index mapping
Reusability: These mapping files can be reused for different forecast dates!
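As a rough illustration of the index mapping built in Step 1: a GRIB `.idx` sidecar lists each message's starting byte offset, so per-variable byte ranges can be derived without downloading the whole file. The sketch below uses made-up `.idx` content and a minimal parser; the actual Parquet columns produced by `run_gefs_preprocessing.py` will differ.

```python
# Hypothetical sketch: derive byte ranges from GRIB .idx lines.
# The sample inventory text and the (offset, length) output format
# are illustrative, not the project's real schema.

SAMPLE_IDX = """\
1:0:d=2024010100:APCP:surface:0-3 hour acc fcst:ENS=+1
2:50000:d=2024010100:TMP:2 m above ground:3 hour fcst:ENS=+1
3:120000:d=2024010100:UGRD:10 m above ground:3 hour fcst:ENS=+1
"""

def parse_idx(text, grib_size):
    """Turn .idx lines into {variable: (offset, length)} byte ranges.

    Each line is msgnum:offset:date:var:level:range:member; a message's
    length is the next message's offset minus its own (the last message
    runs to the end of the GRIB file).
    """
    entries = []
    for line in text.strip().splitlines():
        parts = line.split(":")
        entries.append((parts[3], int(parts[1])))  # (variable, start offset)
    refs = {}
    for i, (var, start) in enumerate(entries):
        end = entries[i + 1][1] if i + 1 < len(entries) else grib_size
        refs[var] = (start, end - start)
    return refs

refs = parse_idx(SAMPLE_IDX, grib_size=200000)
print(refs["APCP"])  # (0, 50000)
print(refs["TMP"])   # (50000, 70000)
```

Byte ranges like these are what make the one-time scan reusable: once stored (here, as Parquet in GCS), later runs can fetch only the ranges they need with HTTP range requests instead of rescanning the GRIB.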
Step 2: Fast Daily Processing
Script:
run_day_gefs_ensemble_full.py
Purpose: Process new forecast dates using the existing Parquet structure plus the new GRIB .idx index files
When: Run daily for each new forecast
How it works: Reads the .idx index files from the current day's S3 forecast and combines them with the reusable Parquet mapping
Step 3: Generate 24-Hour Accumulation Plots
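The daily refresh described in Step 2 above might look like the following: keep the Zarr metadata from the one-time template and swap in today's URL and byte ranges from the fresh `.idx`. All names and the reference layout are illustrative, not the actual `run_day_gefs_ensemble_full.py` API (kerchunk-style references store chunk entries as `[url, offset, length]`).

```python
# Hypothetical sketch of the fast daily step: reuse the expensive template,
# replace only the per-chunk [url, offset, length] entries.

template_refs = {
    ".zgroup": '{"zarr_format": 2}',                          # Zarr v2 metadata, kept as-is
    "APCP/0.0.0": ["s3://bucket/gefs/OLD_DATE/pgrb2.grib2", 0, 48000],
}

def refresh_refs(template, new_url, new_ranges):
    """Swap data-chunk URLs/offsets, leaving metadata keys untouched."""
    out = {}
    for key, val in template.items():
        if isinstance(val, list):            # a [url, offset, length] chunk reference
            var = key.split("/")[0]
            offset, length = new_ranges[var]
            out[key] = [new_url, offset, length]
        else:                                # .zgroup/.zarray/.zattrs stay unchanged
            out[key] = val
    return out

today = refresh_refs(
    template_refs,
    "s3://bucket/gefs/20240101/pgrb2.grib2",
    {"APCP": (0, 50000)},                    # ranges parsed from today's .idx
)
print(today["APCP/0.0.0"])  # ['s3://bucket/gefs/20240101/pgrb2.grib2', 0, 50000]
```

This is why the daily run is cheap: no GRIB bytes are scanned, only the small `.idx` text files are fetched and merged into the template.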
Script:
run_gefs_24h_accumulation.py
Purpose: Plot the 24-hour precipitation accumulation using the Parquet references generated by the grib-index-kerchunk method
Step 4: Compare with Dynamical.org GEFS Zarr
Script:
[test_compare_dynamical_zarr_gefs_24h_accumulation.py](https://github.com/icpac-igad/grib-index-kerchunk/blob/main/gefs/test_compare_dynamical_zarr_gefs_24h_accumulation.py)
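As a minimal sketch of the quantity compared in Steps 3 and 4, assuming eight consecutive 3-hourly accumulated precipitation fields per day (array shapes, the random test data, and the second data source are invented, not values from either pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)
# precip_3h: (step, lat, lon) — eight 3-hourly accumulation intervals, mm
precip_3h = rng.gamma(shape=2.0, scale=1.5, size=(8, 4, 5))

# 24-hour accumulation = sum of the eight consecutive 3-hour intervals
precip_24h = precip_3h.sum(axis=0)
print(precip_24h.shape)  # (4, 5)

# Step 4-style comparison against a second source (here, a perturbed copy
# standing in for the Dynamical.org Zarr field on the same grid)
other = precip_24h + rng.normal(0.0, 0.1, size=precip_24h.shape)
bias = float((precip_24h - other).mean())
rmse = float(np.sqrt(((precip_24h - other) ** 2).mean()))
```

In practice both fields would be regridded or subset to a common grid before computing bias/RMSE; the sketch assumes they already match.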
Supporting Files
These files are hosted under [icpac-igad/grib-index-kerchunk](https://github.com/icpac-igad/grib-index-kerchunk/tree/main/gefs):
gefs_utils.py
run_day_gefs_ensemble_full.py
run_gefs_preprocessing.py
run_gefs_24h_accumulation.py
ea_ghcf_simple.geojson
Notes
Let me know if others have tried similar approaches or have suggestions on improving the pipeline.
Below are some plots comparing GEFS data streamed from the Dynamical.org GEFS Zarr store with data streamed via the grib-index-kerchunk method.
