This repo documents the workflow used to build the WorldCereal crop calendars and provides legacy outputs for reference. All official inputs and outputs are published on Zenodo and must be used when reproducing results.
Primary dataset (inputs + outputs): DOI 10.5281/zenodo.17866271 (version v2, published Dec 7, 2025; last modified Jan 27, 2026)
- Download the dataset from Zenodo using the DOI above.
- Extract the files to a local folder (example: DATA_ROOT=/path/to/zenodo_worldcereal).
- Use the files in that folder for all processing steps below.
Optional local example (ignored by git): this repo includes data_root_example/ as a safe
place to unpack Zenodo files and validate paths. You can point DATA_ROOT there.
Recommended local layout after download/extract:
DATA_ROOT/
  auxiliar_data.zip
  NDVI_hants.zip
  summer_crops_dataset.csv
  winter_crops_dataset.csv
  S1_SOS_WGS84.tif
  S1_EOS_WGS84.tif
  S2_SOS_WGS84.tif
  S2_EOS_WGS84.tif
  Phase1_legacy_cropcalendars.zip
  doy_palette.qml
Unzip the archives before running:
unzip "$DATA_ROOT/auxiliar_data.zip" -d "$DATA_ROOT/auxiliar_data"
unzip "$DATA_ROOT/NDVI_hants.zip" -d "$DATA_ROOT/NDVI_hants"
Note: You can keep the legacy cropcalendars_phase1/ and cropcalendars_phase2/ folders for quick inspection, but the authoritative data are on Zenodo.
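Before running any stage, it can help to confirm that the download is complete and that the two archives are extracted next to the other files. The following is a minimal sketch using only the Python standard library. The file names come from the layout above; the helper names missing_files and extract_archives are illustrative and are not part of this repo.

```python
from pathlib import Path
import zipfile

# File names taken from the recommended layout above.
EXPECTED_FILES = [
    "auxiliar_data.zip",
    "NDVI_hants.zip",
    "summer_crops_dataset.csv",
    "winter_crops_dataset.csv",
    "S1_SOS_WGS84.tif",
    "S1_EOS_WGS84.tif",
    "S2_SOS_WGS84.tif",
    "S2_EOS_WGS84.tif",
    "Phase1_legacy_cropcalendars.zip",
    "doy_palette.qml",
]

# Archives that need to be unzipped before running the workflow.
ARCHIVES_TO_EXTRACT = ["auxiliar_data.zip", "NDVI_hants.zip"]


def missing_files(data_root):
    """Return the expected file names that are not present in data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def extract_archives(data_root):
    """Extract each archive into a sibling folder named after the archive.

    Mirrors `unzip X.zip -d X`; archives that are absent are skipped.
    Returns the list of folders that were written.
    """
    root = Path(data_root)
    extracted = []
    for name in ARCHIVES_TO_EXTRACT:
        archive = root / name
        if not archive.is_file():
            continue
        target = root / archive.stem  # e.g. DATA_ROOT/auxiliar_data
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(target)
    return extracted
```

Calling missing_files on the DATA_ROOT folder should return an empty list once every file is in place. Calling extract_archives on the same folder then creates the auxiliar_data and NDVI_hants folders.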
Total size: ~36.2 GB
auxiliar_data.zip
doy_palette.qml
Global Crop Calendar Dataset Documentation.pdf
NDVI_hants.zip
Phase1_legacy_cropcalendars.zip
S1_EOS_WGS84.tif
S1_SOS_WGS84.tif
S2_EOS_WGS84.tif
S2_SOS_WGS84.tif
summer_crops_dataset.csv
winter_crops_dataset.csv
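Given the size of these files, it can be worth comparing each download against the MD5 value shown on the Zenodo record. Where no command-line checksum tool is available, the digest can be computed with Python's standard library. This is a minimal sketch; the function name md5_of_file is illustrative and not part of this repo.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of a file.

    The file is read in chunks so multi-gigabyte archives are never
    loaded into memory at once.
    """
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Compare the returned string with the checksum listed for the same file on Zenodo; a mismatch usually means an incomplete or corrupted download.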
Optional checksum verification:
md5 NDVI_hants.zip
# or (Linux)
md5sum NDVI_hants.zip
Repository layout:
- src/crop_calendars/scripts/: climate modeling scripts (XGBoost) to estimate SOS/EOS.
- src/crop_calendars/necessary/: auxiliary rasters and masks used by the scripts.
- src/rs_process.py: HANTS-based smoothing utilities for remote sensing time series.
- src/hants_3d.py: HANTS smoother that reads MODIS CMG NDVI from repo root.
- cropcalendars_phase1/: legacy outputs (Phase 1, Franch et al., 2022).
- cropcalendars_phase2/: legacy outputs (Phase 2, Moletto-Lobos et al., under review).
The workflow has three stages, in this order:
- Climate modeling (SOS/EOS prediction, first iteration)
- Remote sensing processing (HANTS + LSP extraction)
- Climate modeling again (SOS/EOS prediction with LSP predictors, second iteration)
Create the conda environment:
conda env create -f environment.yml
conda activate ewoc-calendars
Scripts:
- src/crop_calendars/scripts/wc_sos_xgboost_1st_lsp.py
- src/crop_calendars/scripts/wc_eos_xgboost_1st_lsp.py
- src/crop_calendars/scripts/sc_sos_xgboost_1st_lsp.py
- src/crop_calendars/scripts/sc_eos_xgboost_1st_lsp.py
Required inputs (from Zenodo):
- winter_crops_dataset.csv and summer_crops_dataset.csv
- auxiliar_data.zip (ERA5 data, masks, administrative boundaries, vegetation cover)
Recommended parameter mapping (paths):
file_path_ -> DATA_ROOT
necessary_path -> extracted auxiliar_data.zip folder
results_path -> local output folder (feature selection)
model_output -> local output folder (trained model)
What to edit in each script before running:
- file_path_: folder that contains winter_crops_dataset.csv and summer_crops_dataset.csv
- necessary_path: folder that contains the auxiliary rasters and vectors (from auxiliar_data.zip)
- results_path: where to write feature selection outputs
- model_output: where to save trained model files
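One way to keep the four path parameters consistent across the scripts is to derive them all from a single DATA_ROOT value. The sketch below only illustrates the mapping described above. The function name build_script_paths and the output subfolder names (feature_selection, models) are assumptions, and the scripts themselves may expect plain strings or a different folder structure.

```python
from pathlib import Path


def build_script_paths(data_root, output_root="outputs"):
    """Map the documented script parameters to concrete folders.

    data_root   : folder holding the extracted Zenodo files
    output_root : hypothetical local folder for generated outputs
    """
    root = Path(data_root)
    out = Path(output_root)
    return {
        # Folder with winter_crops_dataset.csv and summer_crops_dataset.csv
        "file_path_": root,
        # Folder produced by extracting auxiliar_data.zip
        "necessary_path": root / "auxiliar_data",
        # Feature selection outputs (subfolder name is an assumption)
        "results_path": out / "feature_selection",
        # Trained model files (subfolder name is an assumption)
        "model_output": out / "models",
    }
```

The returned values can be copied into the corresponding variables at the top of each script, converting them with str() if the script expects strings.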
Run example (one script at a time):
python src/crop_calendars/scripts/wc_sos_xgboost_1st_lsp.py
Outputs:
- Trained model files (JSON)
- Feature importance CSVs
- Predicted SOS/EOS values (per script logic)
These first-iteration calendars are used as the baseline window in the LSP step (they define the SOS/EOS range used to constrain the MODIS CMG time series).
The MODIS CMG NDVI time series used in the paper are based on MODIS Aqua MYD09CMG.
The HANTS-smoothed output is published on Zenodo as NDVI_hants.zip.
If you need to reproduce the smoothing step from raw MODIS CMG, use src/hants_3d.py
and point it to the HDF files (MYD09CMG).
Example: run HANTS from MODIS CMG HDFs (outputs split into 20 COG chunks):
DATA_ROOT=/path/to/zenodo_worldcereal \
python src/hants_3d.py \
--modis-cmg \
--input-dir "$DATA_ROOT" \
--modis-pattern "MYD09CMG*.hdf" \
--split-count 20 \
--output-format cog \
--output-dir outputs/hants_cog \
--output-prefix hants_myd09cmg
To apply HANTS to MODIS NDVI (high level):
- Load the NDVI stack into a 3D array shaped (nx, ny, nt).
- Call HANTS_3D from src/rs_process.py.
- Save the smoothed time series and proceed with feature extraction as in the modeling scripts.
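To illustrate what happens to an (nx, ny, nt) stack during smoothing, the sketch below fits a low-order harmonic (Fourier) series to every pixel with ordinary least squares. It is not the repo's HANTS_3D, whose signature is not documented here. HANTS additionally refits iteratively while discarding outliers, which this single-pass example omits. The function name harmonic_smooth, the default of two harmonics, and the assumption of a gap-free stack whose nt samples cover one full period are all illustrative choices.

```python
import numpy as np


def harmonic_smooth(stack, n_harmonics=2, period=None):
    """Fit a mean plus n_harmonics sine/cosine pairs to each pixel series.

    stack  : array shaped (nx, ny, nt), assumed free of NaNs
    period : number of samples per cycle; defaults to nt (one full cycle)
    Returns the fitted (smoothed) stack with the same shape.
    """
    stack = np.asarray(stack, dtype=float)
    nx, ny, nt = stack.shape
    if period is None:
        period = nt

    # Design matrix: constant term, then cos/sin for each harmonic.
    t = np.arange(nt)
    columns = [np.ones(nt)]
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * np.pi * k * t / period
        columns.append(np.cos(angle))
        columns.append(np.sin(angle))
    design = np.column_stack(columns)  # (nt, 2 * n_harmonics + 1)

    # Solve all pixels at once: one column of `series` per pixel.
    series = stack.reshape(nx * ny, nt).T  # (nt, n_pixels)
    coeffs, *_ = np.linalg.lstsq(design, series, rcond=None)

    fitted = design @ coeffs  # (nt, n_pixels)
    return fitted.T.reshape(nx, ny, nt)
```

Solving every pixel in one least-squares call avoids a Python loop over the grid, but it still holds the whole stack in memory, which is one reason to process large grids in row-wise chunks.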
LSP extraction (uses MODIS CMG, crop masks, and the 1st-iteration calendars):
- src/lsp_world.py reads MODIS CMG HDFs, applies QA masks, fills gaps, runs HANTS, and then extracts SOS/EOS/POK from the smoothed NDVI.
- The HANTS smoothing and LSP extraction are executed split-by-split (20 row-wise chunks) to keep memory usage stable.
- Environment variables used by lsp_world.py (defaults shown):
  - DATA_ROOT (default: data_root_example)
  - MODIS_CMG_DIR (default: $DATA_ROOT/MODIS_CMG, or $DATA_ROOT if missing)
  - WC_SOS_PATH (default: $DATA_ROOT/S1_SOS_WGS84.tif)
  - WC_EOS_PATH (default: $DATA_ROOT/S1_EOS_WGS84.tif)
  - CROP_MASK_PATH (default: $DATA_ROOT/auxiliar_data/CropCoverFraction_50km_3857.tif)
  - LSP_OUTPUT_DIR (default: outputs/lsp_world)
  - LSP_USE_TERRA (set to 1 to also include MOD09CMG)
  - CONTINENTS_SHP (default: $DATA_ROOT/auxiliar_data/ne_10m_admin_0_sovereignty/ne_10m_admin_0_sovereignty.shp)
  - LSP_VAL_RATIO (default: 0.3, continent-stratified split)
  - LSP_SPLIT_SEED (default: 42)
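The defaults listed above can be resolved in a few lines, which is handy for checking paths before launching a long run. This is a sketch of the documented behavior, not the code in lsp_world.py. The function name resolve_lsp_settings is hypothetical. It also assumes that "if missing" means the MODIS_CMG subfolder does not exist, and that LSP_USE_TERRA is off unless set to 1.

```python
import os
from pathlib import Path


def resolve_lsp_settings(env=None):
    """Resolve the documented environment variables to concrete values.

    env: mapping to read from; defaults to os.environ.
    """
    env = os.environ if env is None else env
    data_root = Path(env.get("DATA_ROOT", "data_root_example"))
    aux = data_root / "auxiliar_data"

    # Assumption: fall back to DATA_ROOT when the MODIS_CMG folder is absent.
    modis_default = data_root / "MODIS_CMG"
    if not modis_default.is_dir():
        modis_default = data_root

    continents_default = (
        aux / "ne_10m_admin_0_sovereignty" / "ne_10m_admin_0_sovereignty.shp"
    )
    return {
        "DATA_ROOT": data_root,
        "MODIS_CMG_DIR": Path(env.get("MODIS_CMG_DIR", modis_default)),
        "WC_SOS_PATH": Path(env.get("WC_SOS_PATH", data_root / "S1_SOS_WGS84.tif")),
        "WC_EOS_PATH": Path(env.get("WC_EOS_PATH", data_root / "S1_EOS_WGS84.tif")),
        "CROP_MASK_PATH": Path(
            env.get("CROP_MASK_PATH", aux / "CropCoverFraction_50km_3857.tif")
        ),
        "LSP_OUTPUT_DIR": Path(env.get("LSP_OUTPUT_DIR", "outputs/lsp_world")),
        # Assumption: Terra (MOD09CMG) is excluded unless explicitly enabled.
        "LSP_USE_TERRA": env.get("LSP_USE_TERRA", "0") == "1",
        "CONTINENTS_SHP": Path(env.get("CONTINENTS_SHP", continents_default)),
        "LSP_VAL_RATIO": float(env.get("LSP_VAL_RATIO", "0.3")),
        "LSP_SPLIT_SEED": int(env.get("LSP_SPLIT_SEED", "42")),
    }
```

Printing the returned dictionary and checking that each path exists catches most configuration mistakes before any MODIS file is opened.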
Validation points (70/30 split by continent):
- Place full points as GeoJSONs in LSP_VALIDATION_DIR:
  - maize_points.geojson
  - wheat_points.geojson
- On first run, lsp_world.py generates the following files using a 70/30 split stratified by continent:
  - maize_train.geojson / maize_validation.geojson
  - wheat_train.geojson / wheat_validation.geojson
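A continent-stratified 70/30 split means each continent contributes roughly 30% of its own points to the validation set, so no region is left out of validation. The sketch below shows the idea on GeoJSON-like feature dictionaries using only the standard library. It is not the implementation in lsp_world.py. It assumes each feature already carries a "continent" property, whereas the repo presumably derives the continent from the CONTINENTS_SHP layer. The function name stratified_split is illustrative.

```python
import random
from collections import defaultdict


def stratified_split(features, val_ratio=0.3, seed=42, key="continent"):
    """Split features into (train, validation), stratified by a property.

    features : list of GeoJSON-like dicts with a "properties" mapping
    val_ratio: fraction of each group sent to validation
    seed     : seed for the shuffle, so the split is reproducible
    key      : property used for stratification (assumed to be present)
    """
    rng = random.Random(seed)

    # Group features by continent.
    groups = defaultdict(list)
    for feature in features:
        groups[feature["properties"][key]].append(feature)

    train, validation = [], []
    # Iterate in sorted order so the result does not depend on input order
    # of the groups, only on the seed.
    for group_name in sorted(groups):
        members = list(groups[group_name])
        rng.shuffle(members)
        n_val = round(len(members) * val_ratio)
        validation.extend(members[:n_val])
        train.extend(members[n_val:])
    return train, validation
```

With the documented defaults (LSP_VAL_RATIO of 0.3 and LSP_SPLIT_SEED of 42), repeated calls on the same input return the same split.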
After LSP metrics are generated, re-run the XGBoost scripts (second iteration) to train the final SOS/EOS models using the LSP-derived predictors.
Re-run the same four scripts as in stage 1, now pointing to the LSP-derived inputs and features generated in stage 2.
Prerequisites:
- Zenodo dataset extracted to DATA_ROOT
- MODIS CMG HDFs (MYD09CMG, optionally MOD09CMG) for the study period
- Validation points: maize_points.geojson and wheat_points.geojson, placed in LSP_VALIDATION_DIR (default: $DATA_ROOT/validation)
Steps:
- First iteration (climate-only baseline)
python src/crop_calendars/scripts/wc_sos_xgboost_1st_lsp.py
python src/crop_calendars/scripts/wc_eos_xgboost_1st_lsp.py
python src/crop_calendars/scripts/sc_sos_xgboost_1st_lsp.py
python src/crop_calendars/scripts/sc_eos_xgboost_1st_lsp.py
- HANTS smoothing (MODIS CMG → HANTS)
DATA_ROOT=/path/to/zenodo_worldcereal \
python src/hants_3d.py \
--modis-cmg \
--input-dir "$DATA_ROOT" \
--modis-pattern "MYD09CMG*.hdf" \
--split-count 20 \
--output-format cog \
--output-dir outputs/hants_cog \
--output-prefix hants_myd09cmg
- LSP extraction (HANTS + phenology metrics)
DATA_ROOT=/path/to/zenodo_worldcereal \
python src/lsp_world.py
- Second iteration (climate + LSP)
python src/crop_calendars/scripts/wc_sos_xgboost_1st_lsp.py
python src/crop_calendars/scripts/wc_eos_xgboost_1st_lsp.py
python src/crop_calendars/scripts/sc_sos_xgboost_1st_lsp.py
python src/crop_calendars/scripts/sc_eos_xgboost_1st_lsp.py
Note: the scripts in src/crop_calendars/scripts/ require the path parameters to be updated before running (see the section above).
Phase 1:
Franch, B., Cintas, J., Becker-Reshef, I., Sanchez-Torres, M. J., Roger, J., Skakun, S., ... & Whitcraft, A. (2022). Global crop calendars of maize and wheat in the framework of the WorldCereal project. GIScience & Remote Sensing, 59(1), 885-913. https://doi.org/10.1080/15481603.2022.2079273
Phase 2:
Moletto-Lobos, I. G., Franch, B., Guillem-Valls, A., Cyran, K., Kalecinski, N., Van Tricht, K., Vermote, E., Becker-Reshef, I., Nair, S., Degerickx, J., Butsko, C., Szantoi, Z., de Vos, K., Whitcraft, A. Enhancing WorldCereal Crop Calendars with Land Surface Phenology and Machine Learning. Available at SSRN: https://ssrn.com/abstract=5188749
Dataset:
Moletto-Lobos, I., Franch, B., Guillem-Valls, A., Cyran, K., Kalecinski, N., Van Tricht, K., ... & Szantoi, Z. (2025). WorldCereal crop calendars. Zenodo. DOI: 10.5281/zenodo.17866271