Generate tiled Sentinel-2 image chips and masks for multiclass hazard detection using Google Earth Engine (GEE).
- Load AOI (boundary) and multiple class shapefiles (GeoJSON, SHP, or zipped SHP)
- Fetch Sentinel-2 SR imagery with cloud filtering
- Generate per-tile image chips and masks with class_id labels
- Export tiles in batches to Google Drive (optionally to Kaggle)
- Configurable pipeline via configs/config.yaml
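The tiling step can be pictured with a small sketch. It assumes the AOI bounding box is given in a projected CRS with metre units and uses `tile_size: 256` and `scale: 10` from the example configuration, so each tile covers 2560 m per side. The function name `tile_grid` and the bounding-box layout are hypothetical and are not taken from the notebook.

```python
from typing import Iterator, Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in metres


def tile_grid(bounds: BBox, tile_size: int = 256, scale: float = 10.0) -> Iterator[BBox]:
    """Yield square tile footprints covering `bounds`, row by row from the south-west corner."""
    xmin, ymin, xmax, ymax = bounds
    step = tile_size * scale  # ground length of one tile edge in metres
    y = ymin
    while y < ymax:
        x = xmin
        while x < xmax:
            yield (x, y, x + step, y + step)
            x += step
        y += step


# Hypothetical AOI: 5120 m wide and 2560 m tall, so two tiles side by side.
tiles = list(tile_grid((500000.0, 4000000.0, 505120.0, 4002560.0)))
print(len(tiles))
```

Tiles on the north and east edges may extend past the AOI in this sketch; a real pipeline would typically clip or mask them against the boundary.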
- Clone the repository
git clone https://github.com/mohamadrahdan/gee-sentinel2-multiclass-dataset-generator.git
cd gee-sentinel2-multiclass-dataset-generator
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
- Provide your own shapefiles for the boundary and hazard classes. Do not put these inside the repo (they’re private).
- Place them in a private folder, for example:
D:/hazard-data/areakomeh.zip
D:/hazard-data/landslides_merged.zip
D:/hazard-data/PseudoLandslides_merged.zip
D:/hazard-data/NonLandslides_merged.zip
- Set the environment variable DATA_ROOT to that folder:
- Windows (PowerShell):
setx DATA_ROOT "D:\hazard-data"
(open a new terminal afterwards; setx does not change the current session)
- Linux/macOS:
export DATA_ROOT=/Users/me/hazard-data
- Google Colab:
from google.colab import drive
drive.mount('/content/drive')
import os
os.environ["DATA_ROOT"] = "/content/drive/MyDrive/hazard-data"
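Once DATA_ROOT is set, file names such as landslides_merged.zip can be joined onto it at load time. The following is a minimal sketch of that lookup; the helper name `resolve_data_path` is hypothetical, and the notebook may resolve paths differently.

```python
import os
from pathlib import Path


def resolve_data_path(relative_name: str) -> Path:
    """Join a data file name onto the folder named by the DATA_ROOT variable."""
    root = os.environ.get("DATA_ROOT")
    if not root:
        raise RuntimeError("DATA_ROOT is not set; point it at your private data folder.")
    return Path(root) / relative_name


# Example value only; an already-set DATA_ROOT is left untouched.
os.environ.setdefault("DATA_ROOT", "/content/drive/MyDrive/hazard-data")
print(resolve_data_path("landslides_merged.zip"))
```

Failing early with a clear message when the variable is missing tends to be easier to debug than a later "file not found" on a relative path.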
- Example configuration:
output_dir: "./outputs"
boundary_path: "areakomeh.zip"
classes:
  - { name: "landslide", path: "landslides_merged.zip", class_id: 1 }
  - { name: "pseudo_landslide", path: "PseudoLandslides_merged.zip", class_id: 2 }
  - { name: "non_landslide", path: "NonLandslides_merged.zip", class_id: 3 }
start_date: "2024-01-01"
end_date: "2024-12-31"
cloud_max: 10
tile_size: 256
bands: ["B02","B03","B04","B08"]
output_destination: "local"  # local | drive | kaggle
kaggle_dataset_slug: "mohamadrahdan/gee-s2-multiclass"
export:
  drive_folder: "gee_s2_multiclass"
  image_prefix: "img_"
  mask_prefix: "mask_"
  format: "GEO_TIFF"
  scale: 10
  max_tiles: 200
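A few sanity checks on the configuration can catch mistakes before any Earth Engine work starts. The sketch below operates on an already-parsed dictionary (for instance the result of `yaml.safe_load`, if PyYAML is used) and is not part of the repository. The function name `validate_config` is hypothetical, and treating `cloud_max` as a percentage is an assumption.

```python
from datetime import date


def validate_config(cfg: dict) -> list:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []

    # Each class needs its own label value in the mask.
    ids = [c["class_id"] for c in cfg.get("classes", [])]
    if len(ids) != len(set(ids)):
        problems.append("class_id values must be unique")

    # ISO dates (YYYY-MM-DD) must describe a non-empty range.
    start = date.fromisoformat(cfg["start_date"])
    end = date.fromisoformat(cfg["end_date"])
    if start >= end:
        problems.append("start_date must be earlier than end_date")

    # Assumption: cloud_max is a cloud-cover percentage.
    if not 0 <= cfg["cloud_max"] <= 100:
        problems.append("cloud_max should lie between 0 and 100")

    return problems


example = {
    "classes": [
        {"name": "landslide", "path": "landslides_merged.zip", "class_id": 1},
        {"name": "pseudo_landslide", "path": "PseudoLandslides_merged.zip", "class_id": 2},
        {"name": "non_landslide", "path": "NonLandslides_merged.zip", "class_id": 3},
    ],
    "start_date": "2024-01-01",
    "end_date": "2024-12-31",
    "cloud_max": 10,
}
print(validate_config(example))  # prints []
```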
- Local usage
Make sure to activate your virtual environment and set DATA_ROOT first.
jupyter notebook notebooks/gee-s2-multiclass-dataset-generator.ipynb
- Google Colab
Click the "Open in Colab" badge above.
Mount your Drive, set DATA_ROOT, and run the cells.
- Quick Preview
The notebook prints download URLs for one tile and its mask.
- Batch Export
- Tasks are created in Earth Engine.
- Files are saved in your Google Drive folder: gee_s2_multiclass
- Filenames follow this format: img_00001.tif, mask_00001.tif, etc.
- After export, you can push the results to Kaggle:
python tools/upload_to_kaggle.py \
--slug yourname/gee-s2-multiclass \
--dir ./outputs \
--init --public
- For subsequent versions:
python tools/upload_to_kaggle.py \
--slug yourname/gee-s2-multiclass \
--dir ./outputs \
--message "update v0.2"
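The file naming described under Batch Export can be sketched as follows. The prefixes come from the example configuration; the helper name `tile_filenames` is hypothetical, and the 1-based index is an assumption inferred from the img_00001.tif example.

```python
def tile_filenames(index: int, image_prefix: str = "img_", mask_prefix: str = "mask_") -> tuple:
    """Build the image and mask file names for a tile index, zero-padded to five digits."""
    stem = f"{index:05d}"
    return f"{image_prefix}{stem}.tif", f"{mask_prefix}{stem}.tif"


print(tile_filenames(1))  # prints ('img_00001.tif', 'mask_00001.tif')
```

Sharing one zero-padded stem between image and mask keeps the pairs aligned when files are sorted by name.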
- B02 / B03 / B04 (Blue, Green, Red – 10m): true-color composites, quality control (QC)
- B08 (Near Infrared – 10m): vegetation index (NDVI), critical for hazard mapping
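For reference, NDVI is computed from the near-infrared and red bands as (B08 - B04) / (B08 + B04). The sketch below shows this for a single pixel. The function name `ndvi` and the zero-denominator fallback are choices made for this example only, and the reflectance values are made up for illustration.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index from B08 (NIR) and B04 (red) reflectance."""
    denom = nir + red
    if denom == 0:
        return 0.0  # fallback chosen for this sketch to avoid division by zero
    return (nir - red) / denom


# Illustrative reflectances: strong NIR and weak red give a high index.
print(round(ndvi(0.45, 0.05), 2))  # prints 0.8
```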
- Q: Where do I put my shapefiles?
A: In a private folder outside Git. Point to it via DATA_ROOT.
- Q: Can I add more classes?
A: Yes, just add them under classes in config.yaml with a unique class_id.
- Q: How do I avoid thousands of tasks?
A: Use max_tiles in config.yaml and export in batches.
- Q: Do outputs go to GitHub?
A: No, outputs/ and data/ are ignored via .gitignore.
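The advice about exporting in batches can be sketched as a simple chunking helper. This treats `max_tiles` as the size of each batch, which is an assumption about how the notebook uses it. The function name `batches` is hypothetical.

```python
from typing import Iterator, List


def batches(tile_ids: List[int], max_tiles: int) -> Iterator[List[int]]:
    """Split tile ids into consecutive groups of at most `max_tiles` items."""
    if max_tiles <= 0:
        raise ValueError("max_tiles must be positive")
    for start in range(0, len(tile_ids), max_tiles):
        yield tile_ids[start:start + max_tiles]


# 450 hypothetical tiles with max_tiles = 200 give three batches.
groups = list(batches(list(range(1, 451)), max_tiles=200))
print([len(g) for g in groups])  # prints [200, 200, 50]
```

Submitting one group at a time and waiting for its tasks to finish keeps the Earth Engine task queue manageable.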
MIT License © 2025 Mohamad Rahdan