mohamadrahdan/gee-sentinel2-multiclass-dataset-generator

GEE Sentinel-2 Multiclass Dataset Generator

Open In Colab

Generate tiled Sentinel-2 image chips and masks for multiclass hazard detection using Google Earth Engine (GEE).

Features

  • Load an AOI (boundary) and multiple class vector files (GeoJSON, SHP, or zipped SHP)
  • Fetch Sentinel-2 SR imagery with cloud filtering
  • Generate per-tile image chips and masks with class_id labels
  • Export tiles in batches to Google Drive (or, optionally, Kaggle)
  • Configurable pipeline via configs/config.yaml

Setup

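  • Clone the repository, create a virtual environment, and install the dependencies (Linux/macOS shell shown; on Windows, activate with .venv\Scripts\activate):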
    git clone https://github.com/mohamadrahdan/gee-sentinel2-multiclass-dataset-generator.git
    
    cd gee-sentinel2-multiclass-dataset-generator
    
    python -m venv .venv && source .venv/bin/activate
    
    pip install -r requirements.txt
    
    

Private Inputs (AOI + Classes)

  • Provide your own shapefiles for the boundary and hazard classes.
    Do not put these inside the repo (they’re private).

  • Place them in a private folder, for example:

    • D:/hazard-data/areakomeh.zip
    • D:/hazard-data/landslides_merged.zip
    • D:/hazard-data/PseudoLandslides_merged.zip
    • D:/hazard-data/NonLandslides_merged.zip
  • Set the environment variable DATA_ROOT to that folder:

    • Windows (PowerShell); setx only affects new terminal sessions, so reopen the terminal afterwards:

      setx DATA_ROOT "D:\hazard-data"
    • Linux/macOS:

      export DATA_ROOT=/Users/me/hazard-data
    • Google Colab:

      from google.colab import drive
      drive.mount('/content/drive')
      import os
      os.environ["DATA_ROOT"] = "/content/drive/MyDrive/hazard-data"
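
The steps above can be sketched as a small path resolver. This is a minimal illustration, not the repository's code: the helper name resolve_input is hypothetical, and it assumes that paths in the config are interpreted relative to DATA_ROOT.

```python
import os
from pathlib import Path


def resolve_input(relative_path: str, env_var: str = "DATA_ROOT") -> Path:
    """Join a config path onto the folder named by DATA_ROOT.

    Hypothetical helper: assumes config paths are relative to DATA_ROOT.
    """
    root = os.environ.get(env_var)
    if not root:
        # Fail early with a clear message instead of a confusing "file not found".
        raise RuntimeError(f"{env_var} is not set; point it at your private data folder")
    return Path(root) / relative_path
```

With DATA_ROOT set to D:\hazard-data, resolve_input("areakomeh.zip") would point at D:\hazard-data\areakomeh.zip, so the private files never need to live inside the repository.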

Config File (configs/config.yaml)

  • Example configuration (the file names match the private inputs placed under DATA_ROOT):
    output_dir: "./outputs"
    
    boundary_path: "areakomeh.zip"
    classes:
      - { name: "landslide",        path: "landslides_merged.zip",       class_id: 1 }
      - { name: "pseudo_landslide", path: "PseudoLandslides_merged.zip", class_id: 2 }
      - { name: "non_landslide",    path: "NonLandslides_merged.zip",    class_id: 3 }
    
    start_date: "2024-01-01"
    end_date: "2024-12-31"
    cloud_max: 10
    
    tile_size: 256
    bands: ["B02","B03","B04","B08"]
    
    output_destination: "local"   # local | drive | kaggle
    kaggle_dataset_slug: "mohamadrahdan/gee-s2-multiclass"
    
    export:
      drive_folder: "gee_s2_multiclass"
      image_prefix: "img_"
      mask_prefix: "mask_"
      format: "GEO_TIFF"
      scale: 10
      max_tiles: 200
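
Before starting a long export, it can help to sanity-check the classes list. The sketch below is one possible check, not part of the repository: validate_classes is a hypothetical helper, and the rule that class_id must be 1 or greater (leaving 0 for background) is an assumption.

```python
def validate_classes(classes):
    """Check a `classes` list shaped like the one in config.yaml.

    Illustrative helper, not the repository's code. Returns {class_id: name}.
    """
    seen = {}
    for entry in classes:
        for key in ("name", "path", "class_id"):
            if key not in entry:
                raise ValueError(f"class entry {entry!r} is missing '{key}'")
        class_id = entry["class_id"]
        # Assumption: 0 is reserved for background pixels in the mask.
        if not isinstance(class_id, int) or class_id < 1:
            raise ValueError(f"class_id must be a positive integer, got {class_id!r}")
        if class_id in seen:
            raise ValueError(
                f"class_id {class_id} is used by both "
                f"'{seen[class_id]}' and '{entry['name']}'"
            )
        seen[class_id] = entry["name"]
    return seen


example = [
    {"name": "landslide", "path": "landslides_merged.zip", "class_id": 1},
    {"name": "pseudo_landslide", "path": "PseudoLandslides_merged.zip", "class_id": 2},
    {"name": "non_landslide", "path": "NonLandslides_merged.zip", "class_id": 3},
]
id_to_name = validate_classes(example)
```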
    
    

Run the Pipeline

  • Local usage

    Make sure to activate your virtual environment and set DATA_ROOT first.

    jupyter notebook notebooks/gee-s2-multiclass-dataset-generator.ipynb
    
    
  • Google Colab

    Click the "Open in Colab" badge above.

    Mount your Drive, set DATA_ROOT, and run the cells.

Outputs

  • Quick Preview
    The notebook prints download URLs for one image tile and its mask.

  • Batch Export

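    • Export tasks are created in Earth Engine.
    • Files are saved to the Google Drive folder named by export.drive_folder (gee_s2_multiclass in the example config).
    • Filenames combine export.image_prefix or export.mask_prefix with a zero-padded index: img_00001.tif, mask_00001.tif, etc.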
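
The naming pattern can be reproduced when pairing images with masks downstream. This is a minimal sketch: tile_filenames is a hypothetical helper, and the five-digit padding and the starting index of 1 are inferred from the example names above rather than confirmed from the code.

```python
def tile_filenames(index, image_prefix="img_", mask_prefix="mask_", digits=5):
    """Return the (image, mask) filenames for one tile index.

    Hypothetical helper; padding width inferred from img_00001.tif.
    """
    stem = f"{index:0{digits}d}"
    return f"{image_prefix}{stem}.tif", f"{mask_prefix}{stem}.tif"


image_name, mask_name = tile_filenames(1)
```

Because an image and its mask share the same index, a loader can match them by stripping the prefix.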

Publish to Kaggle (Optional)

  • After export, you can push the results to Kaggle:
    python tools/upload_to_kaggle.py \
      --slug yourname/gee-s2-multiclass \
      --dir ./outputs \
      --init --public
  • For subsequent versions:
    python tools/upload_to_kaggle.py \
      --slug yourname/gee-s2-multiclass \
      --dir ./outputs \
      --message "update v0.2"

Why these bands? (B02, B03, B04, B08)

  • B02 / B03 / B04 (Blue, Green, Red; 10 m): true-color composites for visual quality control (QC)
  • B08 (Near Infrared; 10 m): combined with B04 (Red) to compute the vegetation index NDVI, which helps separate vegetated from bare or disturbed ground in hazard mapping
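
For reference, NDVI is (NIR - Red) / (NIR + Red), which here means B08 and B04. The function below is a per-pixel sketch in plain Python, not code from the repository; returning 0.0 when both bands are zero is an assumption made to avoid division by zero.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel.

    nir is the B08 value and red is the B04 value, in the same units.
    """
    denom = nir + red
    if denom == 0:
        # Assumption: treat empty/no-data pixels as NDVI 0 instead of raising.
        return 0.0
    return (nir - red) / denom


dense_vegetation = ndvi(0.5, 0.1)   # high NIR, low red: about 0.67
bare_ground = ndvi(0.25, 0.25)      # equal reflectance: 0.0
```

Values lie between -1 and 1, with higher values indicating denser green vegetation.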

FAQ

  • Q: Where do I put my shapefiles?
    A: In a private folder outside the Git repository. Point to it via DATA_ROOT.
  • Q: Can I add more classes?
    A: Yes, just add them under classes in config.yaml with a unique class_id.
  • Q: How do I avoid thousands of tasks?
    A: Set export.max_tiles in config.yaml and export in batches.
  • Q: Do outputs go to GitHub?
    A: No, outputs/ and data/ are ignored via .gitignore.
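
To judge how many export tasks an AOI might create, a rough estimate can be made from its bounding box. This sketch is not the repository's tiling logic: it assumes a non-overlapping grid in a projected (meter-based) coordinate system, and it treats max_tiles as the size of each batch. The function names are hypothetical.

```python
import math


def estimate_tile_count(width_m, height_m, tile_size=256, scale=10):
    """Upper-bound tile count for a bounding box, assuming no overlap.

    One tile covers tile_size * scale meters per side (2560 m by default).
    """
    side = tile_size * scale
    return math.ceil(width_m / side) * math.ceil(height_m / side)


def batches(n_tiles, max_tiles=200):
    """Yield (start, stop) index ranges holding at most max_tiles tiles each."""
    for start in range(0, n_tiles, max_tiles):
        yield start, min(start + max_tiles, n_tiles)


count = estimate_tile_count(25_600, 12_800)   # 10 columns x 5 rows
ranges = list(batches(450, max_tiles=200))
```

The real count may be lower if tiles that fall outside the AOI polygon are skipped.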

License

MIT License © 2025 Mohamad Rahdan
