Sen12Landslides: Spatio-Temporal Landslide & Anomaly Detection Dataset

A large-scale, multi-modal, multi-temporal collection of 128×128 px Sentinel-1/2 + DEM patches at 10 m spatial resolution, with 75k landslide annotations.

Quick Start

# Clone & setup
git clone https://github.com/PaulH97/Sen12Landslides.git
cd Sen12Landslides
pip install -e .
pip install --upgrade huggingface_hub

# Authenticate (only once)
hf auth login

# Download the harmonized dataset (use --include "data_raw/**" for the raw version)
mkdir -p data
hf download paulhoehn/Sen12Landslides \
  --repo-type dataset \
  --local-dir data \
  --include "data_harmonized/**" 

# Extract and clean up archives
for sensor in s1asc s1dsc s2; do
  for archive in data/data_harmonized/$sensor/*.tar.gz; do
    [ -f "$archive" ] && tar -xzf "$archive" -C "data/data_harmonized/$sensor" && rm "$archive"
  done
done
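
For setups without a POSIX shell, the extract-and-clean-up loop above can be approximated in Python. This is a minimal sketch, not part of the repository: the function name `extract_archives` is hypothetical, and it assumes the same `data/data_harmonized/<sensor>/*.tar.gz` layout as the shell loop.

```python
import tarfile
from pathlib import Path


def extract_archives(harmonized_dir, sensors=("s1asc", "s1dsc", "s2")):
    """Unpack every *.tar.gz in each sensor folder, then delete the archive.

    Hypothetical helper mirroring the shell loop; returns the archive names
    that were processed.
    """
    processed = []
    for sensor in sensors:
        sensor_dir = Path(harmonized_dir) / sensor
        for archive in sorted(sensor_dir.glob("*.tar.gz")):
            # Extract next to the archive, as `tar -xzf ... -C <sensor dir>` does.
            with tarfile.open(archive, "r:gz") as tar:
                tar.extractall(path=sensor_dir)
            # Only reached if extraction did not raise, like `&& rm`.
            archive.unlink()
            processed.append(archive.name)
    return processed
```

Calling `extract_archives("data/data_harmonized")` from the repository root should leave the same files on disk as the shell version.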

Set root_dir in the global configuration file (Sen12Landslides/configs/config.yaml) to the path of your Sen12Landslides folder, and make sure that folder contains the data downloaded above.

Dataset Overview

Full Dataset

Modality	Samples	Annotated	Ann. Rate
S1-asc	13,306	6,492	48.8%
S1-dsc	12,622	6,347	50.3%
S2	13,628	6,737	49.4%
Aligned	11,719	6,026	51.4%
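
The "Ann. Rate" column is the share of annotated samples per modality. The short check below recomputes it from the counts in the table; the variable and function names are illustrative only.

```python
# Counts copied from the "Full Dataset" table: modality -> (samples, annotated).
counts = {
    "S1-asc": (13306, 6492),
    "S1-dsc": (12622, 6347),
    "S2": (13628, 6737),
    "Aligned": (11719, 6026),
}


def annotation_rate(samples, annotated):
    """Percentage of annotated samples, rounded to one decimal place."""
    return round(100.0 * annotated / samples, 1)


rates = {name: annotation_rate(*pair) for name, pair in counts.items()}
print(rates)
```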

Task Splits

Modality	S12LS-LD	S12LS-AD
S1-asc	4,793 (100%)	13,306 (48.8%)
S1-dsc	4,666 (100%)	12,622 (50.3%)
S2	4,988 (100%)	13,628 (49.4%)
Aligned	4,392 (100%)	11,719 (51.4%)

S12LS-LD: Landslide detection with only annotated patches (>50 annotated pixels per patch)
S12LS-AD: Anomaly detection with mixed annotated/non-annotated samples to learn normal vs. anomalous patterns
See Sen12Landslides/tasks/<task>/config.json for split details
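
The S12LS-LD criterion (more than 50 annotated pixels per patch) can be sketched as a simple mask test. This is an illustration under assumptions, not the repository's filter: it assumes that nonzero MASK values mark annotated landslide pixels and that a pixel counts once even if it is flagged at several time steps. The name `qualifies_for_ld` is hypothetical.

```python
import numpy as np

MIN_ANNOTATED_PIXELS = 50  # from the S12LS-LD description: strictly more than 50


def qualifies_for_ld(mask):
    """Return True if a patch has more than 50 annotated pixels.

    mask: array of shape (time, x, y) or (x, y); nonzero is assumed to mean
    "annotated landslide pixel".
    """
    mask = np.asarray(mask)
    if mask.ndim == 3:
        # Collapse time: a pixel is annotated if it is flagged at any time step.
        footprint = mask.any(axis=0)
    else:
        footprint = mask.astype(bool)
    return int(footprint.sum()) > MIN_ANNOTATED_PIXELS
```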

Dataset Versions

Harmonized (recommended)

The harmonized version contains radiometrically consistent data that has been pre-processed and bounded for stable model training:

Sentinel-1 (Backscatter):
- VH and VV bands converted from linear power to decibels (dB) via $10 \cdot \log_{10}(x)$
- Values bounded to [-50, 10] dB to remove extreme noise and specular outliers
Sentinel-2 (Reflectance):
- Bands B02–B12 corrected for the +1000 DN radiometric offset introduced by ESA Baseline 04.00 (January 25, 2022 onward)
- Values bounded to [0, 10000] DN to ensure physical reflectance consistency
DEM (Elevation):
- Values bounded to [0, 8800] m to maintain a global terrain baseline
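
The three harmonization steps listed above can be sketched with NumPy as follows. This is an independent illustration, not the repository's code. The function names, the `floor` guard against taking the logarithm of zero, and the `has_offset` flag (the caller decides whether a scene carries the +1000 DN offset) are all assumptions.

```python
import numpy as np

S2_OFFSET_DN = 1000  # radiometric offset described for ESA Baseline 04.00


def s1_linear_to_db(power, floor=1e-10):
    """Convert linear backscatter power to dB and bound it to [-50, 10] dB."""
    power = np.asarray(power, dtype=np.float64)
    # `floor` (an assumption) avoids log10(0); such values end up at -50 dB.
    db = 10.0 * np.log10(np.maximum(power, floor))
    return np.clip(db, -50.0, 10.0)


def s2_remove_offset(dn, has_offset):
    """Subtract the +1000 DN offset where present, then bound to [0, 10000] DN."""
    dn = np.asarray(dn, dtype=np.int32)  # widen from int16 before subtracting
    if has_offset:
        dn = dn - S2_OFFSET_DN
    return np.clip(dn, 0, 10000)


def clip_dem(elevation):
    """Bound elevation to [0, 8800] m."""
    return np.clip(np.asarray(elevation), 0, 8800)
```

For example, `s1_linear_to_db([1.0, 0.0])` maps a linear power of 1.0 to 0 dB and a zero reading to the lower bound of -50 dB.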

Raw (original)

The raw version preserves the data exactly as published in the original dataset paper, ensuring full reproducibility of reported results:

Sentinel-1: Linear power scale (not converted to dB)
Sentinel-2: No radiometric offset correction applied
DEM: Unmodified

The conversion functions for both corrections are available in the utils.py file of the GitHub repository.

Data Structure

Sen12Landslides/
├── data/
│   ├── data_harmonized/                    ← recommended for training
│   │   ├── inventories.shp.zip
│   │   ├── s1asc/                          Sentinel-1 Ascending (dB)
│   │   │   └── <region>_s1asc_<id>.nc
│   │   ├── s1dsc/                          Sentinel-1 Descending (dB)
│   │   │   └── <region>_s1dsc_<id>.nc
│   │   └── s2/                             Sentinel-2 (offset corrected)
│   │       └── <region>_s2_<id>.nc
│   └── data_raw/                           ← original paper version
│       ├── inventories.shp.zip
│       ├── s1asc/
│       ├── s1dsc/
│       └── s2/
├── tasks/
│   ├── S12LS-LD/                           Landslide detection
│   │   ├── config.json
│   │   ├── harmonized/
│   │   │   └── <modality>/
│   │   │       ├── splits.json
│   │   │       ├── norm.json
│   │   │       └── patch_locations.geojson
│   │   └── raw/
│   │       └── <modality>/
│   │           ├── splits.json
│   │           ├── norm.json
│   │           └── patch_locations.geojson
│   └── S12LS-AD/                           Anomaly detection
│       ├── harmonized/
│       │   └── ...
│       └── raw/
│           └── ...
└── src/                                    Data loaders, models, training

Patch Format

Each .nc file contains one 128×128 px patch with 15 time steps:

Modality	Bands	Additional
Sentinel-1-NRB	VV, VH	DEM, MASK
Sentinel-2-L2A	B02-B08, B8A, B11-B12	SCL, DEM, MASK

>>> import xarray as xr
>>> ds = xr.open_dataset("Sen12Landslides/data/data_harmonized/s2/italy_s2_6982.nc")
>>> ds
<xarray.Dataset> Size: 6MB
Dimensions:      (time: 15, x: 128, y: 128)
Coordinates:
  * x            (x) float64 1kB 7.552e+05 … 7.565e+05
  * y            (y) float64 1kB 4.882e+06 … 4.881e+06
  * time         (time) datetime64[ns] 2022-10-05 … 2023-09-10
Data variables: (12/14)
    B02          (time, x, y) int16 …
    B03          (time, x, y) int16 …
    …             
    B12          (time, x, y) int16 …
    SCL          (time, x, y) int16 …
    MASK         (time, x, y) uint8 …
    DEM          (time, x, y) int16 …
    spatial_ref  int64 8B  
Attributes:
    ann_id:           41125,41124,…  
    ann_bbox:         (755867.58,4880640.0,…)  
    event_date:       2023-05-16  
    date_confidence:  1.0  
    pre_post_dates:   {'pre': 7, 'post': 8}  
    annotated:        True  
    satellite:        s2  
    center_lat:       4881280.0  
    center_lon:       755840.0  
    crs:              EPSG:32632
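
The `pre_post_dates` attribute can be used to separate pre-event from post-event acquisitions. The sketch below rests on several assumptions that the output above does not confirm: that the attribute may arrive as a string (as read from `ds.attrs["pre_post_dates"]`), that both values are 0-based positions along the `time` axis, and that `pre` is the last step before the event while `post` is the first step after it. The name `split_pre_post` is hypothetical.

```python
import ast


def split_pre_post(pre_post, n_steps):
    """Split time-step indices into pre-event and post-event lists.

    pre_post: dict such as {'pre': 7, 'post': 8}, or its string form.
    n_steps:  number of time steps in the patch (15 in the example above).
    """
    if isinstance(pre_post, str):
        # Safely parse the dict literal without using eval().
        pre_post = ast.literal_eval(pre_post)
    pre_idx = list(range(0, pre_post["pre"] + 1))
    post_idx = list(range(pre_post["post"], n_steps))
    return pre_idx, post_idx


pre_idx, post_idx = split_pre_post("{'pre': 7, 'post': 8}", 15)
```

Under these assumptions the example patch would have eight pre-event and seven post-event time steps.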

Tasks

We provide two task-specific configurations, S12LS-LD and S12LS-AD, described under Task Splits above.

Creating custom splits:

python src/data/create_splits.py  # Configure in configs/splits/config.yaml

Always Generated (Root Level)

File	Description
config.json	Filter criteria, split ratios, and stratification settings

Per-Satellite Folders (`s1asc/`, `s1dsc/`, `s2/`)

File	Description
splits.json	Train/val/test splits for this satellite modality
norm.json	Per-band normalization statistics (mean/std) for this satellite
patch_locations.geojson	Geographic patch locations with train/val/test assignments for this satellite

Multi-Modal Files

File	Description
splits_aligned.json	Train/val/test splits containing only patches available across all satellites
norm_aligned.json	Normalization statistics computed from aligned patches only
patch_locations_aligned.geojson	Geographic locations of patches available across all satellites

Usage:

Single-modal training: Load <satellite>/splits.json + <satellite>/norm.json
Multi-modal training: Load splits_aligned.json + norm_aligned.json for cross-modal fusion
Visualization: Open patch_locations.geojson in QGIS or mapping tools
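
A small helper can assemble the per-modality file paths from the layout shown under Data Structure (tasks/<task>/<version>/<modality>/). This is a sketch with hypothetical function names. The internal schema of the JSON files is not documented here, so `load_json` returns their content unchanged.

```python
import json
from pathlib import Path


def task_files(root_dir, task, version, modality):
    """Build paths to the split, normalization, and location files.

    Example arguments: task="S12LS-LD", version="harmonized", modality="s2".
    """
    base = Path(root_dir) / "tasks" / task / version / modality
    return {
        "splits": base / "splits.json",
        "norm": base / "norm.json",
        "locations": base / "patch_locations.geojson",
    }


def load_json(path):
    """Read a JSON file and return its parsed content as-is."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)
```

For single-modal training, `task_files("Sen12Landslides", "S12LS-LD", "harmonized", "s2")` yields the splits.json and norm.json paths to pass to `load_json`.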

Training

This project uses Hydra for configuration management; see the Hydra documentation for details. The default parameters of some classes are overridden by the values in the configuration files, so always update the files under configs/ to match your hardware and requirements.

Available Configurations

Config	Options
model	utae, convgru, unet3d, unet_convlstm
dataset	sen12ls_s2, sen12ls_s1asc, sen12ls_s1dsc
trainer	cpu, gpu, ddp
lit_module	binary, multiclass

Examples

# Train ConvGRU on Sentinel-2
python src/pipeline/train.py model=convgru dataset=sen12ls_s2

# Train UTAE on Sentinel-1 with DEM
python src/pipeline/train.py model=utae dataset=sen12ls_s1asc dataset.dem=true dataset.num_channels=3

# Multi-GPU training
python src/pipeline/train.py trainer.devices=4 trainer.strategy=ddp dataset=sen12ls_s2

# Multirun with three models
python src/pipeline/train.py --multirun model=utae,unet_convlstm,convgru dataset=sen12ls_s2

Baselines

Due to class imbalance (~3% landslides), we provide, in addition to the macro-averaged metrics in the paper, binary metrics on the landslide class for benchmarking against other detection methods.

Note: To compare landslide detection performance, use the binary metrics below rather than the macro-averaged metrics from the paper.

Benchmark Results (`S12LS-LD`)

Benchmark using paper architectures with binary metrics on Sentinel-2 + DEM:

Model	AP	F1	IoU	Precision	Recall
U-ConvLSTM	65.13	61.95	44.88	60.59	63.92
Unet3d	62.08	58.82	41.66	55.75	62.56
ConvGRU	60.00	59.06	41.91	56.72	61.77
U-TAE	67.75	61.80	44.74	53.19	74.90

Three training runs (seed=42,123,777) were performed for each model on the harmonized S12LS-LD split with lit_module=binary for 75 epochs. Test metrics were averaged across seeds on the held-out test set. See configs/ for full settings.

⚠️ These models serve as a "quick-run" proof of concept with a baseline decision threshold of 0.5. Optimizing this threshold significantly improves the metrics, and future architectural scaling and feature fusion are expected to help further.
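
To illustrate how the decision threshold affects the binary metrics, the sketch below computes precision, recall, F1, and IoU for the positive (landslide) class and sweeps a few candidate thresholds. It is not the repository's evaluation code. The function names and the candidate grid are assumptions, and any threshold search should be run on validation data rather than the test set.

```python
import numpy as np


def binary_metrics(probs, target, threshold=0.5):
    """Positive-class metrics after thresholding predicted probabilities."""
    pred = np.asarray(probs) >= threshold
    target = np.asarray(target).astype(bool)
    tp = int(np.sum(pred & target))
    fp = int(np.sum(pred & ~target))
    fn = int(np.sum(~pred & target))
    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


def best_f1_threshold(probs, target, candidates=None):
    """Return (best F1, threshold) over a grid of candidate thresholds."""
    if candidates is None:
        candidates = np.linspace(0.1, 0.9, 9)  # illustrative grid
    scored = [(binary_metrics(probs, target, t)["f1"], float(t)) for t in candidates]
    return max(scored)  # ties resolve to the larger threshold
```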

Reproducibility

# Train all baselines
python src/pipeline/train.py --multirun \
  model=unet3d,convgru,utae,unet_convlstm \
  seed=42,123,777 \
  dataset=sen12ls_s2
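
With Hydra's default sweeper, comma-separated overrides in a multirun expand to the Cartesian product of all listed values, so the command above launches 4 models × 3 seeds = 12 runs. The snippet below only enumerates that grid for illustration; the order shown comes from `itertools.product` and is not necessarily the order in which Hydra schedules jobs.

```python
from itertools import product

# Values taken from the multirun command above.
models = ["unet3d", "convgru", "utae", "unet_convlstm"]
seeds = [42, 123, 777]
datasets = ["sen12ls_s2"]

# One override string per run in the sweep.
jobs = [
    f"model={m} seed={s} dataset={d}"
    for m, s, d in product(models, seeds, datasets)
]

print(len(jobs))
for job in jobs:
    print(job)
```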

Challenges

What makes this dataset demanding, and a good testbed for new methods that aim to beat the baselines:

Severe class imbalance (~3% landslides)
Small spatial extent - landslides often span only a few pixels at 10 m resolution
Multi-temporal complexity - effective temporal fusion remains challenging
Geographic diversity - varied terrain, vegetation, and climate
