A deep learning-based flood monitoring pipeline that integrates physics-inspired loss functions, water index constraints, and structured generation, trained on real-world Sentinel-1 and Sentinel-2 pairs.
This project proposes a physics-aware SAR-to-Optical generation framework for flood mapping, where the objective is not just image translation, but domain-specific flood analysis. We move beyond traditional GANs by adding meaningful hydrological supervision, cloud-aware learning, and physically-consistent representation.
- Water-Aware Generation: Model outputs RGB + Water map directly.
- NDWI Consistency: Reinforces spectral water behavior in RGB bands.
- Cloud-Aware Losses: Losses masked using cloud masks (
LabelCloud.tif). - Custom Generator/Discriminator: Modified U-Net + PatchGAN.
- Speckle-Aware Texture Loss: Preserves fine SAR edges in RGB.
- EMA & TTUR: Training stabilized with exponential moving average and two time-scale updates. =======
This repository contains the implementation of Flood-GAN, a novel multi-task framework for translating Synthetic Aperture Radar (SAR) imagery into cloud-free optical images, with a specialized focus on accurate flood water delineation. Unlike conventional image-to-image translation models, Flood-GAN is architected with a deep understanding of remote sensing physics and the specific requirements of hydrological analysis.
Our model builds upon the foundational framework of conditional Generative Adversarial Networks (cGANs), as proposed by Isola et al. in their seminal paper. We adapt this framework with a specialized generator architecture and a multi-component discriminator strategy to create a system tailored for remote sensing applications.
The generator is responsible for the core translation task. We employ a U-Net architecture, which is an encoder-decoder network enhanced with skip connections. This design is particularly effective as it allows low-level feature information (like edges and textures) from the encoder to be passed directly to the corresponding layers in the decoder, preventing information loss and aiding in the reconstruction of high-fidelity details.
A key innovation in our design is the Multi-Task Output Head. Instead of a single output, our U-Net generator's final layer branches into two separate convolutional heads:
- RGB Head: A
Conv2dlayer with 3 output channels and atanhactivation function, responsible for generating the final optical image. - Segmentation Head: A
Conv2dlayer with 1 output channel, which produces the raw logits for the water segmentation mask.
This architecture transforms the generator from a simple translator into a powerful multi-task learning system.
| Component | Layer Structure | Purpose |
|---|---|---|
| Encoder | 8 DownsamplingBlock layers (C64-C128-C256-C512-C512-C512-C512-C512) |
Progressively downsamples the input image, capturing increasingly abstract, high-level features. |
| Decoder | 8 UpsamplingBlock layers (CD512-CD1024-CD1024-C1024-C1024-C512-C256-C128) |
Reconstructs the image from the bottleneck, using skip connections to re-integrate fine-grained details from the encoder. |
| Upsampling | Upsample (Bilinear) + Conv2d |
We explicitly avoid using Transposed Convolutions to prevent common checkerboard artifacts, ensuring smoother and more realistic outputs. |
To enforce realism, we use a PatchGAN discriminator. As described by Isola et al., instead of classifying the entire image as real or fake, the PatchGAN operates on N x N patches of the image. It outputs a feature map where each value represents the "realness" of a corresponding patch in the input. This approach encourages the generator to produce realistic high-frequency details across the entire image.
Our implementation uses a 70x70 PatchGAN (n_layers=3), which has been shown to provide an effective balance between performance and computational efficiency.
The true power of Flood-GAN comes from its unique, multi-component loss function. It is meticulously engineered to guide the model using principles of remote sensing and SAR physics, going far beyond standard reconstruction losses.
The total generator loss is a weighted sum of six distinct components:
| Loss Component | Purpose & Implementation | Key Benefit |
|---|---|---|
L_adversarial (LSGAN) |
Enforces realism by training the generator to fool the discriminator. We use Least Squares GAN loss (MSE) for improved training stability over traditional BCE loss. | Provides smoother, more stable gradients, preventing the generator from getting stuck. |
L1_masked |
Calculates L1 (Mean Absolute Error) pixel loss between the generated and real optical images, but only on clear pixels as defined by the cloud mask. | Maximizes data usage by learning from partially cloudy scenes, while preventing the model from being penalized for incorrect cloud generation. |
Perceptual_masked |
Compares high-level feature maps from a pre-trained VGG19 network. This loss is also masked to apply only to clear regions. | Improves perceptual quality by focusing on texture and structure, leading to images that look more natural to the human eye. |
SegmentationLoss (BCE) |
Directly supervises the water segmentation output. Compares the generator's segmentation head output against the ground truth s2_watermask.tif using Binary Cross-Entropy. |
This is the core of our task-oriented approach. It makes the model an explicit flood mapper, not just an image translator. |
SpecklePreservationLoss |
A custom, physics-aware loss. It calculates the image gradient (via a Sobel filter) of both the input SAR and generated optical image. It then penalizes the generator for creating high-frequency details (texture) in the optical image where the corresponding SAR image is smooth (e.g., calm water). | Creates more realistic water surfaces by preventing the model from "hallucinating" textures where none should exist, directly leveraging the physical properties of SAR. |
WaterIndexConsistencyLoss |
A physics-based guidance loss. It creates a "weak water label" from the input SAR VH channel (where water has low backscatter). It then calculates a proxy for the Normalized Difference Water Index (NDWI) on the generated optical image and penalizes the model if the generated NDWI is low in areas where the SAR input strongly suggests water is present. | Provides powerful cross-modal physics guidance. It ensures the generated optical image is not just visually plausible but also hydrologically consistent with the input SAR data. |
| Metric | Score |
|---|---|
| PSNR | 31.25 |
| SSIM | 0.94 |
| L1 Loss | ~0.05 |
The model demonstrates a remarkable ability to generate high-fidelity optical images and accurate water masks, even in complex scenes.
| Input SAR | Ground Truth Optical | Generated Optical | Generated Water Mask | Ground Truth Water Mask |
|---|---|---|---|---|
| ` | ` | [Link to your Generated Optical Image] |
[Link to your Generated Water Mask Image] |
`` |
| Sentinel-1 Input | Sentinel-2 Ground Truth | Our Model's Output | Our Model's Segmentation | Ground Truth Segmentation |
The model exhibits stable convergence and consistent improvement across key validation metrics.
| Validation L1 Loss | Validation PSNR | Validation SSIM |
|---|---|---|
[Link to your val_l1_loss graph] |
[Link to your val_psnr graph] |
[Link to your val_ssim graph] |
sar-colorization/
├── .venv/ # Python virtual environment (already set up with uv)
├── data/
│ ├── raw/ # Raw downloaded Sentinel-1 and Sentinel-2 images
│ └── processed/ # Preprocessed, aligned, and tiled image data
├── notebooks/
│ └── 01_data_exploration.ipynb # Jupyter notebook for initial EDA and visualization
├── outputs/
│ ├── checkpoints/ # Model checkpoints saved during training
│ └── results/ # Output images and logs from inference
├── scripts/
│ ├── 01_fetch_data.py # Download SAR and optical images via Sentinel Hub API
│ └── 02_preprocess_data.py # Co-registration, tiling, cloud masking, etc.
├── src/
│ ├── __init__.py
│ ├── config.py # Configuration (paths, API keys, constants)
│ ├── data_loader.py # Custom PyTorch Dataset and DataLoader
│ ├── losses.py # Loss functions (e.g., perceptual loss, SSIM)
│ ├── model.py # UNet/ResNet Generator & PatchGAN Discriminator
│ └── train.py # PyTorch Lightning training loop
├── monitoring/
│ ├── docker-compose.yml # Monitoring stack setup (Prometheus + Grafana)
│ └── prometheus/
│ └── prometheus.yml # Prometheus scraping config
├── inference.py # Inference script to run trained model on new SAR data
├── pyproject.toml # Python project metadata and dependencies
└── README.md # You're reading this file- Isola et al., "Image-to-Image Translation with Conditional Adversarial Networks", CVPR 2017 [Paper]
- Zhang et al., "Understanding Deep Learning Techniques for Remote Sensing Image Analysis", ISPRS 2019
- Flood mapping requires more than RGB synthesis; this model learns spectrally consistent water regions.
- NDWI-based supervision connects physics-based hydrology with deep generative learning.
- Using both SAR texture and RGB spectral cues, the network is encouraged to synthesize interpretable, reliable flood maps.