Chase the Cloud: Leveraging Diffusion Models for Cloud Motion Prediction using INSAT-3DR/3DS Imagery
This repository contains a prototype implementation of a conditional diffusion model that predicts the next geostationary satellite image frame from past observations. The primary input is the TIR1 (thermal infrared) channel, with the WV (water vapour) channel available as an optional conditioning feature.
The goal is to explore generative modeling for short-term cloud motion forecasting using satellite imagery. This proof of concept predicts the next TIR1 frame from previous timesteps using a simple UNet-based architecture within a diffusion framework.
The data is organized in the following format:
```
output/
└── YYYYMMDD/
    └── HHMM/
        ├── IMG_TIR1.png
        ├── IMG_TIR2.png
        ├── IMG_WV.png
        ├── IMG_MIR.png
        ├── IMG_SWIR.png
        └── IMG_VIS.png
```
Each subfolder contains six grayscale PNG images representing different spectral channels at that timestep.
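Given this layout, per-timestep file paths can be assembled with a small helper. This is an illustrative sketch, not code from the repository; the name `timestep_paths` is an assumption:

```python
import os

# The six spectral channels stored in each HHMM subfolder.
CHANNELS = ["TIR1", "TIR2", "WV", "MIR", "SWIR", "VIS"]

def timestep_paths(root, date, time):
    """Map each channel name to its PNG path for one timestep.

    `date` is a YYYYMMDD string and `time` an HHMM string,
    matching the directory layout described above.
    """
    folder = os.path.join(root, date, time)
    return {ch: os.path.join(folder, f"IMG_{ch}.png") for ch in CHANNELS}
```

For example, `timestep_paths("output", "20240601", "0000")["TIR1"]` resolves to `output/20240601/0000/IMG_TIR1.png`.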
Key features:

- Conditional diffusion model using UNet
- TIR1 as the base prediction target
- WV used as an optional conditional feature
- Sliding-window frame input for temporal context
- Evaluation using SSIM, PSNR, and MAE
- Trained on geostationary satellite imagery with real spatial resolution
How it works:

- The model takes three consecutive timesteps of TIR1 and WV images as input
- It predicts the next TIR1 frame
- The model is trained using a sliding window strategy over the available dataset
- During training, the prediction for a fixed validation sample (the first four timesteps from 1 June) is saved every epoch for visual inspection
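The sliding-window pairing described above can be sketched as pure index logic; the function name `sliding_windows` is an assumption for illustration:

```python
def sliding_windows(timesteps, context=3):
    """Pair each run of `context` consecutive timesteps with the
    timestep that immediately follows it (the prediction target).

    `timesteps` is a chronologically sorted list of timestep
    identifiers (e.g. "YYYYMMDD/HHMM" folder names).
    """
    windows = []
    for i in range(len(timesteps) - context):
        inputs = timesteps[i:i + context]  # three input frames
        target = timesteps[i + context]    # the next frame to predict
        windows.append((inputs, target))
    return windows
```

With five timesteps `a..e` this yields two training pairs: `([a, b, c], d)` and `([b, c, d], e)`.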
The model is a shallow convolutional encoder-decoder with six input channels (three timesteps × two spectral channels) and one output channel. It is trained with MSE loss and optimized with Adam.
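A minimal PyTorch sketch of such an encoder-decoder, with one training step using MSE loss and Adam. Layer sizes and counts here are assumptions for illustration; the repository's actual architecture may differ:

```python
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    """Shallow conv encoder-decoder: 6 input channels
    (3 timesteps x 2 channels, TIR1 + WV) -> 1 output channel."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(),   # H -> H/2
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # H/2 -> H/4
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),  # H/4 -> H/2
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),              # H/2 -> H
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# One training step with MSE loss and Adam, on dummy data.
model = EncoderDecoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
inputs = torch.randn(2, 6, 64, 64)   # batch of stacked TIR1 + WV frames
target = torch.randn(2, 1, 64, 64)   # next TIR1 frame
loss = nn.functional.mse_loss(model(inputs), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```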
To evaluate a saved model:
- Load a specific triplet of input timesteps
- Predict the next frame using the trained model
- Compute SSIM, PSNR, and MAE against the ground truth
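These metrics can be computed with scikit-image and NumPy, for instance as below; the `evaluate` helper name is illustrative:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(pred, gt, data_range=1.0):
    """Compare a predicted frame against the ground truth.

    Both arrays are 2-D float images scaled to [0, data_range].
    """
    return {
        "ssim": structural_similarity(gt, pred, data_range=data_range),
        "psnr": peak_signal_noise_ratio(gt, pred, data_range=data_range),
        "mae": float(np.mean(np.abs(gt - pred))),
    }
```

Note that `data_range` must be passed explicitly for float images, and SSIM's default 7×7 window requires frames of at least 7×7 pixels.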
Planned improvements:

- Add support for more conditional channels (e.g. MIR, SWIR)
- Train or integrate a self-supervised encoder and decoder setup for latent conditioning
- Explore improved temporal encoders (ConvLSTM, 3D CNN)
- Replace UNet with a deeper or hierarchical model
- Switch to latent diffusion for efficiency and scalability
Requirements:

- Python 3.8+
- PyTorch
- torchvision
- scikit-image
- tqdm
- Pillow
- OpenCV
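One way to install these with pip (package names are assumed from the list above; `opencv-python` is the PyPI name for OpenCV):

```shell
pip install torch torchvision scikit-image tqdm Pillow opencv-python
```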
Notes:

- Adjust input paths and device configuration for your system
- Training time depends on image resolution and hardware
This project is for research and prototyping purposes only.