This repository implements state-of-the-art deep learning architectures for road segmentation in satellite imagery. It compares multiple models to determine the best-performing approach for pixel-level road segmentation tasks. The study focuses on leveraging architectural innovations and rigorous data preprocessing to achieve superior model performance.
- Introduction
- Features
- Models
- Data Preparation
- Optimization Strategies
- How To Run The Code
- Directory Structure
- Contributors
## Introduction

Road segmentation from satellite imagery is a crucial task in urban development, disaster management, and autonomous navigation. This project evaluates and compares the performance of five deep learning models:
- GC-DCNN
- DLinkNet
- ResUNet++
- ResUNetFormer
- UnetEfficientNetB7
The project identifies UnetEfficientNetB7 as the best-performing model, achieving an F1 score of 91.8%.
## Features

- Advanced Preprocessing: Includes geometric and photometric augmentations to improve model robustness.
- Cutting-Edge Models: Implements modern architectures like EfficientNet and ResNet backbones.
- Comprehensive Evaluation: Utilizes F1 score for model evaluation across diverse urban scenarios.
- Open-Source Implementation: Provides detailed documentation for reproducibility.
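Since the F1 score is the headline evaluation metric, here is a minimal sketch of how it can be computed for binary road masks. This is NumPy-only and illustrative; the repository's own evaluation code may differ:

```python
import numpy as np

def f1_score(pred: np.ndarray, target: np.ndarray) -> float:
    """F1 score for binary masks (1 = road, 0 = background)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    tp = np.logical_and(pred, target).sum()   # true positives
    fp = np.logical_and(pred, ~target).sum()  # false positives
    fn = np.logical_and(~pred, target).sum()  # false negatives
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, a prediction with one correct road pixel, one false alarm, and no misses has precision 0.5 and recall 1.0, giving F1 ≈ 0.667.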
## Models

The repository includes the following models:
- GC-DCNN: Employs residual dilated blocks and pyramid pooling modules for global context extraction.
- DLinkNet: Combines ResNet-34 with an efficient encoder-decoder structure.
- EfficientNet-UNet: Utilizes EfficientNet-B7 for hierarchical feature extraction with a U-Net-inspired decoder.
- Other Architectures: Includes ResUNetFormer and ResUNet++ (UNet++ with a ResNet-152 backbone) for additional comparisons.
## Data Preparation

- Dataset Expansion: The dataset was expanded from 100 to 10,566 images using diverse high-resolution satellite imagery from cities such as Chicago, Berlin, Zurich, and Paris.
- Splitting and Resizing: Satellite images were divided into quadrants and resized to 400×400 pixels.
- Geometric Augmentation: Includes rotations and flips for spatial diversity.
- Photometric Augmentation: Applied using the Albumentations library to simulate real-world conditions.
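The splitting and geometric-augmentation steps above can be sketched as follows. This is a NumPy-only illustration with made-up function names, not the repository's API; photometric augmentation is handled by Albumentations and resizing to 400×400 would use an image library, both omitted here:

```python
import numpy as np

def split_into_quadrants(image: np.ndarray) -> list[np.ndarray]:
    """Divide an image into four quadrants (top-left, top-right, bottom-left, bottom-right)."""
    h, w = image.shape[0] // 2, image.shape[1] // 2
    return [image[:h, :w], image[:h, w:], image[h:, :w], image[h:, w:]]

def geometric_augment(image: np.ndarray) -> list[np.ndarray]:
    """Return the 8 dihedral variants: 4 rotations of the image and of its horizontal flip."""
    variants = []
    for img in (image, np.fliplr(image)):
        for k in range(4):
            variants.append(np.rot90(img, k))
    return variants
```

Applied to every quadrant, the 8 dihedral variants alone multiply the dataset by 32 before any photometric augmentation.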
## Optimization Strategies

- Hyperparameter Tuning: Optimized batch size and learning rate using grid search.
- Loss Functions: Evaluated Binary Cross-Entropy (BCE), Dice Loss, Focal Loss, and Combined Loss for optimal performance.
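To illustrate the losses being compared, here is a minimal NumPy sketch of Dice loss and a BCE+Dice combination. The repository's actual implementations live in helpers/loss.py and presumably operate on PyTorch tensors; the weighting scheme below is one common formulation, not necessarily the one used here:

```python
import numpy as np

def dice_loss(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """1 - Dice coefficient; pred holds probabilities in [0, 1], target is binary."""
    intersection = (pred * target).sum()
    return 1.0 - (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

def bce_loss(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy, with clipping for numerical stability."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

def combined_loss(pred: np.ndarray, target: np.ndarray, alpha: float = 0.5) -> float:
    """Weighted sum of BCE and Dice."""
    return alpha * bce_loss(pred, target) + (1 - alpha) * dice_loss(pred, target)
```

BCE optimizes per-pixel accuracy while Dice directly targets region overlap, which is why combining them often helps on class-imbalanced masks such as thin road networks.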
## How To Run The Code

To use this project, first clone the repository. Make sure you have Python 3.11 or later installed, then install all dependencies listed in requirements.txt:
```shell
git clone https://github.com/CS-433/ml-project-2-amel_project.git
cd ml-project-2-amel_project
pip install -r requirements.txt
```

To create the augmented datasets (the 5k and 10k sizes), run:

```shell
python processing/data_augmentation.py
```
Once everything is set up, you can run the training scripts (train.py or hyper_search.py) to train the models on your dataset or to perform a hyperparameter search on a chosen model:
```shell
python train.py --model_name EfficientNetB7 --dataset_size 10k --num_epochs 20 --batch_size_train 8
python hyper_search.py --model_name EfficientNetB7 --dataset_size 10k --num_epochs 20
```

The parameter grid can be changed directly in the Python file.
We created a file predict.py to compute the final masks.
When running the script, you can specify which models to use for making predictions. For instance:
- To make predictions using only the ResUNet model (the default setting):

  ```shell
  python predict.py
  ```
Prior to making predictions with a particular model, ensure that you have successfully trained the model and saved it in the .pth format in the saved_models/ directory. The training and saving process is automated and can be performed using the train.py or hyper_search.py scripts.
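Conceptually, producing the final masks comes down to thresholding per-pixel probabilities and then labeling patches for submission (the repository ships helpers/mask_to_submission.py for the latter). A minimal NumPy sketch of those two steps, with illustrative function names and threshold values that may differ from the repository's defaults:

```python
import numpy as np

def probabilities_to_mask(probs: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Binarize a per-pixel probability map into a 0/1 road mask."""
    return (probs >= threshold).astype(np.uint8)

def patch_labels(mask: np.ndarray, patch_size: int = 16, fg_threshold: float = 0.25) -> np.ndarray:
    """Label each patch 1 (road) if its mean foreground ratio exceeds fg_threshold."""
    h, w = mask.shape
    labels = np.zeros((h // patch_size, w // patch_size), dtype=np.uint8)
    for i in range(0, h - h % patch_size, patch_size):
        for j in range(0, w - w % patch_size, patch_size):
            patch = mask[i:i + patch_size, j:j + patch_size]
            labels[i // patch_size, j // patch_size] = int(patch.mean() > fg_threshold)
    return labels
```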
## Directory Structure

```
ml-project-2-amel-project/
│
├── data/                        # Directory containing all data sets
│   ├── 1k/                      # Data set with 1,000 instances
│   ├── 10k/                     # Data set with 10,000 instances
│   ├── test_set_images/         # Images for model testing
│   └── training/                # Initial training data images and labels
│
├── helpers/                     # Utility scripts for various helper functions
│   ├── load.py                  # Functions to load data
│   ├── loss.py                  # Definitions of loss functions used in training
│   ├── mask_to_submission.py    # Converts model output masks into submission format
│   ├── plot.py                  # Visualization functions for data and results
│   ├── slurm_to_json.py         # Converts SLURM job outputs to JSON format
│   └── submission_to_mask.py    # Converts submission format back to mask format
│
├── jobs/                        # Scripts for running batch jobs on a cluster
│   ├── hyper_search.run         # Run this to perform a hyperparameter search
│   ├── predict.run              # Run this to perform a prediction with a saved model
│   └── train.run                # Run this to train a model on specific parameters
│
├── models/                      # Contains the model architecture definitions
│   ├── DLink.py                 # Model script for the DLinkNet architecture
│   ├── GCDCNN.py                # Model script for the GC-DCNN architecture
│   ├── ResUNet.py               # Model script for the ResUNet architecture
│   ├── ResUNet++.py             # UNet++ model using ResNet-152 as a backbone
│   └── UnetEfficientNetB7.py    # UNet model using EfficientNetB7 as a backbone
│
├── processing/                  # Scripts for data preprocessing and augmentation
│   ├── data_augmentation.py     # Script used to augment data
│   ├── data_processing.py       # Script used to process data
│   ├── helpers_augmentation.py  # Helper functions specific to data augmentation
│   └── helpers_processing.py    # Helper functions for data processing
│
├── saved_models/                # Directory to store trained model weights
│
├── requirements.txt             # Python packages required to run the project
│
├── hyper_search.py              # Script for hyperparameter search and optimization
├── predict.py                   # Script for making predictions with trained models
├── train.py                     # Script for training models on the data
├── training_loop.py             # Python file containing the training loops
│
└── README.md                    # Markdown file with project overview and instructions
```
## Contributors

- Ali Ridha Mrad
- Mehdi Bouchoucha
- Mohamed Hedi Hidri