This study developed and evaluated an end-to-end deep learning pipeline for semantic segmentation of kelp canopy in Landsat 7 imagery, using a UNet architecture with pre-trained ResNet backbones, Floating Forests citizen-science labels, and rigorous data preprocessing and augmentation. A ResNet34 backbone trained on cleaned and augmented data achieved an Intersection over Union (IoU) of 0.5028, with preprocessing and augmentation proving essential for optimal performance. Our results suggest that deep learning, combined with citizen-science-derived ground truth, offers a viable and scalable approach to automating kelp canopy mapping, which can enhance the efficiency of conservation efforts by freeing resources for direct ecological interventions.
- Overview
- Key Features
- How to Use the Tool (Inference)
- How to Reproduce Our Results (Training & Evaluation)
- Requirements
## Overview

This project presents a deep learning pipeline for the automated mapping of kelp canopy from Landsat 7 satellite imagery. It leverages a UNet architecture with ResNet backbones and citizen science data (Floating Forests labels) for training and evaluation. The primary goal is to provide a scalable solution for monitoring kelp ecosystems, aiding conservation efforts.
## Key Features

- UNet Architecture: Utilizes a robust UNet model for semantic segmentation.
- ResNet Backbones: Supports pre-trained ResNet backbones (e.g., ResNet34) for feature extraction.
- Landsat 7 Imagery: Specifically designed for processing Landsat 7 satellite data.
- Citizen Science Integration: Incorporates Floating Forests labels for ground truth.
- Data Preprocessing & Augmentation: Includes steps for data cleaning and augmentation to improve model performance.
- End-to-End Pipeline: Covers data preparation, model training, inference, and evaluation.
## How to Use the Tool (Inference)

Follow these steps to use the pre-trained model to generate kelp canopy masks on new Landsat 7 imagery:
1. **Obtain Landsat 7 Images:**
   - Acquire the Landsat 7 images you wish to process.
   - Images must be in the shape `(350, 350, 7)`.
   - The required band ordering is:
     - 0: Short-wave infrared (SWIR)
     - 1: Near infrared (NIR)
     - 2: Red
     - 3: Green
     - 4: Blue
     - 5: Cloud Mask (binary: 0 for no cloud, 1 for cloud)
     - 6: Digital Elevation Model (DEM, meters above sea level)
   - If the Cloud Mask and DEM bands are unavailable, they can be substituted with layers of zeros of the same spatial dimensions.
   - Store these `.tif` images in the `data/cleaned/train_satellite/` directory.
   - For a quick test, you can use `data_copy.py` to copy some sample training data to this directory.
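The expected tile layout can be sketched with NumPy. This is an illustrative example only (the variable names are ours, not from the project's scripts): the five spectral bands are random placeholders, and zero layers stand in for a missing Cloud Mask and DEM, as described above.

```python
import numpy as np

# Illustrative sketch: assemble a (350, 350, 7) input tile in the band
# order the pipeline expects. The spectral bands here are random
# placeholders; in practice they come from the Landsat 7 scene.
H = W = 350
swir, nir, red, green, blue = (np.random.rand(H, W).astype(np.float32) for _ in range(5))

# Cloud Mask and DEM are unavailable, so substitute zero layers.
cloud_mask = np.zeros((H, W), dtype=np.float32)
dem = np.zeros((H, W), dtype=np.float32)

tile = np.stack([swir, nir, red, green, blue, cloud_mask, dem], axis=-1)
assert tile.shape == (350, 350, 7)
```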
2. **Clean Data (Optional but Recommended):**
   - The raw Landsat 7 data can be noisy. We provide an automated cleaning script.
   - Navigate to the data cleaning directory and run the script:

     ```shell
     cd data_cleaning/
     python data_clean.py
     ```

   - (You might need to adjust paths within `data_clean.py` if your input data for cleaning is not in the default expected location.)
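The actual cleaning criteria live in `data_clean.py`. As one hedged illustration of the kind of check such a script might perform, a tile can be rejected when its cloud-mask band flags too many pixels; the 50% cutoff below is our assumption, not the script's actual rule.

```python
import numpy as np

def too_cloudy(tile: np.ndarray, max_cloud_fraction: float = 0.5) -> bool:
    """Return True if the tile's cloud-mask band flags more than
    max_cloud_fraction of its pixels. Band 5 is the binary cloud mask
    (0 = clear, 1 = cloud) in the ordering described above."""
    cloud_fraction = tile[:, :, 5].mean()
    return cloud_fraction > max_cloud_fraction

clear_tile = np.zeros((350, 350, 7), dtype=np.float32)
print(too_cloudy(clear_tile))  # False: no cloudy pixels
```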
3. **Run Inference:**
   - **Download the Pre-trained Model:**
     - Download the `34_clean_aug` model folder from: Google Drive Link
   - **Place Model Files:**
     - Store the downloaded `34_clean_aug` folder (containing `best_weights.pth` and `optimal_threshold.txt`) in the `runs/` directory at the root of this project. Your structure should look like `runs/34_clean_aug/best_weights.pth`.
   - **Generate Masks:**
     - Navigate to the models directory and run the `generate_masks.py` script. It will load the model weights and process the images in `data/cleaned/train_satellite/`.
     - Adjust the `SATELLITE_INPUT_DIR_STR` and `ORIGINAL_RUN_DIR_FOR_WEIGHTS` constants at the top of `generate_masks.py` if your paths differ.

       ```shell
       cd models/  # or cd ../models if you were in data_cleaning
       python generate_masks.py
       ```

     - Predicted masks will be saved to the `output/generated_masks/INFERENCE_RUN_NAME/` directory (where `INFERENCE_RUN_NAME` is set in `generate_masks.py`).
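`generate_masks.py` handles all of this end to end. Conceptually, the final step of mask generation is binarizing the model's per-pixel probabilities with the value stored in `optimal_threshold.txt`; a minimal sketch of that step, with a made-up probability map and placeholder threshold:

```python
import numpy as np

# Hedged sketch: binarize per-pixel kelp probabilities with the tuned
# threshold. In the real pipeline the probabilities come from the UNet's
# output and the threshold from optimal_threshold.txt.
threshold = 0.45  # placeholder value, not the shipped threshold
probs = np.random.rand(350, 350).astype(np.float32)
mask = (probs >= threshold).astype(np.uint8)  # 1 = kelp, 0 = background
assert set(np.unique(mask).tolist()) <= {0, 1}
```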
4. **View Results:**
   - To visualize the generated masks alongside their corresponding RGB satellite images, navigate to the data visualization directory and run the `output_view.py` script.
   - Ensure the `INFERENCE_RUN_NAME` constant at the top of `output_view.py` matches the one used in `generate_masks.py`.

     ```shell
     cd ../data_visualization/  # or the appropriate path from models/
     python output_view.py
     ```
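`output_view.py` does the plotting; for reference, an RGB composite can be pulled out of a 7-band tile by selecting the Red, Green, and Blue bands (indices 2, 3, 4 in the ordering above) and stretching them for display. A sketch with a synthetic tile (the percentile stretch is our choice, not necessarily the script's):

```python
import numpy as np

def to_rgb(tile: np.ndarray) -> np.ndarray:
    """Extract bands 2 (Red), 3 (Green), 4 (Blue) from a (H, W, 7) tile
    and percentile-stretch the result into [0, 1] for display."""
    rgb = tile[:, :, [2, 3, 4]].astype(np.float32)
    lo, hi = np.percentile(rgb, (2, 98))
    return np.clip((rgb - lo) / max(hi - lo, 1e-6), 0.0, 1.0)

tile = np.random.rand(350, 350, 7).astype(np.float32)
rgb = to_rgb(tile)
assert rgb.shape == (350, 350, 3)
```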
## How to Reproduce Our Results (Training & Evaluation)

Follow these steps to replicate the training process and evaluation results presented in our study:
1. **Download Data:**
   - Download the full dataset (including training images and ground truth labels) from: Google Drive Link
   - This dataset contains satellite images and their corresponding kelp masks.
2. **Organize Data:**
   - Create a `cleaned` directory inside your `data` folder if it doesn't exist.
   - Save the downloaded satellite images into `data/cleaned/train_satellite1/`.
   - Save the downloaded kelp masks into `data/cleaned/train_kelp1/`.
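After this step, the data directory should look roughly like the following (directory names as above; the comments are ours):

```
data/
└── cleaned/
    ├── train_satellite1/   # downloaded satellite .tif tiles
    └── train_kelp1/        # downloaded kelp mask .tif tiles
```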
3. **Clean Data:**
   - The satellite images in `data/cleaned/train_satellite1/` need to be processed by the cleaning script.
   - Navigate to the data cleaning directory and run it:

     ```shell
     cd data_cleaning/  # adjust path as needed
     python data_clean.py
     ```

   - (Important: you may need to adjust the input/output directory paths within `data_clean.py` so it reads from `train_satellite1` and saves its cleaned output to, e.g., `data/cleaned/train_satellite/`, which the split script might expect.)
4. **Split Data:**
   - The cleaned data needs to be split into training, validation, and testing sets.
   - Navigate to the utils directory and run the split script:

     ```shell
     cd ../utils/  # adjust path as needed
     python split_data.py
     ```

   - (Ensure `split_data.py` is configured to read from the output directory of your cleaning step and write to the standard `train_satellite`, `train_kelp`, `val_satellite`, `val_kelp`, `test_satellite`, and `test_kelp` subdirectories within `data/cleaned/`.)
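`split_data.py` implements the actual split with its own configured ratios. As a hedged sketch of the idea only (the 70/15/15 ratio, the seed, and the filenames are our assumptions), tiles can be shuffled once and partitioned so that each lands in exactly one set:

```python
import random

# Hypothetical train/val/test split by filename. The real script
# (split_data.py) defines the actual ratios and directories.
filenames = [f"tile_{i:04d}.tif" for i in range(100)]
random.seed(0)          # fixed seed so the split is reproducible
random.shuffle(filenames)

n = len(filenames)
n_train, n_val = int(0.70 * n), int(0.15 * n)
train = filenames[:n_train]
val = filenames[n_train:n_train + n_val]
test = filenames[n_train + n_val:]
assert len(train) + len(val) + len(test) == n  # every tile lands in one split
```

The same split must then be applied to the corresponding kelp masks so that image/mask pairs stay together.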
5. **Train Model:**
   - Navigate to the models directory (e.g., `cd ../models/`).
   - Open `350resnet.py` (or the relevant training script, e.g., `train.py`).
   - Adjust the training parameters (e.g., `BACKBONE`, `MAX_EPOCHS`, `RUN_NAME`, `DATA_DIR`) at the top of the script as needed. To reproduce the ResNet34 result, ensure `BACKBONE = "resnet34"`.
   - Run the training script:

     ```shell
     python 350resnet.py
     ```

   - Once training is complete, all relevant information, including model weights and logs, will be stored in a subdirectory within the `runs/` directory, named according to `RUN_NAME`.
6. **Testing & Evaluation:**
   - **Find Optimal Threshold:**
     - Navigate to the models directory (if not already there).
     - Open `find_threshold.py` and adjust the parameters at the top (e.g., `RUN_NAME`, `DATA_DIR_STR`, `BACKBONE_NAME`) to match the details of your completed training run stored in the `runs/` directory.
     - Run the script:

       ```shell
       python find_threshold.py
       ```

     - This will save an `optimal_threshold.txt` file in your specific run directory within `runs/`.
   - **Run Test Set Evaluation:**
     - Open `test.py` and adjust the parameters at the top (e.g., `RUN_NAME`, `BACKBONE_NAME`, `DATA_DIR_STR`) to point to your completed training run and its weights. The script will automatically try to load `optimal_threshold.txt`.
     - Run the script:

       ```shell
       python test.py
       ```

     - Predicted masks for the test set will be saved to the `output/RUN_NAME/` directory, and evaluation metrics will be printed and saved to `runs/RUN_NAME/results.txt`.
   - **View Outputted Masks (Comparison):**
     - Navigate to the data visualization directory (e.g., `cd ../data_visualization/`).
     - Open `data_compare.py` and adjust `OUTPUT_RUN` at the top to match the `RUN_NAME` of your test evaluation.
     - Run the script:

       ```shell
       python data_compare.py
       ```

     - This will display a random sample comparing the original satellite image, the model's prediction, and the ground truth mask.
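The headline metric, Intersection over Union, can be sanity-checked independently of the evaluation scripts. A minimal NumPy version for binary masks (the project's scripts compute their own metrics, so treat this as a reference sketch):

```python
import numpy as np

def iou(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection over Union for two binary masks (1 = kelp)."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float(intersection / union) if union > 0 else 1.0

a = np.array([[1, 1], [0, 0]])
b = np.array([[1, 0], [1, 0]])
print(iou(a, b))  # 1 overlapping pixel / 3 pixels in the union
```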
## Requirements

- Python 3.8+
- PyTorch
- PyTorch Lightning
- Torchvision
- TorchMetrics
- NumPy
- Pandas
- Matplotlib
- Seaborn
- Tifffile
- TQDM
- Albumentations (if used for augmentation in training)
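Assuming the list above is complete, a `requirements.txt` along these lines should cover the environment (pin versions to taste; exact versions were not specified in this project):

```
torch
pytorch-lightning
torchvision
torchmetrics
numpy
pandas
matplotlib
seaborn
tifffile
tqdm
albumentations
```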
