HOTOSM-EO-TT: Building Detection & Material Classification from Satellite Imagery

📋 Project Overview

The HOTOSM-EO-TT (Humanitarian OpenStreetMap Team - Earth Observation - Top Talented) project is a comprehensive deep learning solution for automated building detection, segmentation, and material classification using satellite and aerial imagery. This project is designed to support humanitarian mapping, disaster response, and infrastructure monitoring by providing robust tools for extracting building footprints and classifying roof materials.

🎯 Project Objectives

  • Building Detection: Accurately identify and localize buildings in satellite imagery
  • Building Segmentation: Generate precise building footprints and boundaries
  • Material Classification: Classify roof materials for infrastructure assessment
  • Scalable Pipeline: Create reusable pipelines for different geographic regions

๐Ÿ—๏ธ Project Architecture

The project is structured into two distinct phases, each addressing specific challenges in geospatial AI:

📊 Phase 1: Building Detection & Segmentation

Overview

Phase 1 focuses on developing robust building detection and segmentation capabilities using state-of-the-art computer vision models. The approach combines object detection (YOLO) with segmentation (SAM) to achieve high-precision building extraction.

🔬 Research & Development Process

1. YOLO Model Selection & Training

  • Initial Testing: Evaluated multiple YOLO versions (v8, v11, v12)
  • Dataset: Trained on RAMP dataset for building detection
  • Innovation Team Insights: Leveraged previous team's research and findings
  • Final Selection: YOLO v11n and YOLO v11s based on performance metrics
  • Results: Achieved 78% mAP on RAMP dataset

2. SAM Model Integration

  • Fine-tuning: Customized SAM model on RAMP dataset
  • Integration Strategy: Used YOLO detection results as input to SAM
  • Input Methods Tested:
    • Points: Single point prompts for each building
    • Boxes: Bounding box prompts for each building
    • Point-by-Point: Individual point processing
    • Box-by-Box: Individual bounding box processing
  • Evaluation: Calculated IoU (Intersection over Union) for each method
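Scoring each prompting strategy against the ground-truth mask reduces to a per-building IoU computation; a minimal NumPy sketch (the masks and method scores below are illustrative, not project outputs):

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU between two binary masks of shape (H, W)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / float(union) if union else 1.0

# Hypothetical predictions from two prompting strategies for one building,
# scored against the same ground-truth mask.
gt = np.zeros((8, 8), dtype=np.uint8)
gt[2:6, 2:6] = 1

predictions = {
    "points": np.pad(np.ones((4, 4), np.uint8), ((2, 2), (2, 2))),  # exact match
    "boxes":  np.pad(np.ones((4, 4), np.uint8), ((1, 3), (2, 2))),  # shifted up
}

scores = {name: mask_iou(pred, gt) for name, pred in predictions.items()}
best = max(scores, key=scores.get)  # the strategy with the highest IoU
```

Running the same comparison per building and averaging gives the per-method IoU used to pick a prompting strategy.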

3. Multi-Area Validation

  • HOTOSM Areas: Tested on multiple geographic regions provided by HOTOSM team
  • Transfer Learning: Applied RAMP-trained models to new areas
  • Performance: Achieved improved mAP and SAM IoU scores
  • Conclusion: YOLO + SAM pipeline successfully generalized across different regions

📈 Phase 1 Results Summary

| Model | Training Time | Epochs | mAP | SAM IoU | Status |
|---|---|---|---|---|---|
| YOLO v11n | 72 hrs | 100 | 78% | Improved | ✅ Selected |
| YOLO v11s | 72 hrs | 100 | 78% | Improved | ✅ Selected |
| SAM (fine-tuned) | - | - | - | High | ✅ Integrated |

🎨 Phase 2: Material Detection & Classification

Overview

Phase 2 extends the building detection capabilities to include material classification, enabling detailed infrastructure assessment through roof material identification.

🔬 Three-Approach Investigation

Approach 1: Two-Stage Classification Pipeline

Methodology:

  • Stage 1: YOLO object detection to identify buildings
  • Stage 2: Crop detected buildings and classify materials using:
    • EfficientNet-B5
    • VGG16

Dataset: RoofNet (15 material classes)

Results:

  • Overall accuracy: <65%
  • Multiple classes showed low accuracy
  • Status: โŒ Insufficient performance

Approach 2: Direct Object Detection & Segmentation

Methodology:

  • Model: YOLO for multi-class object detection and segmentation
  • Dataset: Nacala dataset (5 material classes)

Results:

  • mAP: 65%
  • Segmentation quality: Much better than Approach 1
  • Status: ✅ Best performing approach

Approach 3: Dataset Conversion & Training

Methodology:

  • Dataset Conversion: Transformed RoofNet from classification to object detection
  • Process:
    1. Used YOLO v11n (trained on RAMP) to detect buildings in RoofNet images
    2. Created annotations based on folder structure and detected buildings
  • Training: Fine-tuned YOLO on converted RoofNet dataset
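The conversion step amounts to pairing each detected box with the class implied by the image's folder and writing YOLO-format labels; a sketch with an illustrative class mapping and image size:

```python
def to_yolo_label(class_id, box, img_w, img_h):
    """Convert a pixel-space (x1, y1, x2, y2) box into a YOLO-format label
    line: 'class_id cx cy w h', all coordinates normalized to [0, 1]."""
    x1, y1, x2, y2 = box
    cx = (x1 + x2) / 2 / img_w
    cy = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

# RoofNet stores one material class per folder, so every building detected in
# an image inherits that folder's class. The mapping and box are illustrative.
CLASS_IDS = {"metal": 0, "tile": 1, "thatch": 2}
detected_boxes = [(100, 50, 300, 250)]  # e.g. from the RAMP-trained YOLO v11n
label_lines = [to_yolo_label(CLASS_IDS["metal"], box, 640, 640)
               for box in detected_boxes]
```

One label file per image, one line per detected building, is then enough to fine-tune a YOLO detector on the converted dataset.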

Results:

  • Performance: Not promising
  • Status: โŒ Insufficient results

🚀 Phase 2 Enhancement: Grounding DINO Integration

Grounding DINO for Building Detection

Innovation: Introduced Grounding DINO for building detection

Advantages:

  • Zero-shot capability: No additional training required
  • Out-of-the-box performance: Excellent results without fine-tuning
  • Comparison: Superior to YOLO, which required fine-tuning on HOTOSM areas

DINOv3 + ViTPerHead Segmentation

Final Approach:

  • Backbone: DINOv3 for feature extraction
  • Segmentation Head: ViTPerHead for precise segmentation
  • Training Data: 5% of RAMP dataset
  • Results:
    • IoU: 70%
    • Dice coefficient: 82% (Dice loss: 18%)
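The frozen-backbone idea can be sketched with NumPy: patch features from the backbone are mapped to per-pixel class logits by a lightweight head (the shapes, random features, and the linear head itself are illustrative simplifications, not the actual ViTPerHead architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# DINOv3 would emit a grid of patch features; random features stand in here.
H, W, C, NUM_CLASSES = 16, 16, 32, 2     # 2 classes: background / building
features = rng.normal(size=(H * W, C))   # frozen backbone output, (H*W, C)

# A linear per-pixel head: one weight matrix mapping features to class logits.
W_head = rng.normal(size=(C, NUM_CLASSES))
b_head = np.zeros(NUM_CLASSES)

logits = features @ W_head + b_head          # (H*W, NUM_CLASSES)
mask = logits.argmax(axis=1).reshape(H, W)   # per-pixel class prediction
```

Because only the head is trained, a small slice of RAMP (5% here) can suffice, which is the appeal of this setup.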

📈 Phase 2 Results Summary

| Approach | Model | Dataset | Classes | mAP/IoU | Status |
|---|---|---|---|---|---|
| 1 | YOLO + EfficientNet/VGG16 | RoofNet | 15 | <65% | ❌ |
| 2 | YOLO multi-class | Nacala | 5 | 65% mAP | ✅ Best |
| 3 | YOLO (converted) | RoofNet | 15 | Low | ❌ |
| Enhancement | Grounding DINO | - | - | High | ✅ Zero-shot |
| Final | DINOv3 + ViTPerHead | RAMP (5%) | - | 70% IoU | ✅ |

๐Ÿ—๏ธ Technical Architecture

Model Architectures Used

1. YOLO (You Only Look Once)

  • Versions: v8, v11, v12
  • Selected: v11n, v11s
  • Architecture: Single-stage object detector
  • Advantages: Fast inference, good accuracy
  • Use Case: Building detection and material classification

2. SAM (Segment Anything Model)

  • Type: Foundation model for segmentation
  • Fine-tuning: Customized on RAMP dataset
  • Input Types: Points, bounding boxes
  • Integration: Receives YOLO detection results
  • Output: Precise building segmentation masks

3. DINOv3

  • Type: Self-supervised vision transformer
  • Use Case: Feature extraction backbone
  • Advantages: Strong feature representation
  • Integration: Combined with ViTPerHead for segmentation

4. Grounding DINO

  • Type: Open-set, text-prompted object detector usable zero-shot
  • Advantages: No task-specific training required, excellent generalization
  • Use Case: Building detection without fine-tuning
  • Performance: Superior to fine-tuned YOLO on the HOTOSM areas

5. EfficientNet-B5 & VGG16

  • Type: Classification models
  • Use Case: Material classification from cropped building images
  • Performance: Limited effectiveness for this task

📊 Key Metrics Explained

mAP (mean Average Precision)

  • Definition: Mean of the per-class average precision (area under each class's precision-recall curve)
  • Range: 0-100%
  • Interpretation: Higher values indicate better detection accuracy
  • Our Results: 65% mAP on Nacala dataset

IoU (Intersection over Union)

  • Definition: Ratio of intersection to union of predicted and ground truth masks
  • Range: 0-1 (0-100%)
  • Interpretation: Higher values indicate better segmentation quality
  • Our Results: 70% IoU with DINOv3 + ViTPerHead

Dice Loss

  • Definition: 1 - Dice coefficient, measures overlap between predicted and ground truth
  • Range: 0-1
  • Interpretation: Lower values indicate better segmentation
  • Our Results: 82% Dice coefficient (18% Dice loss)

๐Ÿ“ Project Structure & Notebook Links

Data Acquisition Notebooks

Original Paper Experiments

Building Detection Pipelines

YOLO Training & ETL

SAM Training & Fine-tuning

DINOv3 Training

Inference Pipelines

Evaluation

Material Classification & Detection

Material Classification

Material Detection


📊 Performance Comparison

Building Detection Models

| Model | Training Required | mAP | Inference Speed | Generalization |
|---|---|---|---|---|
| YOLO v11n | Yes (RAMP + HOTOSM) | High | Fast | Good |
| YOLO v11s | Yes (RAMP + HOTOSM) | High | Fast | Good |
| Grounding DINO | No (zero-shot) | High | Medium | Excellent |

Segmentation Models

| Model | IoU | Dice Coefficient | Training Data | Specialization |
|---|---|---|---|---|
| SAM (fine-tuned) | High | High | RAMP | Building-specific |
| DINOv3 + ViTPerHead | 70% | 82% | RAMP (5%) | General |

Material Classification

| Approach | Dataset | Classes | Accuracy | Complexity |
|---|---|---|---|---|
| YOLO + EfficientNet/VGG16 | RoofNet | 15 | <65% | High |
| YOLO multi-class | Nacala | 5 | 65% mAP | Medium |

🔮 Recommendations

Technical Recommendations

  1. Use Grounding DINO: For new regions, start with Grounding DINO for zero-shot building detection
  2. Fine-tune SAM: For specific regions, fine-tune SAM on local data
  3. Material Classification: Focus on Nacala dataset approach (5 classes) for better results
  4. DINOv3 Integration: Consider DINOv3 + ViTPerHead for high-precision segmentation

📚 References & Resources

Models & Frameworks
