The HOTOSM-EO-TT (Humanitarian OpenStreetMap Team - Earth Observation - Top Talented) project is a comprehensive deep learning solution for automated building detection, segmentation, and material classification using satellite and aerial imagery. This project is designed to support humanitarian mapping, disaster response, and infrastructure monitoring by providing robust tools for extracting building footprints and classifying roof materials.
- Building Detection: Accurately identify and localize buildings in satellite imagery
- Building Segmentation: Generate precise building footprints and boundaries
- Material Classification: Classify roof materials for infrastructure assessment
- Scalable Pipeline: Create reusable pipelines for different geographic regions
The project is structured into two distinct phases, each addressing specific challenges in geospatial AI:
Phase 1 focuses on developing robust building detection and segmentation capabilities using state-of-the-art computer vision models. The approach combines object detection (YOLO) with segmentation (SAM) to achieve high-precision building extraction.
- Initial Testing: Evaluated multiple YOLO versions (v8, v11, v12)
- Dataset: Trained on RAMP dataset for building detection
- Innovation Team Insights: Leveraged previous team's research and findings
- Final Selection: YOLO v11n and YOLO v11s based on performance metrics
- Results: Achieved 78% mAP on RAMP dataset
- Fine-tuning: Customized SAM model on RAMP dataset
- Integration Strategy: Used YOLO detection results as input to SAM
- Input Methods Tested:
  - Points: single-point prompts for each building
  - Boxes: bounding-box prompts for each building
  - Point-by-Point: individual point processing
  - Box-by-Box: individual bounding-box processing
- Evaluation: Calculated IoU (Intersection over Union) for each method
- HOTOSM Areas: Tested on multiple geographic regions provided by HOTOSM team
- Transfer Learning: Applied RAMP-trained models to new areas
- Performance: Achieved improved mAP and SAM IoU scores
- Conclusion: YOLO + SAM pipeline successfully generalized across different regions
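The detection-to-segmentation hand-off can be sketched as a small driver function. This is a minimal, framework-agnostic sketch: `detect` and `segment` are hypothetical stand-ins for a trained YOLO model and a SAM predictor (e.g. a box-prompted `SamPredictor.predict` call); the real pipeline lives in the YOLO-SAM inference scripts listed below.

```python
import numpy as np

def yolo_sam_pipeline(image, detect, segment, conf_thresh=0.5):
    """Box-by-box hand-off from a detector to a promptable segmenter.

    detect(image)        -> (N, 5) array of [x1, y1, x2, y2, confidence]
    segment(image, box)  -> boolean mask of shape (H, W)
    """
    detections = np.asarray(detect(image), dtype=float).reshape(-1, 5)
    # Keep only confident detections before prompting the segmenter.
    kept = detections[detections[:, 4] >= conf_thresh]
    # Each surviving box becomes one segmentation prompt ("box-by-box").
    return [segment(image, box[:4]) for box in kept]
```

Swapping the `segment` callable is what makes it easy to compare the point, box, point-by-point, and box-by-box prompting strategies under the same IoU evaluation.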
| Model | Training Time | Epochs | mAP | SAM IoU | Status |
|---|---|---|---|---|---|
| YOLO v11n | 72 hrs | 100 | 78% | Improved | ✅ Selected |
| YOLO v11s | 72 hrs | 100 | 78% | Improved | ✅ Selected |
| SAM (Fine-tuned) | - | - | - | High | ✅ Integrated |
Phase 2 extends the building detection capabilities to include material classification, enabling detailed infrastructure assessment through roof material identification.
**Approach 1: Two-Stage Pipeline (YOLO + CNN Classifiers)**
Methodology:
- Stage 1: YOLO object detection to identify buildings
- Stage 2: Crop detected buildings and classify materials using:
  - EfficientNet-B5
  - VGG16
- Dataset: RoofNet (15 material classes)
Results:
- Overall accuracy: <65%
- Multiple classes showed low accuracy
- Status: ❌ Insufficient performance
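Stage 2 of this two-stage approach is a crop-then-classify loop. A minimal sketch assuming numpy image arrays; `classify` is a hypothetical stand-in for an EfficientNet-B5 or VGG16 forward pass:

```python
import numpy as np

def crop_and_classify(image, boxes, classify):
    """Crop each detected building and classify its roof material.

    boxes:    iterable of [x1, y1, x2, y2] pixel coordinates
    classify: crop -> class id (stand-in for a CNN classifier)
    """
    h, w = image.shape[:2]
    labels = []
    for x1, y1, x2, y2 in np.asarray(boxes, dtype=int):
        # Clamp boxes to the image so slicing never goes out of bounds.
        x1, y1 = max(x1, 0), max(y1, 0)
        x2, y2 = min(x2, w), min(y2, h)
        labels.append(classify(image[y1:y2, x1:x2]))
    return labels
```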
**Approach 2: Multi-Class YOLO Detection and Segmentation**
Methodology:
- Model: YOLO for multi-class object detection and segmentation
- Dataset: Nacala dataset (5 material classes)
Results:
- mAP: 65%
- Segmentation quality: Much better than Approach 1
- Status: ✅ Best performing approach
**Approach 3: RoofNet Converted to Object Detection**
Methodology:
- Dataset Conversion: Transformed RoofNet from a classification dataset into an object detection dataset
- Process:
  - Used YOLO v11n (trained on RAMP) to detect buildings in RoofNet images
  - Created bounding-box annotations by combining the detected buildings with the class implied by each image's folder
- Training: Fine-tuned YOLO on the converted RoofNet dataset
Results:
- Performance: Not promising
- Status: ❌ Insufficient results
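The conversion step amounts to: each RoofNet image yields one YOLO label file whose class comes from the image's folder and whose boxes come from the building detector. A sketch of that logic; the folder-to-class mapping and helper names here are illustrative, not the project's actual code:

```python
def to_yolo_label(class_id, box_xyxy, img_w, img_h):
    """Format one detection as a YOLO label line:
    'class cx cy w h' with all coordinates normalised to [0, 1]."""
    x1, y1, x2, y2 = box_xyxy
    cx = (x1 + x2) / 2 / img_w
    cy = (y1 + y2) / 2 / img_h
    bw = (x2 - x1) / img_w
    bh = (y2 - y1) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f}"

def convert_image(folder_name, detected_boxes, img_w, img_h, class_map):
    """Every building detected in an image inherits the material class
    implied by the image's folder (RoofNet groups images by material)."""
    class_id = class_map[folder_name]
    return [to_yolo_label(class_id, b, img_w, img_h) for b in detected_boxes]
```

Note the weakness this encodes: every detected building in an image gets the folder's single class, which is one plausible reason the converted dataset underperformed.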
**Enhancement: Ground DINO for Zero-Shot Detection**
Innovation: Introduced Ground DINO for building detection.
Advantages:
- Zero-shot capability: No additional training required
- Out-of-the-box performance: Excellent results without fine-tuning
- Comparison: Outperformed YOLO, which required fine-tuning on the HOTOSM areas
Final Approach:
- Backbone: DINOv3 for feature extraction
- Segmentation Head: ViTPerHead for precise segmentation
- Training Data: 5% of RAMP dataset
- Results:
  - IoU: 70%
  - Dice coefficient: 82% (Dice loss: 18%)
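The backbone + head split can be illustrated with a deliberately simplified stand-in: a per-patch linear classifier over the backbone's patch-feature grid, followed by nearest-neighbour upsampling. The project's actual head is ViTPerHead; this numpy sketch only shows the data flow, and all names in it are illustrative.

```python
import numpy as np

def linear_seg_head(patch_features, weight, bias, out_size):
    """Simplified segmentation head over ViT patch features.

    patch_features: (h, w, C) grid of backbone embeddings
    weight, bias:   (C, n_classes) and (n_classes,) linear classifier
    out_size:       (H, W) target pixel resolution
    """
    logits = patch_features @ weight + bias           # (h, w, n_classes)
    h, w, _ = patch_features.shape
    H, W = out_size
    # Nearest-neighbour upsample from the patch grid to pixel resolution.
    rows = np.arange(H) * h // H
    cols = np.arange(W) * w // W
    return logits[rows][:, cols]                      # (H, W, n_classes)
```

A frozen self-supervised backbone plus a small trainable head is what makes training on only 5% of RAMP plausible.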
| Approach | Model | Dataset | Classes | mAP/IoU | Status |
|---|---|---|---|---|---|
| 1 | YOLO + EfficientNet/VGG16 | RoofNet | 15 | <65% | ❌ |
| 2 | YOLO Multi-class | Nacala | 5 | 65% mAP | ✅ Best |
| 3 | YOLO (Converted) | RoofNet | 15 | Low | ❌ |
| Enhancement | Ground DINO | - | - | High | ✅ Zero-shot |
| Final | DINOv3 + ViTPerHead | RAMP (5%) | - | 70% IoU | ✅ |
**YOLO**
- Versions: v8, v11, v12
- Selected: v11n, v11s
- Architecture: Single-stage object detector
- Advantages: Fast inference, good accuracy
- Use Case: Building detection and material classification
**SAM (Segment Anything Model)**
- Type: Foundation model for segmentation
- Fine-tuning: Customized on RAMP dataset
- Input Types: Points, bounding boxes
- Integration: Receives YOLO detection results
- Output: Precise building segmentation masks
**DINOv3**
- Type: Self-supervised vision transformer
- Use Case: Feature extraction backbone
- Advantages: Strong feature representation
- Integration: Combined with ViTPerHead for segmentation
**Ground DINO**
- Type: Zero-shot object detection model
- Advantages: No training required, excellent generalization
- Use Case: Building detection without fine-tuning
- Performance: Superior to fine-tuned YOLO
**EfficientNet-B5 / VGG16**
- Type: Classification models
- Use Case: Material classification from cropped building images
- Performance: Limited effectiveness for this task
**mAP (Mean Average Precision)**
- Definition: Mean of the per-class average precision (AP)
- Range: 0-100%
- Interpretation: Higher values indicate better detection accuracy
- Our Results: 65% mAP on Nacala dataset
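For reference, per-class average precision can be computed from ranked detections as the area under the precision-recall curve; mAP is then the mean over classes. A minimal all-point-interpolation sketch, not the exact protocol used by the YOLO tooling:

```python
import numpy as np

def average_precision(confidences, is_tp, n_gt):
    """AP as area under the precision-recall curve.

    confidences: score of each detection
    is_tp:       1 if the detection matched a ground-truth box, else 0
    n_gt:        number of ground-truth boxes for this class
    """
    order = np.argsort(-np.asarray(confidences, dtype=float))
    tp = np.asarray(is_tp, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    precision = cum_tp / (np.arange(len(tp)) + 1)
    recall = cum_tp / n_gt
    # Sentinels plus the monotone precision envelope (all-point interpolation).
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))
```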
**IoU (Intersection over Union)**
- Definition: Ratio of the intersection to the union of the predicted and ground-truth masks
- Range: 0-1 (0-100%)
- Interpretation: Higher values indicate better segmentation quality
- Our Results: 70% IoU with DINOv3 + ViTPerHead
**Dice Loss**
- Definition: 1 - Dice coefficient, where the Dice coefficient measures overlap between the predicted and ground-truth masks
- Range: 0-1
- Interpretation: Lower values indicate better segmentation
- Our Results: 82% Dice coefficient (18% Dice loss)
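Both overlap metrics above are straightforward to compute on binary masks; a minimal numpy sketch:

```python
import numpy as np

def iou(pred, gt):
    """Intersection over Union of two binary masks."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:                 # both masks empty: treat as perfect match
        return 1.0
    return np.logical_and(pred, gt).sum() / union

def dice(pred, gt):
    """Dice coefficient; Dice loss is 1 - dice(pred, gt)."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    total = pred.sum() + gt.sum()
    if total == 0:
        return 1.0
    return 2 * np.logical_and(pred, gt).sum() / total
```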
- Download RAMP Dataset: Download and setup RAMP dataset for building detection
- Download Nacala Dataset: Download Nacala dataset for material classification
- Download RoofNet Dataset: Download RoofNet dataset for roof material classification
- Original Paper Nacala Roof Inference: Reproduce original Nacala paper results
- Original Paper RoofNet Inference: Reproduce original RoofNet paper results
- YOLO Training Pipeline: Complete YOLO training pipeline for building detection
- ETL Pipeline for YOLO: Data preprocessing and annotation conversion for YOLO format
- YOLO Requirements: Dependencies for YOLO training
- SAM Fine-tuning Pipeline: Fine-tune SAM model on RAMP dataset
- ETL Pipeline for SAM: Data preprocessing for SAM training
- SAM Requirements: Dependencies for SAM training
- DINOv3 Training Pipeline: Train DINOv3 model for segmentation
- ETL Pipeline for DINOv3: Data preprocessing for DINOv3 training
- YOLO-SAM Inference Pipeline: Combined YOLO detection + SAM segmentation pipeline
- Ground DINO Inference Pipeline: Zero-shot building detection using Ground DINO
- YOLO-SAM Requirements: Dependencies for inference pipeline
- YOLO-SAM Evaluation: Comprehensive evaluation of YOLO-SAM pipeline results
- Evaluation Requirements: Dependencies for evaluation
- Material Classification Pipeline: Complete pipeline for roof material classification
- Material Classification Config: Configuration file for material classification
- Nacala Object Detection Pipeline: YOLO-based material detection on Nacala dataset
- RoofNet Object Detection Pipeline: YOLO-based material detection on RoofNet dataset
- RoofNet to Object Detection Mapping: Convert RoofNet classification dataset to object detection format
**Building Detection Models**
| Model | Training Required | mAP | Inference Speed | Generalization |
|---|---|---|---|---|
| YOLO v11n | Yes (RAMP + HOTOSM) | High | Fast | Good |
| YOLO v11s | Yes (RAMP + HOTOSM) | High | Fast | Good |
| Ground DINO | No (Zero-shot) | High | Medium | Excellent |
**Segmentation Models**
| Model | IoU | Dice Coefficient | Training Data | Specialization |
|---|---|---|---|---|
| SAM (Fine-tuned) | High | High | RAMP | Building-specific |
| DINOv3 + ViTPerHead | 70% | 82% | RAMP (5%) | General |
**Material Classification Approaches**
| Approach | Dataset | Classes | Accuracy | Complexity |
|---|---|---|---|---|
| YOLO + EfficientNet/VGG16 | RoofNet | 15 | <65% | High |
| YOLO Multi-class | Nacala | 5 | 65% mAP | Medium |
- Use Ground DINO: For new regions, start with Ground DINO for zero-shot building detection
- Fine-tune SAM: For specific regions, fine-tune SAM on local data
- Material Classification: Focus on Nacala dataset approach (5 classes) for better results
- DINOv3 Integration: Consider DINOv3 + ViTPerHead for high-precision segmentation
- YOLO - Object detection framework
- Segment Anything Model (SAM) - Foundation model for segmentation
- DINOv3 - Self-supervised vision transformer
- Ground DINO - Zero-shot object detection