
Segment Anything Model for Remote Sensing Applications

A comprehensive computer vision project implementing Meta AI's Segment Anything Model (SAM) for satellite imagery segmentation, with a focus on sidewalk and infrastructure detection from remote sensing data.

🎯 Project Overview

This project explores the application of foundational models in computer vision, specifically leveraging the Segment Anything Model (SAM) to segment sidewalks and other urban features from satellite imagery. The work demonstrates the adaptation of large-scale foundation models for specialized remote sensing applications.

Key Objectives

  • Foundation Model Application: Implement SAM for remote sensing use cases
  • Sidewalk Segmentation: Specialized detection of sidewalks from satellite imagery
  • Model Fine-tuning: Adapt pre-trained SAM for specific geospatial features
  • Production Deployment: Deploy model via Hugging Face Spaces and web applications

πŸ—οΈ System Architecture

The project follows a four-milestone development approach:

  1. Environment Setup - Docker containerization with PyTorch
  2. SAM Implementation - Base model integration and testing
  3. Model Fine-tuning - Custom training on sidewalk datasets
  4. Production Deployment - Web application and demonstration


πŸ› οΈ Tech Stack

Core Frameworks

  • Deep Learning: PyTorch 2.2.2, TorchVision 0.17.2
  • Computer Vision: Segment Anything Model (SAM), OpenCV
  • Geospatial: GDAL, Rasterio, Geopandas, Leafmap
  • Interactive Computing: Jupyter Notebooks, IPyWidgets

Model & Data Processing

  • Foundation Model: segment-anything, segment-anything-hq
  • Geospatial Processing: segment-geospatial, pystac-client
  • Data Handling: NumPy, Pandas, Pillow
  • Visualization: Matplotlib, Folium, IPyLeaflet

Deployment & Infrastructure

  • Containerization: Docker, Docker Compose
  • Web Framework: Flask, Django (components)
  • Cloud Storage: Azure Blob Storage, Google Cloud Storage
  • Model Hosting: Hugging Face Spaces, Transformers

Development Environment

  • Notebooks: Jupyter with geospatial extensions
  • Data Visualization: Interactive maps with Folium and Leafmap
  • Model Management: Hugging Face Hub integration
  • Package Management: Comprehensive requirements with 240+ dependencies

🚀 Installation

Prerequisites

  • Docker and Docker Compose
  • Python 3.9+
  • CUDA-compatible GPU (recommended for training)
  • At least 16GB RAM (for large model inference)

Quick Start with Docker

# Clone the repository
git clone <repository-url>
cd MetaAI-segment-anything-model

# Build and run the container
cd meta-ai-segment-anything-model
docker build -t working-sam .
docker-compose up --build

Access Jupyter Environment

After building the container:

  1. Look for the URL printed by the Jupyter server, e.g. http://127.0.0.1:8888/tree/notebooks...
  2. Open that link in a browser to access your notebooks
  3. To reach the server from another machine on your network, replace 127.0.0.1 with your computer's IP address

Local Development Setup

# Create virtual environment
python -m venv sam-env
source sam-env/bin/activate  # On Windows: sam-env\Scripts\activate

# Install dependencies
pip install --upgrade pip
pip install -r meta-ai-segment-anything-model/requirements.txt

# Launch Jupyter
jupyter notebook

🔄 Project Milestones

Milestone 1: Docker Environment Setup

Objective: Establish reproducible PyTorch environment for SAM development

Components:

  • Dockerfile: Multi-stage build with Python 3.9 and PyTorch 2.2.2
  • Docker Compose: Orchestrated Jupyter notebook service
  • Requirements: Comprehensive dependency management with geospatial libraries

Features:

  • PyTorch 2.2.2 with TorchVision 0.17.2
  • GDAL 3.6.2 for geospatial data processing
  • Jupyter notebook server on port 1128
  • Volume mounting for persistent development
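A minimal `docker-compose.yml` consistent with the setup above might look as follows; the service name, volume path, and the 1128→8888 port mapping are illustrative assumptions, not the repository's exact file:

```yaml
services:
  jupyter:
    build: .
    image: working-sam
    ports:
      - "1128:8888"                  # host port 1128 -> Jupyter's default 8888 (assumed mapping)
    volumes:
      - ./notebooks:/app/notebooks   # persistent development (path assumed)
    command: jupyter notebook --ip=0.0.0.0 --no-browser --allow-root
```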

Usage:

cd meta-ai-segment-anything-model
docker build -t working-sam .
docker-compose up --build

Milestone 2: SAM Implementation & Replication

Objective: Implement and demonstrate core SAM functionality on satellite imagery

Key Notebooks:

  • sam-reimplementation.ipynb: Core SAM implementation and testing
  • Interactive Colab Demo: Comprehensive satellite imagery segmentation

Features:

  • Complete SAM model integration
  • Satellite imagery preprocessing pipeline
  • Interactive segmentation demonstrations
  • Geospatial visualization with interactive maps
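SAM's image encoder expects inputs whose longest side has been scaled to 1024 pixels; a preprocessing step in that spirit can be sketched with Pillow and NumPy (function name and rounding details are illustrative, not the notebook's exact code):

```python
import numpy as np
from PIL import Image

def resize_longest_side(image: np.ndarray, target: int = 1024) -> np.ndarray:
    """Scale an image so its longest side equals `target`,
    preserving aspect ratio, as SAM does before encoding."""
    h, w = image.shape[:2]
    scale = target / max(h, w)
    new_w, new_h = round(w * scale), round(h * scale)
    return np.asarray(Image.fromarray(image).resize((new_w, new_h)))

# A 100x200 satellite tile scales to 512x1024
tile = np.zeros((100, 200, 3), dtype=np.uint8)
print(resize_longest_side(tile).shape)  # (512, 1024, 3)
```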

Demo Access: Open In Colab

Capabilities:

  • Automatic mask generation for satellite images
  • Point and box prompt-based segmentation
  • Multi-scale object detection and segmentation
  • Geospatial coordinate integration
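For automatic mask generation, `SamAutomaticMaskGenerator` returns one dict per mask, with keys including `segmentation` (a boolean array) and `area` (pixel count). A small post-processing helper like the following (hypothetical, for illustration) can prune speckle masks before visualization:

```python
import numpy as np

def filter_masks(masks: list[dict], min_area: int = 100) -> list[dict]:
    """Drop masks below a minimum pixel area and sort the rest
    largest-first, as is common before overlaying results."""
    kept = [m for m in masks if m["area"] >= min_area]
    return sorted(kept, key=lambda m: m["area"], reverse=True)

# Synthetic stand-in for generator output
masks = [
    {"segmentation": np.zeros((4, 4), dtype=bool), "area": 9},
    {"segmentation": np.ones((4, 4), dtype=bool), "area": 16},
]
print([m["area"] for m in filter_masks(masks, min_area=10)])  # [16]
```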

Milestone 3: Model Fine-tuning for Sidewalk Detection

Objective: Specialize SAM for sidewalk and infrastructure segmentation

Key Components:

  • Training Pipeline: Custom fine-tuning on sidewalk datasets
  • Data Processing: Specialized preprocessing for urban imagery
  • Model Optimization: Adaptation for remote sensing characteristics

Notebooks:

  • main.ipynb: Complete training workflow and experimentation
  • predict.ipynb: Inference demonstration on Google Earth screenshots
  • retrieve-parquet.ipynb: Dataset processing and management

Training Features:

  • Custom dataset preparation for sidewalk segmentation
  • Transfer learning from pre-trained SAM weights
  • Performance monitoring and validation
  • Model export for production deployment

Evaluation Metrics:

  • Intersection over Union (IoU) for segmentation accuracy
  • Precision and recall for sidewalk detection
  • Visual quality assessment on diverse urban scenes
  • Computational efficiency analysis
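The IoU, precision, and recall figures above follow directly from pixel-wise true/false positive counts; a minimal NumPy implementation (illustrative, not the repository's evaluation code) is:

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, target: np.ndarray) -> dict:
    """Pixel-wise IoU, precision, and recall for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    tp = np.sum(pred & target)    # true positives
    fp = np.sum(pred & ~target)   # false positives
    fn = np.sum(~pred & target)   # false negatives
    return {
        "iou": tp / (tp + fp + fn) if (tp + fp + fn) else 1.0,
        "precision": tp / (tp + fp) if (tp + fp) else 1.0,
        "recall": tp / (tp + fn) if (tp + fn) else 1.0,
    }

pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [1, 0]])
m = segmentation_metrics(pred, target)
print(round(m["iou"], 3), m["precision"], m["recall"])  # 0.333 0.5 0.5
```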

Milestone 4: Production Deployment & Demonstration

Objective: Deploy trained model for public access and demonstration

Deployment Platforms:

  • Hugging Face Spaces: Interactive web application
  • Video Demonstration: Comprehensive project walkthrough

Production Features:

  • Real-time image upload and segmentation
  • Interactive result visualization
  • Model performance metrics display
  • User-friendly web interface


Web Application Features:

  • Drag-and-drop image upload
  • Real-time segmentation processing
  • Interactive mask visualization
  • Downloadable results
  • Model performance statistics

📊 Model Performance

SAM Base Model Capabilities

Architecture: Vision Transformer (ViT) based encoder-decoder

  • Model Size: ~636M parameters (ViT-H backbone; checkpoint ~2.4 GB)
  • Training Data: SA-1B dataset (11 million images, 1.1 billion masks)
  • Zero-shot Transfer: Strong performance on unseen domains

Fine-tuned Model Performance

Sidewalk Detection Metrics:

  • IoU Score: 0.8
  • Processing Speed: Real-time inference capability

Technical Specifications

Inference Performance:

  • GPU Memory: ~8GB VRAM for ViT-H model
  • Processing Time: ~2-5 seconds per image (depending on resolution)
  • Input Resolution: Flexible, optimized for satellite imagery scales
  • Output Quality: High-fidelity segmentation masks

🌐 Deployment

Hugging Face Spaces Application

The production model is deployed as an interactive web application on Hugging Face Spaces, providing:

  • Real-time Segmentation: Upload satellite images for instant processing
  • Interactive Interface: User-friendly web UI for non-technical users
  • Result Visualization: High-quality mask overlays and downloadable outputs
  • Model Information: Performance metrics and usage guidelines

Local Deployment Options

# Run Flask application locally
cd meta-ai-segment-anything-model
python app.py

# Or use Docker for consistent environment
docker-compose up --build
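The repository's `app.py` is not reproduced here, but a Flask endpoint of the kind described could be sketched as below; the route name and response shape are assumptions, and the actual SAM inference call is left as a comment:

```python
# Minimal hypothetical sketch of a segmentation upload endpoint.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/segment", methods=["POST"])
def segment():
    if "image" not in request.files:
        return jsonify(error="no image uploaded"), 400
    data = request.files["image"].read()
    # In the real application, these bytes would be decoded and passed
    # through the fine-tuned SAM model to produce a segmentation mask.
    return jsonify(received_bytes=len(data))

if __name__ == "__main__":
    app.run(port=5000)
```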

🎓 Academic Context

Course: Artificial Intelligence
Supervisor: Prof. Pantelis Monogioudis
Author: Uyen Nguyen
Institution: New Jersey Institute of Technology

Research Background

This project demonstrates how foundation models can be adapted for specialized applications: SAM plays a role in computer vision analogous to that of large language models like GPT-3 in natural language processing. The work explores:

  • Foundation Model Transfer: Adapting general-purpose models for domain-specific tasks
  • Geospatial AI: Application of deep learning to remote sensing data
  • Model Specialization: Fine-tuning strategies for infrastructure detection
  • Production AI: End-to-end deployment of research models

📚 References and Resources

Datasets and Benchmarks

  • SA-1B Dataset: Meta's 11M image segmentation dataset
  • Satellite Imagery: Various urban and suburban scenes
  • Custom Sidewalk Dataset: Curated for infrastructure segmentation


Winter Internship Project
Computer Vision & Remote Sensing Applications
NJIT Artificial Intelligence Course
