A comprehensive computer vision project implementing Meta AI's Segment Anything Model (SAM) for satellite imagery segmentation, with a focus on sidewalk and infrastructure detection from remote sensing data.
This project explores the application of foundation models in computer vision, specifically leveraging the Segment Anything Model (SAM) to segment sidewalks and other urban features from satellite imagery. The work demonstrates the adaptation of large-scale foundation models for specialized remote sensing applications.
- Foundation Model Application: Implement SAM for remote sensing use cases
- Sidewalk Segmentation: Specialized detection of sidewalks from satellite imagery
- Model Fine-tuning: Adapt pre-trained SAM for specific geospatial features
- Production Deployment: Deploy model via Hugging Face Spaces and web applications
The project follows a four-milestone development approach:
- Environment Setup - Docker containerization with PyTorch
- SAM Implementation - Base model integration and testing
- Model Fine-tuning - Custom training on sidewalk datasets
- Production Deployment - Web application and demonstration
- Deep Learning: PyTorch 2.2.2, TorchVision 0.17.2
- Computer Vision: Segment Anything Model (SAM), OpenCV
- Geospatial: GDAL, Rasterio, Geopandas, Leafmap
- Interactive Computing: Jupyter Notebooks, IPyWidgets
- Foundation Model: segment-anything, segment-anything-hq
- Geospatial Processing: segment-geospatial, pystac-client
- Data Handling: NumPy, Pandas, Pillow
- Visualization: Matplotlib, Folium, IPyLeaflet
- Containerization: Docker, Docker Compose
- Web Framework: Flask, Django (components)
- Cloud Storage: Azure Blob Storage, Google Cloud Storage
- Model Hosting: Hugging Face Spaces, Transformers
- Notebooks: Jupyter with geospatial extensions
- Data Visualization: Interactive maps with Folium and Leafmap
- Model Management: Hugging Face Hub integration
- Package Management: Comprehensive requirements with 240+ dependencies
- Docker and Docker Compose
- Python 3.9+
- CUDA-compatible GPU (recommended for training)
- At least 16GB RAM (for large model inference)
# Clone the repository
git clone <repository-url>
cd MetaAI-segment-anything-model
# Build and run the container
cd meta-ai-segment-anything-model
docker build -t working-sam .
docker-compose up --build

After building the container:
- Look for the server URL in the terminal output: http://127.0.0.1:8888/tree/notebooks... and use this link to access your notebooks
- For local network access, replace 127.0.0.1 with your computer's IP address
# Create virtual environment
python -m venv sam-env
source sam-env/bin/activate # On Windows: sam-env\Scripts\activate
# Install dependencies
pip install --upgrade pip
pip install -r meta-ai-segment-anything-model/requirements.txt
# Launch Jupyter
jupyter notebook

Objective: Establish a reproducible PyTorch environment for SAM development
Components:
- Dockerfile: Multi-stage build with Python 3.9 and PyTorch 2.2.2
- Docker Compose: Orchestrated Jupyter notebook service
- Requirements: Comprehensive dependency management with geospatial libraries
Features:
- PyTorch 2.2.2 with TorchVision 0.17.2
- GDAL 3.6.2 for geospatial data processing
- Jupyter notebook server on port 1128
- Volume mounting for persistent development
Usage:
cd meta-ai-segment-anything-model
docker build -t working-sam .
docker-compose up --build

Objective: Implement and demonstrate core SAM functionality on satellite imagery
Key Notebooks:
- sam-reimplementation.ipynb: Core SAM implementation and testing
- Interactive Colab Demo: Comprehensive satellite imagery segmentation
Features:
- Complete SAM model integration
- Satellite imagery preprocessing pipeline
- Interactive segmentation demonstrations
- Geospatial visualization with interactive maps
Capabilities:
- Automatic mask generation for satellite images
- Point and box prompt-based segmentation
- Multi-scale object detection and segmentation
- Geospatial coordinate integration
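The prompt-based capabilities above can be sketched with the segment-anything API. This is an illustrative sketch, not the project's exact code: it assumes the library is installed, a ViT-H checkpoint has been downloaded (the file name below is Meta's official checkpoint), and that an RGB satellite tile has already been loaded as a NumPy array.

```python
# Sketch: point-prompt segmentation with segment-anything (assumed setup).
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = load_tile()  # placeholder: HxWx3 uint8 RGB satellite tile
predictor.set_image(image)

# One foreground click on a sidewalk pixel; coordinates are illustrative.
masks, scores, logits = predictor.predict(
    point_coords=np.array([[450, 600]]),  # (x, y) in pixel coordinates
    point_labels=np.array([1]),           # 1 = foreground point
    multimask_output=True,                # return multiple candidate masks
)
best_mask = masks[scores.argmax()]        # boolean HxW mask
```

Box prompts work the same way via the `box=` argument to `predict`, and whole-image automatic mask generation is available through `SamAutomaticMaskGenerator`.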
Objective: Specialize SAM for sidewalk and infrastructure segmentation
Key Components:
- Training Pipeline: Custom fine-tuning on sidewalk datasets
- Data Processing: Specialized preprocessing for urban imagery
- Model Optimization: Adaptation for remote sensing characteristics
Notebooks:
- main.ipynb: Complete training workflow and experimentation
- predict.ipynb: Inference demonstration on Google Earth screenshots
- retrieve-parquet.ipynb: Dataset processing and management
Training Features:
- Custom dataset preparation for sidewalk segmentation
- Transfer learning from pre-trained SAM weights
- Performance monitoring and validation
- Model export for production deployment
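A common transfer-learning strategy for SAM, and one way the features above could be realized, is to freeze the heavy image and prompt encoders and train only the lightweight mask decoder. The sketch below assumes that strategy (it is not necessarily the project's exact recipe); `train_loader`, the ViT-B checkpoint, and the box-prompt supervision are placeholders/assumptions.

```python
# Sketch: fine-tuning only SAM's mask decoder (assumed strategy).
import torch
from segment_anything import sam_model_registry

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")

# Freeze the encoders; only the mask decoder receives gradients.
for p in sam.image_encoder.parameters():
    p.requires_grad = False
for p in sam.prompt_encoder.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(sam.mask_decoder.parameters(), lr=1e-4)
loss_fn = torch.nn.BCEWithLogitsLoss()

for images, gt_masks, boxes in train_loader:  # placeholder DataLoader
    with torch.no_grad():
        embeddings = sam.image_encoder(images)  # images pre-resized to 1024px
        sparse, dense = sam.prompt_encoder(points=None, boxes=boxes, masks=None)
    low_res_logits, _ = sam.mask_decoder(
        image_embeddings=embeddings,
        image_pe=sam.prompt_encoder.get_dense_pe(),
        sparse_prompt_embeddings=sparse,
        dense_prompt_embeddings=dense,
        multimask_output=False,
    )
    loss = loss_fn(low_res_logits.squeeze(1), gt_masks.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Training only the decoder keeps memory requirements modest, since the frozen image embeddings can even be precomputed once per tile.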
Evaluation Metrics:
- Intersection over Union (IoU) for segmentation accuracy
- Precision and recall for sidewalk detection
- Visual quality assessment on diverse urban scenes
- Computational efficiency analysis
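The IoU, precision, and recall metrics listed above reduce to pixel counts over boolean masks. A minimal NumPy sketch (the function name is illustrative, not from the project's code):

```python
import numpy as np

def segmentation_metrics(pred, target):
    """IoU, precision, and recall for two boolean masks of equal shape."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    tp = np.logical_and(pred, target).sum()   # sidewalk pixels found
    fp = np.logical_and(pred, ~target).sum()  # background marked as sidewalk
    fn = np.logical_and(~pred, target).sum()  # sidewalk pixels missed
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return iou, precision, recall
```

For example, a prediction that captures half of the true sidewalk pixels while adding an equal number of false positives scores precision = recall = 0.5 and IoU = 1/3.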
Objective: Deploy trained model for public access and demonstration
Deployment Platforms:
- Hugging Face Spaces: Interactive web application
- Video Demonstration: Comprehensive project walkthrough
Production Features:
- Real-time image upload and segmentation
- Interactive result visualization
- Model performance metrics display
- User-friendly web interface
Access Links:
- Live Application: Hugging Face Spaces
- Code Walkthrough: Demo Video
Web Application Features:
- Drag-and-drop image upload
- Real-time segmentation processing
- Interactive mask visualization
- Downloadable results
- Model performance statistics
Architecture: Vision Transformer (ViT) based encoder-decoder
- Model Size: ~636M parameters (ViT-H backbone; ~2.4 GB checkpoint)
- Training Data: SA-1B dataset (11 million images, 1.1 billion masks)
- Zero-shot Transfer: Strong performance on unseen domains
Sidewalk Detection Metrics:
- IoU Score: 0.8
- Processing Speed: Real-time inference capability
Inference Performance:
- GPU Memory: ~8GB VRAM for ViT-H model
- Processing Time: ~2-5 seconds per image (depending on resolution)
- Input Resolution: Flexible, optimized for satellite imagery scales
- Output Quality: High-fidelity segmentation masks
The production model is deployed as an interactive web application on Hugging Face Spaces, providing:
- Real-time Segmentation: Upload satellite images for instant processing
- Interactive Interface: User-friendly web UI for non-technical users
- Result Visualization: High-quality mask overlays and downloadable outputs
- Model Information: Performance metrics and usage guidelines
# Run Flask application locally
cd meta-ai-segment-anything-model
python app.py
# Or use Docker for consistent environment
docker-compose up --build

Course: Artificial Intelligence
Supervisor: Prof. Pantelis Monogioudis
Author: Uyen Nguyen
Institution: New Jersey Institute of Technology
This project treats SAM as computer vision's counterpart to large language models like GPT-3, demonstrating how foundation models can be adapted for specialized applications. The work explores:
- Foundation Model Transfer: Adapting general-purpose models for domain-specific tasks
- Geospatial AI: Application of deep learning to remote sensing data
- Model Specialization: Fine-tuning strategies for infrastructure detection
- Production AI: End-to-end deployment of research models
- SAM Paper: Segment Anything - Meta AI
- Foundation Models: On the Opportunities and Risks of Foundation Models
- Remote Sensing AI: Computer Vision Applications in Geospatial Analysis
- SAM Official: Segment Anything Model
- Meta AI Repository: segment-anything
- Geospatial SAM: segment-geospatial
- SA-1B Dataset: Meta's 11M image segmentation dataset
- Satellite Imagery: Various urban and suburban scenes
- Custom Sidewalk Dataset: Curated for infrastructure segmentation
- PyTorch: Deep Learning Framework
- Hugging Face: Model Hub and Deployment
- Leafmap: Interactive Geospatial Analysis
Winter Internship Project
Computer Vision & Remote Sensing Applications
NJIT Artificial Intelligence Course