A comprehensive collection of AI-powered computer vision tools for image analysis, search, classification, and privacy protection. Built with modern deep learning models and libraries, including CLIP, YOLOv8, and FAISS.
Author: Madhavendranath S
Email: madhavendranaths@gmail.com
This toolkit provides production-ready implementations of common computer vision tasks, leveraging modern AI models for:
- Image Metadata Extraction & OCR - Extract EXIF data, GPS coordinates, and text from images
- Scene Classification - Classify images as indoor/outdoor and identify specific scene types
- Semantic Image Search - Search images using natural language or reference images
- Privacy Anonymization - Automatically detect and blur/pixelate people in images
- Image Similarity Analysis - Robust similarity scoring across transformations
- Object-Level Search - Find images containing specific objects
All modules are designed to be used independently or integrated into larger systems.
- Zero-shot scene classification (indoor/outdoor + 20+ scene types)
- Text-to-image search with natural language queries
- Image-to-image similarity search
- Object-level retrieval across large datasets
- Automatic human detection and anonymization
- Multiple privacy modes (blur, mosaic, black-fill)
- Face detection fallback for edge cases
- Comprehensive metadata extraction (EXIF, GPS, timestamps)
- OCR with language detection
- Multi-metric similarity scoring (pHash, SSIM, ORB)
- Robustness testing across transformations
- GPU acceleration support (CUDA/MPS)
- Efficient caching systems
- Batch processing capabilities
- Production-optimized implementations
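The caching highlight above comes down to keying expensive embeddings on file identity, so untouched images hit the cache while edited ones are re-embedded. A minimal sketch of such a cache; the class and key layout are illustrative, not the toolkit's actual implementation:

```python
import hashlib
import os

import numpy as np


def cache_key(image_path: str) -> str:
    """Key the cache on path + modification time + size, so an edited
    image invalidates its own entry automatically."""
    stat = os.stat(image_path)
    raw = f"{image_path}:{stat.st_mtime_ns}:{stat.st_size}"
    return hashlib.sha256(raw.encode()).hexdigest()


class EmbeddingCache:
    """Minimal on-disk cache: one .npy file per image under cache_dir."""

    def __init__(self, cache_dir: str):
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, image_path: str) -> str:
        return os.path.join(self.cache_dir, cache_key(image_path) + ".npy")

    def get(self, image_path: str):
        path = self._path(image_path)
        return np.load(path) if os.path.exists(path) else None

    def put(self, image_path: str, embedding: np.ndarray) -> None:
        np.save(self._path(image_path), embedding)
```

Batch processing then reduces to embedding only the cache misses in one forward pass.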
- Python 3.9 or higher
- Virtual environment (recommended)
- GPU recommended for optimal performance (optional)
# Clone the repository
git clone https://github.com/Madhav-000-s/image-analysis-toolkit.git
cd image-analysis-toolkit
# Navigate to desired module (example: scene classifier)
cd indoor-outdoor-classifier
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Classify an image scene
python src/predict.py --image path/to/image.jpg --out_dir outputs
# Search images with text
cd ../search-images-with-text-or-image
python src/predict.py --text-search "sunset beach"
# Anonymize people in image
cd ../human-only-blur
python src/blur.py --to-blur path/to/image.jpg
Location: image_metadata_showcase/
Extract comprehensive metadata from images including EXIF data, GPS coordinates, camera settings, timestamps, and embedded text.
Key Features:
- EXIF/IPTC/XMP metadata parsing
- GPS coordinate extraction and mapping
- OCR with Tesseract (90+ languages)
- Language detection
- Camera settings analysis
How to Use:
- Open the Colab notebook for interactive usage
- Ideal for digital forensics, photo management, content verification
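GPS coordinates in EXIF are stored as degree/minute/second values plus a hemisphere reference, so mapping them requires a small conversion to signed decimal degrees. A sketch of that calculation, independent of the notebook's actual helper names:

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float, ref: str) -> float:
    """Convert EXIF-style (degrees, minutes, seconds) plus a hemisphere
    reference ('N'/'S'/'E'/'W') to signed decimal degrees."""
    decimal = degrees + minutes / 60.0 + seconds / 3600.0
    # Southern and western hemispheres are negative by convention.
    return -decimal if ref in ("S", "W") else decimal
```

For example, `dms_to_decimal(40, 26, 46.0, "N")` yields roughly 40.4461, suitable for pasting into any mapping service.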
Location: indoor-outdoor-classifier/
Zero-shot scene classification using CLIP (ViT-B/32) trained on LAION-2B. Classifies images as indoor/outdoor and identifies specific scene types (office, park, street, etc.).
Key Features:
- Indoor vs outdoor classification
- 20+ scene type recognition (customizable)
- Confidence blending for improved accuracy
- Batch processing support
- Annotated preview generation
Usage:
# Single image
python src/predict.py --image path/to/image.jpg --out_dir outputs
# Batch processing
python src/predict.py --images_dir images/ --batch_size 8 --topk 3
Outputs: CSV with predictions + annotated preview images
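The confidence blending mentioned above can be sketched as combining a direct indoor/outdoor softmax with probability mass aggregated from the fine-grained scene types. The labels, blend weight, and temperature below are illustrative assumptions, not the values used in `src/predict.py`:

```python
import numpy as np

# Hypothetical scene labels mapped to their coarse parent class;
# the real prompt list in the classifier may differ.
SCENE_PARENTS = {"office": "indoor", "kitchen": "indoor",
                 "park": "outdoor", "street": "outdoor"}


def softmax(x: np.ndarray, temperature: float = 100.0) -> np.ndarray:
    """CLIP-style scaled softmax over cosine similarities."""
    z = x * temperature
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()


def blend_indoor_outdoor(direct_sims, scene_sims, scene_labels, alpha=0.5):
    """Blend a direct [indoor, outdoor] similarity pair with probability
    mass aggregated from fine-grained scene-type similarities."""
    direct = softmax(np.asarray(direct_sims, dtype=float))
    scenes = softmax(np.asarray(scene_sims, dtype=float))
    agg = np.zeros(2)
    for p, label in zip(scenes, scene_labels):
        agg[0 if SCENE_PARENTS[label] == "indoor" else 1] += p
    blended = alpha * direct + (1 - alpha) * agg
    return {"indoor": float(blended[0]), "outdoor": float(blended[1])}
```

The blended scores still sum to 1, and a scene like "office" reinforces the indoor verdict even when the direct comparison is ambiguous.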
Location: search-images-with-text-or-image/
Semantic image search using CLIP embeddings. Find images using natural language descriptions or reference images.
Key Features:
- Text-to-image search ("desert sunset", "busy street")
- Image-to-image similarity search
- Fast cached embedding system
- Cosine similarity with softmax scoring
- Support for animated GIFs
Usage:
# Text search
python src/predict.py --text-search "desert landscape" --topk 5
# Image search
python src/predict.py --image-search query.jpg --topk 5
# Rebuild index after adding images
python src/predict.py --reindex
Outputs: Ranked results with similarity scores + JSON export
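Under the hood, ranking reduces to cosine similarity between L2-normalized CLIP embeddings, followed by a softmax over the scores. A minimal numpy sketch; the temperature of 100 mirrors CLIP's logit scale, and the exact scoring in `src/predict.py` may differ:

```python
import numpy as np


def rank_by_similarity(query_emb: np.ndarray, image_embs: np.ndarray, topk: int = 5):
    """Rank cached image embeddings against a query embedding using
    cosine similarity, with a softmax over the scaled scores.
    Returns (index, cosine, softmax_score) tuples, best first."""
    q = query_emb / np.linalg.norm(query_emb)
    m = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = m @ q                       # cosine similarities in [-1, 1]
    order = np.argsort(-sims)[:topk]   # best-first
    scores = np.exp(sims * 100)        # CLIP-style temperature scaling
    scores /= scores.sum()
    return [(int(i), float(sims[i]), float(scores[i])) for i in order]
```

Text and image queries share this path: either one is embedded into the same CLIP space before ranking.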
Location: human-only-blur/
Automatically detect and anonymize people in images using YOLOv8 segmentation. Essential for GDPR compliance and privacy protection.
Key Features:
- Precise human segmentation (not just bounding boxes)
- Three anonymization modes: Gaussian blur, mosaic pixelation, black-fill
- Adjustable blur strength and feathering
- Face detection fallback (Haar cascade)
- GPU acceleration
Usage:
# Basic blur
python src/blur.py --to-blur image.jpg
# Strong pixelation
python src/blur.py --to-blur image.jpg --mode mosaic --mosaic-tile 24
# Maximum privacy (black fill)
python src/blur.py --to-blur image.jpg --mode black
Outputs: Anonymized images in blurredimages/
Location: image-similarity-suite/
Comprehensive image similarity testing with multiple metrics. Analyze robustness across rotations, scaling, compression, and filters.
Key Features:
- Multi-metric scoring (pHash, dHash, aHash, SSIM, ORB)
- Parametric testing (rotation angles, scale factors, JPEG quality)
- Automatic plot generation
- Preview montage grid
- CSV export for analysis
Usage:
python src/similarity_suite.py --image path/to/image.jpg \
--angles 1,10,17 \
--scales 0.5,0.75,1.25 \
--do-gray --do-blur --do-sharpen \
--jpeg-qualities 95,80,60 \
--preview-grid
Outputs:
- results.csv - Similarity scores per transformation
- plots/ - Metric vs parameter graphs
- previews/ - Visual comparison grid
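Perceptual hashes like pHash and aHash survive resizing and recompression where byte-level comparison fails. A minimal average-hash sketch with Pillow and numpy; the suite's own implementation may use a dedicated library such as imagehash instead:

```python
import numpy as np
from PIL import Image


def average_hash(img: Image.Image, hash_size: int = 8) -> np.ndarray:
    """Average hash: downscale to hash_size x hash_size grayscale and
    threshold each pixel against the mean brightness."""
    small = img.convert("L").resize((hash_size, hash_size), Image.LANCZOS)
    pixels = np.asarray(small, dtype=np.float32)
    return (pixels > pixels.mean()).flatten()


def hamming_distance(h1: np.ndarray, h2: np.ndarray) -> int:
    """Number of differing bits; 0 means perceptually near-identical."""
    return int(np.count_nonzero(h1 != h2))
```

A resized or mildly recompressed copy lands within a few bits of the original, which is exactly the robustness the parametric tests measure.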
Location: object-level-image-search/
Advanced object-level image retrieval using YOLOv8 detection + CLIP embeddings + FAISS indexing. Find images containing similar objects.
Key Features:
- Object detection with YOLOv8
- Per-object CLIP embeddings
- FAISS vector similarity search
- Automatic region caching
- IoU-based deduplication
- Optional query-side detection
Usage:
# Build index (first time)
python src/script.py --reindex --image-folder images/
# Search for object
python src/script.py --search-object query.jpg --topk 10
# Advanced options
python src/script.py --search-object query.jpg \
--detector yolov8s.pt \
--min-conf 0.3 \
--save-viz \
--allow-multiple-per-image
Outputs: Ranked object matches + optional visualizations
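The IoU-based deduplication listed above keeps only the highest-confidence detection among heavily overlapping boxes, so one object does not flood the index with near-duplicate regions. A sketch using (x1, y1, x2, y2) boxes; the 0.7 threshold is an illustrative default:

```python
def iou(box_a, box_b) -> float:
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0


def deduplicate(detections, iou_threshold: float = 0.7):
    """Keep the highest-confidence detection among heavily overlapping
    ones. `detections` is a list of (box, confidence) tuples."""
    kept = []
    for box, conf in sorted(detections, key=lambda d: -d[1]):
        if all(iou(box, k[0]) < iou_threshold for k in kept):
            kept.append((box, conf))
    return kept
```

Surviving regions are then cropped, embedded with CLIP, and added to the FAISS index.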
| Module | Model(s) | Purpose | Size |
|---|---|---|---|
| Metadata Extractor | Tesseract OCR | Text extraction | ~40MB |
| Scene Classifier | CLIP ViT-B/32 (LAION-2B) | Zero-shot classification | ~600MB |
| Image Search | CLIP ViT-B/32 (LAION-2B) | Semantic embeddings | ~600MB |
| Privacy Anonymizer | YOLOv8-seg (nano) | Human segmentation | ~7MB |
| Privacy Anonymizer | OpenCV Haar cascade | Face detection fallback | <1MB |
| Similarity Analyzer | ORB (OpenCV) | Feature matching | N/A (classical) |
| Object Search | YOLOv8 (nano) | Object detection | ~6.5MB |
| Object Search | CLIP ViT-B/32 | Object embeddings | ~600MB |
| Object Search | FAISS | Vector search | N/A (library) |
All models are automatically downloaded on first run.
- Automated photo organization and tagging
- Visual search for stock photo libraries
- Duplicate image detection
- Content moderation
- GDPR-compliant image anonymization
- Dataset preparation for machine learning
- Public media privacy protection
- Automated redaction pipelines
- Digital forensics and metadata analysis
- Image provenance verification
- Computer vision benchmarking
- Similarity testing and evaluation
- Visual product search
- Scene-based product recommendations
- Similar item discovery
- Catalog organization
- Scene classification for monitoring
- Privacy-preserving video analytics
- Object-based search in footage
- Metadata extraction for evidence
- Python: 3.9 or higher
- OS: Windows, Linux, macOS
- RAM: 8GB minimum (16GB recommended for large batches)
- Storage: 2-5GB for models and cache
- NVIDIA GPU: CUDA 11.8+ (PyTorch with CUDA)
- Apple Silicon: MPS acceleration (M1/M2/M3)
- Performance: 5-10x faster inference with GPU
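Device selection typically follows the fallback order CUDA, then Apple MPS, then CPU. A sketch of how a module might pick its torch device; the toolkit's modules may do this differently:

```python
import torch


def pick_device() -> str:
    """Return the fastest available torch device string."""
    if torch.cuda.is_available():
        return "cuda"
    # MPS is only present on Apple Silicon builds of PyTorch.
    if getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
        return "mps"
    return "cpu"
```

Models and tensors are then moved once with `.to(pick_device())` rather than checking the hardware at every call site.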
Each module has its own requirements.txt. Common dependencies:
- PyTorch (torch, torchvision)
- OpenCV (opencv-python)
- Pillow (PIL)
- NumPy, Pandas
- Model-specific libraries (open-clip-torch, ultralytics, transformers, faiss)
image-analysis-toolkit/
├── image_metadata_showcase/           # Metadata extraction
├── indoor-outdoor-classifier/         # Scene classification
├── search-images-with-text-or-image/  # Image search
├── human-only-blur/                   # Privacy anonymizer
├── image-similarity-suite/            # Similarity analysis
├── object-level-image-search/         # Object search
├── LICENSE                            # MIT License
└── README.md                          # This file
- Always use virtual environments for isolation
- Cache model weights to avoid re-downloading
- Use GPU acceleration for production workloads
- Batch process images when possible
- Monitor memory usage for large datasets
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Areas for improvement:
- Additional scene categories
- New anonymization modes
- Performance optimizations
- Additional similarity metrics
- Better error handling
- Documentation improvements
Madhavendranath S
Email: madhavendranaths@gmail.com
GitHub: @Madhav-000-s
Built with:
- OpenAI CLIP - Vision-language models
- OpenCLIP - Open source CLIP implementation
- Ultralytics YOLOv8 - Object detection
- FAISS - Vector similarity search
- Tesseract OCR - Text extraction
Pretrained weights from:
- LAION-2B dataset
- COCO dataset (YOLOv8)