Extract comprehensive metadata and embedded text from images using Python. This tool analyzes EXIF data, GPS coordinates, camera settings, timestamps, and performs OCR with automatic language detection.
This module provides a complete solution for image metadata analysis, useful for:
- Digital Forensics - Verify image authenticity and provenance
- Photo Management - Organize and catalog image collections
- Content Verification - Extract creation details and modifications
- Research - Analyze camera settings and capture conditions
- Privacy Auditing - Identify potentially sensitive metadata
- EXIF Data - Camera make/model, lens info, serial numbers
- GPS Coordinates - Latitude, longitude, altitude, direction
- Timestamps - Capture date/time, creation, modification
- Camera Settings - Exposure, aperture, ISO, focal length, flash
- Image Properties - Resolution, orientation, color space, compression
- Software Tags - Editing applications and processing history
- Text Extraction - Tesseract OCR for 90+ languages
- Language Detection - Automatic identification of text language
- Multi-language Support - Handle mixed-language content
- Confidence Scoring - OCR accuracy metrics
- Interactive Display - Visual preview with annotations
- Summary Tables - Pandas DataFrames for analysis
- Structured Data - JSON-compatible dictionaries
Open the notebook directly in your browser - no installation required!
Steps:
- Click the badge above to open in Colab
- Run the setup cell to install dependencies (automatic)
- Upload your images or fetch them from a GitHub repository
- Execute the analysis cells
- View results: metadata tables, GPS maps, OCR text
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
# Install dependencies
pip install pillow exifread opencv-python pytesseract langdetect matplotlib pandas
# Install Tesseract OCR
# Ubuntu/Debian: sudo apt-get install tesseract-ocr
# macOS: brew install tesseract
# Windows: Download from https://github.com/UB-Mannheim/tesseract/wiki
- Make / Model - Device brand & model (e.g., Canon EOS 80D, iPhone 14)
- Lens Info - Lens model, focal length, zoom capabilities
- Serial Numbers - Unique identifiers for camera/lens (when available)
Use: Identify capture device and verify hardware specifications
- DateTimeOriginal - Exact moment photo was captured
- CreateDate / ModifyDate - File creation and last modification
- SubSecTimeOriginal - Fractional seconds for precision timing
- Timezone Information - UTC offset of the local capture time (e.g., OffsetTimeOriginal, when the camera records it)
Use: Establish capture timeline and detect time inconsistencies
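The timestamp fields above can be combined into a single Python `datetime`. The following is a minimal sketch, assuming the tag values have already been read as strings in the usual EXIF layout (`YYYY:MM:DD HH:MM:SS`). The function name `parse_exif_datetime` and its parameters are hypothetical and not part of this project's API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

EXIF_DATETIME_FORMAT = "%Y:%m:%d %H:%M:%S"


def parse_exif_datetime(
    value: str,
    subsec: Optional[str] = None,
    offset: Optional[str] = None,
) -> datetime:
    """Parse an EXIF timestamp, optionally adding sub-seconds and a UTC offset.

    Hypothetical helper: `subsec` is a SubSecTimeOriginal-style digit string,
    `offset` is an OffsetTimeOriginal-style string such as "+02:00".
    """
    parsed = datetime.strptime(value.strip(), EXIF_DATETIME_FORMAT)

    if subsec and subsec.strip().isdigit():
        # The sub-second tag holds the digits after the decimal point,
        # so "25" means 0.25 seconds.
        parsed += timedelta(seconds=float("0." + subsec.strip()))

    if offset:
        text = offset.strip()
        sign = -1 if text.startswith("-") else 1
        hours, minutes = text[1:].split(":")
        delta = sign * timedelta(hours=int(hours), minutes=int(minutes))
        # Attach the offset so the value can be converted to UTC later.
        parsed = parsed.replace(tzinfo=timezone(delta))

    return parsed


captured = parse_exif_datetime("2024:03:15 14:32:18", subsec="25", offset="-07:00")
print(captured.isoformat())
print(captured.astimezone(timezone.utc).isoformat())
```

When no offset is supplied, the result stays a naive `datetime`, which avoids guessing a timezone the file does not state.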
- Latitude / Longitude - Precise geographic coordinates
- Altitude - Elevation above sea level
- GPSImgDirection - Compass bearing the camera was facing
- GPSDateStamp / GPSTimeStamp - GPS fix timestamp
Use: Geolocate images and map capture locations
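EXIF stores latitude and longitude as degrees, minutes, and seconds plus a hemisphere reference (N/S, E/W), so they need converting before they can be mapped. The sketch below shows one way to do that, assuming the three components have already been converted to plain numbers. The helper names `dms_to_decimal` and `maps_url` are illustrative only.

```python
from typing import Sequence


def dms_to_decimal(dms: Sequence[float], ref: str) -> float:
    """Convert (degrees, minutes, seconds) plus a hemisphere ref to decimal degrees."""
    degrees, minutes, seconds = (float(part) for part in dms)
    decimal = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative in decimal notation.
    if ref.strip().upper() in ("S", "W"):
        decimal = -decimal
    return decimal


def maps_url(latitude: float, longitude: float) -> str:
    """Build a map link in the same form as the sample output in this README."""
    return f"https://maps.google.com/?q={latitude:.4f},{longitude:.4f}"


lat = dms_to_decimal((37, 46, 29.64), "N")
lon = dms_to_decimal((122, 25, 9.84), "W")
print(maps_url(lat, lon))
```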
- ExposureTime - Shutter speed (e.g., 1/200 sec)
- FNumber - Aperture setting (e.g., f/2.8)
- ISO - Sensor sensitivity (e.g., ISO 400)
- FocalLength - Lens focal length (e.g., 50mm)
- Flash - Flash status (fired/not fired)
- MeteringMode - Exposure metering method
- WhiteBalance - Color temperature settings
- SceneCaptureType - Scene mode (standard, portrait, landscape, night)
Use: Understand capture conditions and camera configuration
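Exposure values are usually stored as rationals, so they need some formatting before they read like the settings line in the sample output. This is a rough sketch, assuming the values arrive as numbers or `fractions.Fraction` objects. `format_exposure_time` and `format_settings` are hypothetical helpers, not functions from the notebook.

```python
from fractions import Fraction


def format_exposure_time(seconds) -> str:
    """Render an exposure time as '1/500s' for fast shutters or '2s' for long ones."""
    # limit_denominator cleans up float noise such as 0.002 -> 1/500.
    value = Fraction(seconds).limit_denominator(8000)
    if value <= 0:
        raise ValueError("exposure time must be positive")
    if value >= 1:
        return f"{float(value):g}s"
    return f"1/{round(1 / value)}s"


def format_settings(f_number, exposure_time, iso, focal_length) -> str:
    """Combine the four main exposure settings into one summary string."""
    return (
        f"f/{float(f_number):.1f}, "
        f"{format_exposure_time(exposure_time)}, "
        f"ISO {int(iso)}, "
        f"{float(focal_length):g}mm"
    )


print(format_settings(4, Fraction(1, 500), 200, 50))
```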
- Orientation - Portrait/landscape/rotated
- ImageWidth / ImageHeight - Resolution in pixels
- ColorSpace - Color encoding (sRGB, AdobeRGB)
- Compression - JPEG quality, encoding method
- BitsPerSample - Color depth per channel
Use: Verify image properties and quality settings
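The Orientation tag is a number from 1 to 8 rather than a readable label, and values 5 to 8 mean the image is displayed with width and height swapped. The sketch below decodes it, assuming the tag has already been read as an integer. The function names are illustrative.

```python
from typing import Tuple

# Standard EXIF orientation codes and their meaning.
ORIENTATION_LABELS = {
    1: "Horizontal (normal)",
    2: "Mirror horizontal",
    3: "Rotate 180",
    4: "Mirror vertical",
    5: "Mirror horizontal and rotate 270 CW",
    6: "Rotate 90 CW",
    7: "Mirror horizontal and rotate 90 CW",
    8: "Rotate 270 CW",
}


def describe_orientation(value: int) -> str:
    """Return a readable label for an EXIF Orientation value."""
    return ORIENTATION_LABELS.get(value, f"Unknown ({value})")


def display_size(width: int, height: int, orientation: int) -> Tuple[int, int]:
    """Return the (width, height) a viewer would show after applying orientation."""
    # Codes 5 through 8 involve a quarter turn, which swaps the two dimensions.
    if orientation in (5, 6, 7, 8):
        return height, width
    return width, height


print(describe_orientation(6), display_size(6000, 4000, 6))
```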
- Software - Application that saved/edited file (Photoshop, WhatsApp, etc.)
- CustomRendered - Post-processing applied
- DigitalZoomRatio - Digital zoom factor
- ModifyDate - Evidence of post-capture editing
Use: Detect modifications and trace editing workflow
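A simple way to act on these fields is to collect indicators rather than a yes/no verdict, since none of them proves editing on its own. The following sketch assumes the tags are available as a dictionary keyed by the names used in this README. The `EDITOR_HINTS` list and the `editing_indicators` function are hypothetical examples.

```python
from datetime import datetime
from typing import Any, Dict, List

# Illustrative substrings only; a real list would be longer.
EDITOR_HINTS = ("photoshop", "lightroom", "gimp", "snapseed")
EXIF_DATETIME_FORMAT = "%Y:%m:%d %H:%M:%S"


def editing_indicators(tags: Dict[str, Any]) -> List[str]:
    """Return human-readable hints that a file may have been edited after capture."""
    findings = []

    software = str(tags.get("Software", "")).lower()
    for name in EDITOR_HINTS:
        if name in software:
            findings.append(f"Software tag mentions {name}")

    original = tags.get("DateTimeOriginal")
    modified = tags.get("ModifyDate")
    if original and modified:
        try:
            captured_at = datetime.strptime(str(original), EXIF_DATETIME_FORMAT)
            modified_at = datetime.strptime(str(modified), EXIF_DATETIME_FORMAT)
        except ValueError:
            # Malformed timestamps are themselves worth a closer look.
            findings.append("Timestamps could not be parsed")
        else:
            if modified_at > captured_at:
                findings.append("ModifyDate is later than DateTimeOriginal")

    return findings


sample = {
    "Software": "Adobe Photoshop 25.0",
    "DateTimeOriginal": "2024:03:15 14:32:18",
    "ModifyDate": "2024:03:16 09:00:00",
}
for finding in editing_indicators(sample):
    print(finding)
```

An empty result does not confirm authenticity, because metadata can be altered or stripped.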
- Title / Caption - Image descriptions
- Keywords / Tags - Categorization labels
- Copyright / Author - Ownership information
- Contact Info - Photographer details
- Usage Rights - Licensing restrictions
Use: Content management and rights tracking
=== Processing: DSC_0001.JPG ===
📷 Camera: Canon EOS 5D Mark IV
🔍 Lens: EF24-105mm f/4L IS USM
📅 Captured: 2024-03-15 14:32:18
🌍 Location: 37.7749° N, 122.4194° W (San Francisco, CA)
⚙️ Settings: f/4.0, 1/500s, ISO 200, 50mm
📝 OCR Text (English):
"Welcome to the Golden Gate Bridge. Built in 1937..."
🗺️ GPS: https://maps.google.com/?q=37.7749,-122.4194
| File | Camera | Date | GPS | OCR Language | Text Length |
|---|---|---|---|---|---|
| DSC_0001.JPG | Canon EOS 5D IV | 2024-03-15 | 37.77,-122.42 | English | 245 chars |
| IMG_5432.JPG | iPhone 14 Pro | 2024-03-16 | None | None | 0 chars |
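A summary table like the one above can be built from one JSON-compatible dictionary per image. This is a sketch under assumptions: the input keys (`camera`, `date`, `latitude`, `longitude`) and the `summary_row` helper are hypothetical and may not match the notebook's internal names.

```python
import json
from typing import Any, Dict, Optional


def summary_row(
    filename: str,
    metadata: Dict[str, Any],
    ocr_text: str = "",
    ocr_language: Optional[str] = None,
) -> Dict[str, Any]:
    """Flatten one image's results into a row with the table's column names."""
    lat = metadata.get("latitude")
    lon = metadata.get("longitude")
    # Keep GPS as None when either coordinate is missing.
    gps = f"{lat:.2f},{lon:.2f}" if lat is not None and lon is not None else None
    return {
        "File": filename,
        "Camera": metadata.get("camera"),
        "Date": metadata.get("date"),
        "GPS": gps,
        "OCR Language": ocr_language,
        "Text Length": len(ocr_text),
    }


rows = [
    summary_row(
        "DSC_0001.JPG",
        {"camera": "Canon EOS 5D Mark IV", "date": "2024-03-15",
         "latitude": 37.7749, "longitude": -122.4194},
        ocr_text="Welcome to the Golden Gate Bridge.",
        ocr_language="English",
    ),
    summary_row("IMG_5432.JPG", {"camera": "iPhone 14 Pro", "date": "2024-03-16"}),
]
print(json.dumps(rows, indent=2))
```

If pandas is installed, passing the list to `pandas.DataFrame(rows)` should give a table with the same columns.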
All dependencies are automatically installed in Colab. For local use:
- Pillow - Image processing
- ExifRead - EXIF metadata parsing
- OpenCV - Image handling
- pytesseract - OCR engine wrapper
- langdetect - Language detection
- matplotlib - Visualization
- pandas - Data tables
- GPS Data - Can reveal home/work locations
- Timestamps - May expose daily routines
- Device IDs - Serial numbers can be linked to individuals
- Recommendation: Strip metadata before sharing sensitive images
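The privacy points above can be turned into a quick audit of extracted metadata. The sketch below assumes the tags are already in a dictionary. The tag list is illustrative and incomplete, and `audit_sensitive_tags` and `redact_metadata` are hypothetical helpers. Note that this only filters the extracted dictionary; it does not strip metadata from the image file itself.

```python
from typing import Any, Dict, List

# Illustrative, incomplete selection of tags that can identify a person or place.
SENSITIVE_PREFIXES = ("GPS",)
SENSITIVE_TAGS = {
    "BodySerialNumber",
    "LensSerialNumber",
    "DateTimeOriginal",
    "CreateDate",
    "ModifyDate",
}


def is_sensitive(tag_name: str) -> bool:
    """Return True for location, timestamp, and device-identifier tags."""
    return tag_name in SENSITIVE_TAGS or tag_name.startswith(SENSITIVE_PREFIXES)


def audit_sensitive_tags(tags: Dict[str, Any]) -> List[str]:
    """List the sensitive tag names present, sorted for stable reporting."""
    return sorted(name for name in tags if is_sensitive(name))


def redact_metadata(tags: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the metadata without the sensitive tags."""
    return {name: value for name, value in tags.items() if not is_sensitive(name)}


extracted = {
    "Make": "Canon",
    "GPSLatitude": 37.7749,
    "BodySerialNumber": "012345",
    "DateTimeOriginal": "2024:03:15 14:32:18",
    "ISO": 200,
}
print(audit_sensitive_tags(extracted))
print(redact_metadata(extracted))
```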
- Not Always Present - Screenshots and social media exports often lack metadata
- Can Be Altered - Metadata is not cryptographically secure
- Stripped by Platforms - Many websites remove metadata automatically
- Accuracy Varies - Depends on image quality, font, lighting
- Language Support - Some languages require additional Tesseract data
- Performance - OCR on large images or large batches can be slow
- Verify image authenticity by checking timestamps and device info
- Detect manipulated images through metadata inconsistencies
- Geolocate events using GPS coordinates
- Auto-organize photos by camera, date, or location
- Generate searchable tags from metadata
- Create timeline visualizations
- Confirm original source of viral images
- Check if image has been edited (ModifyDate)
- Extract copyright and author information
- Study camera settings across professional photographers
- Analyze GPS patterns in wildlife photography
- Extract text from scanned documents and signs
This project is open-sourced under the MIT License.
- Scene Classifier - Classify image scenes
- Image Search - Find similar images
- Privacy Anonymizer - Remove identifying information