Image Metadata Extractor & OCR

Extract comprehensive metadata and embedded text from images using Python. This tool reads EXIF data, GPS coordinates, camera settings, and timestamps, and runs OCR with automatic language detection.

🎯 Purpose

This module provides a complete solution for image metadata analysis, useful for:

Digital Forensics - Verify image authenticity and provenance
Photo Management - Organize and catalog image collections
Content Verification - Extract creation details and modifications
Research - Analyze camera settings and capture conditions
Privacy Auditing - Identify potentially sensitive metadata

✨ Features

Metadata Extraction

EXIF Data - Camera make/model, lens info, serial numbers
GPS Coordinates - Latitude, longitude, altitude, direction
Timestamps - Capture date/time, creation, modification
Camera Settings - Exposure, aperture, ISO, focal length, flash
Image Properties - Resolution, orientation, color space, compression
Software Tags - Editing applications and processing history
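As a rough illustration of how the fields above can be read, here is a minimal sketch using Pillow (one of the listed dependencies). The names `read_exif` and `name_tags` are hypothetical and not necessarily what this module uses internally.

```python
def name_tags(raw, names):
    """Map numeric EXIF tag IDs to readable names, keeping unknown IDs as-is."""
    return {names.get(tag_id, tag_id): value for tag_id, value in raw.items()}


def read_exif(path):
    """Return a flat dict of tags from the main IFD and the Exif sub-IFD."""
    # Imported lazily so name_tags() stays usable without Pillow installed.
    from PIL import ExifTags, Image

    with Image.open(path) as img:
        exif = img.getexif()
        tags = name_tags(dict(exif), ExifTags.TAGS)
        # Settings such as ExposureTime live in the Exif sub-IFD (pointer tag 0x8769).
        tags.update(name_tags(dict(exif.get_ifd(0x8769)), ExifTags.TAGS))
    return tags
```

For example, `read_exif("DSC_0001.JPG").get("Model")` would return the camera model if the file carries one, and `None` otherwise.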

OCR Capabilities

Text Extraction - Tesseract OCR, with language data available for 100+ languages
Language Detection - Automatic identification of text language
Multi-language Support - Handle mixed-language content
Confidence Scoring - Per-word confidence values reported by Tesseract (an indicator, not a measured accuracy)
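The OCR steps above could be combined roughly as follows. This is a sketch, assuming pytesseract's `image_to_data` with `Output.DICT` and langdetect's `detect`; the function names `mean_confidence` and `ocr_image` are illustrative, not this module's API.

```python
def mean_confidence(data):
    """Average word confidence (0-100) from an image_to_data dict.

    Tesseract reports -1 for boxes that contain no recognized word.
    """
    scores = [
        float(conf)
        for conf, text in zip(data["conf"], data["text"])
        if str(text).strip() and float(conf) >= 0
    ]
    return sum(scores) / len(scores) if scores else 0.0


def ocr_image(path, lang="eng"):
    """Run OCR and return the text, mean confidence, and detected language."""
    # Lazy imports: these need the third-party packages and the tesseract binary.
    import pytesseract
    from langdetect import detect
    from langdetect.lang_detect_exception import LangDetectException
    from PIL import Image

    with Image.open(path) as img:
        data = pytesseract.image_to_data(
            img, lang=lang, output_type=pytesseract.Output.DICT
        )
    words = [str(w) for w in data["text"] if str(w).strip()]
    text = " ".join(words)
    try:
        language = detect(text)
    except LangDetectException:
        language = None  # raised when the text has no usable features
    return {"text": text, "confidence": mean_confidence(data), "language": language}
```

For mixed-language images, Tesseract accepts combined language codes such as `lang="eng+deu"`, provided both language packs are installed.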

Output Formats

Interactive Display - Visual preview with annotations
Summary Tables - Pandas DataFrames for analysis
Structured Data - JSON-compatible dictionaries
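EXIF libraries often return rationals, bytes, and tuples, which `json.dumps` rejects. A minimal sketch of one way to make a metadata dictionary JSON-compatible is shown below; `to_jsonable` is a hypothetical helper name.

```python
import json
import numbers


def to_jsonable(value):
    """Recursively convert EXIF-style values into JSON-compatible types."""
    if value is None or isinstance(value, (str, int, float, bool)):
        return value
    if isinstance(value, numbers.Rational):
        return float(value)  # covers fractions.Fraction and Pillow's IFDRational
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    if isinstance(value, dict):
        return {str(key): to_jsonable(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_jsonable(item) for item in value]
    return str(value)  # fallback for library-specific types


if __name__ == "__main__":
    print(json.dumps(to_jsonable({"Make": b"Canon", "ISO": 200}), indent=2))
```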

🏃 How to Use

Option 1: Google Colab (Recommended)

Open the notebook directly in your browser - no installation required!

Steps:

Click the badge above to open in Colab
Run the setup cell to install dependencies (automatic)
Upload your images or fetch them from a GitHub repository
Execute the analysis cells
View results: metadata tables, GPS maps, OCR text

Option 2: Local Installation

# Create virtual environment
python -m venv .venv
source .venv/bin/activate    # Windows: .venv\Scripts\activate

# Install dependencies
pip install pillow exifread opencv-python pytesseract langdetect matplotlib pandas

# Install Tesseract OCR
# Ubuntu/Debian: sudo apt-get install tesseract-ocr
# macOS: brew install tesseract
# Windows: Download from https://github.com/UB-Mannheim/tesseract/wiki
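After installing, a quick check like the one below can confirm that everything is discoverable. It is only a sketch: `check_environment` is a hypothetical helper, and it tests whether modules and the `tesseract` binary can be found, not whether they work.

```python
import importlib.util
import shutil

# Import names differ from the pip package names for Pillow (PIL) and OpenCV (cv2).
REQUIRED_MODULES = ["PIL", "exifread", "cv2", "pytesseract", "langdetect", "matplotlib", "pandas"]


def check_environment():
    """Report which Python modules and the tesseract binary can be found."""
    status = {
        name: importlib.util.find_spec(name) is not None for name in REQUIRED_MODULES
    }
    # pytesseract is only a wrapper; the tesseract executable must be on PATH.
    status["tesseract (binary)"] = shutil.which("tesseract") is not None
    return status
```

Printing the result of `check_environment()` shows `True` or `False` per entry, which makes a missing Tesseract install easy to spot.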

📊 What Can Be Extracted?

📸 Camera & Device Information

Make / Model - Device brand & model (e.g., Canon EOS 80D, iPhone 14)
Lens Info - Lens model, focal length, zoom capabilities
Serial Numbers - Unique identifiers for camera/lens (when available)

Use: Identify capture device and verify hardware specifications

🕒 Date & Time Information

DateTimeOriginal - Exact moment photo was captured
CreateDate / ModifyDate - When the image was digitized and when it was last modified (EXIF DateTimeDigitized and DateTime)
SubSecTimeOriginal - Fractional seconds for precision timing
Timezone Information - UTC offset of the capture time (OffsetTimeOriginal), when the device records it

Use: Establish capture timeline and detect time inconsistencies
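EXIF stores timestamps as `YYYY:MM:DD HH:MM:SS` strings, with sub-seconds and the UTC offset in separate tags. The sketch below shows one way to combine them; `parse_exif_datetime` is a hypothetical helper, and it assumes well-formed tag values.

```python
from datetime import datetime, timedelta, timezone


def parse_exif_datetime(value, subsec=None, offset=None):
    """Combine DateTimeOriginal, SubSecTimeOriginal, and OffsetTimeOriginal."""
    moment = datetime.strptime(value.strip(), "%Y:%m:%d %H:%M:%S")
    if subsec and subsec.strip().isdigit():
        # SubSec tags hold the digits after the decimal point: "25" means 0.25 s.
        moment += timedelta(seconds=float("0." + subsec.strip()))
    if offset:
        # Offset tags look like "+02:00"; without one the result stays naive local time.
        cleaned = offset.strip()
        sign = -1 if cleaned.startswith("-") else 1
        hours, minutes = cleaned[1:].split(":")
        delta = sign * timedelta(hours=int(hours), minutes=int(minutes))
        moment = moment.replace(tzinfo=timezone(delta))
    return moment
```

Comparing the timezone-aware result against GPSDateStamp / GPSTimeStamp (which are in UTC) is one way to spot time inconsistencies.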

🌍 Location Data (GPS)

Latitude / Longitude - Precise geographic coordinates
Altitude - Elevation above sea level
GPSImgDirection - Compass bearing the camera was facing
GPSDateStamp / GPSTimeStamp - GPS fix timestamp

Use: Geolocate images and map capture locations

⚠️ Privacy Note: GPS data can reveal sensitive location information
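GPS tags store each coordinate as degrees, minutes, and seconds plus a hemisphere reference (N/S/E/W). Converting to decimal degrees can be sketched as follows; the helper names are illustrative, and the input values are assumed to be convertible with `float()` (as rationals from EXIF libraries usually are).

```python
def dms_to_decimal(dms, ref):
    """Convert (degrees, minutes, seconds) and an N/S/E/W reference to decimal degrees."""
    degrees, minutes, seconds = (float(part) for part in dms)
    decimal = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative by convention.
    return -decimal if ref.upper() in ("S", "W") else decimal


def maps_url(lat, lon):
    """Build a map link from decimal coordinates, rounded to four decimals."""
    return f"https://maps.google.com/?q={lat:.4f},{lon:.4f}"
```

Four decimal places correspond to roughly 11 m of latitude, which is usually enough for mapping and already precise enough to be sensitive.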

⚙️ Camera Settings

ExposureTime - Shutter speed (e.g., 1/200 sec)
FNumber - Aperture setting (e.g., f/2.8)
ISO - Sensor sensitivity (e.g., ISO 400)
FocalLength - Lens focal length in millimeters (e.g., 50mm)
Flash - Flash status (fired/not fired)
MeteringMode - Exposure metering method
WhiteBalance - White balance mode (auto or manual)
SceneCaptureType - Scene mode (standard, landscape, portrait, night)

Use: Understand capture conditions and camera configuration
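These values are stored as rationals, so they need formatting before display. A minimal sketch, assuming inputs that `Fraction` or `float` can accept; `format_exposure` and `format_settings` are hypothetical names.

```python
from fractions import Fraction


def format_exposure(seconds):
    """Render an exposure time as '1/500s' for sub-second values, else e.g. '2s'."""
    value = Fraction(seconds).limit_denominator(100000)
    if value <= 0:
        return "unknown"
    if value >= 1:
        return f"{float(value):g}s"
    return f"1/{round(1 / value)}s"


def format_settings(f_number, exposure, iso, focal_length):
    """Build a one-line summary such as 'f/4.0, 1/500s, ISO 200, 50mm'."""
    return (
        f"f/{float(f_number):.1f}, {format_exposure(exposure)}, "
        f"ISO {int(iso)}, {float(focal_length):g}mm"
    )
```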

🖼️ Image Characteristics

Orientation - Rotation or mirroring needed to display the image upright (values 1-8)
ImageWidth / ImageHeight - Resolution in pixels
ColorSpace - Color encoding (sRGB, AdobeRGB)
Compression - Encoding scheme of the image data (e.g., uncompressed, JPEG)
BitsPerSample - Color depth per channel

Use: Verify image properties and quality settings
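The Orientation tag is a number from 1 to 8 rather than a label. The lookup below is a sketch of how it could be decoded; the descriptions name the transform a viewer applies to show the image upright, and the helper names are hypothetical.

```python
# EXIF Orientation values and the transform needed to display the image upright.
ORIENTATIONS = {
    1: "Normal",
    2: "Mirror horizontally",
    3: "Rotate 180°",
    4: "Mirror vertically",
    5: "Mirror horizontally and rotate 270° clockwise",
    6: "Rotate 90° clockwise",
    7: "Mirror horizontally and rotate 90° clockwise",
    8: "Rotate 270° clockwise",
}


def describe_orientation(value):
    """Return a readable description, or flag values outside 1-8."""
    return ORIENTATIONS.get(value, f"Unknown ({value})")


def displayed_size(width, height, orientation):
    """Width and height as shown on screen; values 5-8 involve a quarter turn."""
    return (height, width) if orientation in (5, 6, 7, 8) else (width, height)
```

This matters when reporting resolution: a phone photo stored as 6000x4000 with Orientation 6 is displayed as 4000x6000.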

🧭 Software & Editing History

Software - Application that saved/edited file (Photoshop, WhatsApp, etc.)
CustomRendered - Post-processing applied
DigitalZoomRatio - Digital zoom factor
ModifyDate - A value later than DateTimeOriginal suggests the file was saved again after capture

Use: Detect modifications and trace editing workflow
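One way to turn these tags into editing hints is sketched below. It assumes a dict keyed by EXIF tag names (where ModifyDate appears as `DateTime`), and `editing_indicators` is a hypothetical helper. The hints are indicators only: the Software tag often holds camera firmware, and all of these fields can be altered.

```python
from datetime import datetime

EXIF_DATE_FORMAT = "%Y:%m:%d %H:%M:%S"


def editing_indicators(tags):
    """Return human-readable hints that a file may have been saved after capture."""
    hints = []
    software = tags.get("Software")
    if software:
        hints.append(f"Software tag present: {software}")
    original, modified = tags.get("DateTimeOriginal"), tags.get("DateTime")
    if original and modified:
        captured_at = datetime.strptime(original, EXIF_DATE_FORMAT)
        saved_at = datetime.strptime(modified, EXIF_DATE_FORMAT)
        if saved_at > captured_at:
            hints.append("ModifyDate is later than DateTimeOriginal")
    # CustomRendered: 0 means normal processing; other values mean special processing.
    if tags.get("CustomRendered") not in (None, 0):
        hints.append("CustomRendered is non-zero")
    return hints
```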

🧾 IPTC & XMP Metadata

Title / Caption - Image descriptions
Keywords / Tags - Categorization labels
Copyright / Author - Ownership information
Contact Info - Photographer details
Usage Rights - Licensing restrictions

Use: Content management and rights tracking

🔬 Example Output

Console Display

=== Processing: DSC_0001.JPG ===

📷 Camera: Canon EOS 5D Mark IV
🔍 Lens: EF24-105mm f/4L IS USM
📅 Captured: 2024-03-15 14:32:18
🌍 Location: 37.7749° N, 122.4194° W (San Francisco, CA)
⚙️ Settings: f/4.0, 1/500s, ISO 200, 50mm

📝 OCR Text (English):
"Welcome to the Golden Gate Bridge. Built in 1937..."

🗺️ GPS: https://maps.google.com/?q=37.7749,-122.4194

Summary Table

| File | Camera | Date | GPS | OCR Language | Text Length |
|---|---|---|---|---|---|
| DSC_0001.JPG | Canon EOS 5D IV | 2024-03-15 | 37.77,-122.42 | English | 245 chars |
| IMG_5432.JPG | iPhone 14 Pro | 2024-03-16 | None | None | 0 chars |

🧰 Dependencies

All dependencies are automatically installed in Colab. For local use:

Pillow - Image processing
ExifRead - EXIF metadata parsing
OpenCV - Image handling
pytesseract - OCR engine wrapper
langdetect - Language detection
matplotlib - Visualization
pandas - Data tables

⚠️ Important Considerations

Privacy & Security

GPS Data - Can reveal home/work locations
Timestamps - May expose daily routines
Device IDs - Serial numbers can be linked to individuals
Recommendation: Strip metadata before sharing sensitive images
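Stripping can be done by copying only the pixels into a fresh image. The following is a minimal sketch with Pillow, not necessarily how the related Privacy Anonymizer module works; `strip_metadata` is a hypothetical name, and re-saving a JPEG this way re-encodes it.

```python
def strip_metadata(src_path, dst_path):
    """Save a copy of an image that carries pixel data only."""
    # Lazy import: Pillow is needed only when the function runs.
    from PIL import Image, ImageOps

    with Image.open(src_path) as img:
        # Apply the EXIF orientation first so the copy still displays upright.
        upright = ImageOps.exif_transpose(img)
        # A fresh image starts with an empty info dict, so no EXIF, GPS,
        # or ICC profile data carries over.
        clean = Image.new(upright.mode, upright.size)
        if upright.mode == "P":
            clean.putpalette(upright.getpalette())
        clean.paste(upright)
        clean.save(dst_path)
```

Afterwards, `Image.open(dst_path).getexif()` should be empty; it is worth verifying this on a sample file before sharing.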

Metadata Reliability

Not Always Present - Screenshots and social media exports often lack metadata
Can Be Altered - Metadata is not cryptographically secure
Stripped by Platforms - Many websites remove metadata automatically

OCR Limitations

Accuracy Varies - Depends on image quality, font, lighting
Language Support - Some languages require additional Tesseract data
Performance - Large images or many images may be slow

💡 Use Cases

Digital Forensics

Verify image authenticity by checking timestamps and device info
Detect manipulated images through metadata inconsistencies
Geolocate events using GPS coordinates

Photo Management

Auto-organize photos by camera, date, or location
Generate searchable tags from metadata
Create timeline visualizations
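As an example of auto-organizing, the sketch below derives a `year/month/camera/filename` destination from extracted tags. The folder layout and the name `destination_for` are assumptions made for illustration.

```python
from pathlib import PurePosixPath


def destination_for(filename, tags):
    """Build a 'YYYY/MM/Camera/filename' path from EXIF tags."""
    date = str(tags.get("DateTimeOriginal", ""))
    # EXIF dates look like "2024:03:15 14:32:18"; fall back when absent or malformed.
    if len(date) >= 7 and date[0:4].isdigit() and date[5:7].isdigit():
        date_parts = [date[0:4], date[5:7]]
    else:
        date_parts = ["unknown-date"]
    camera = str(tags.get("Model") or "unknown-camera").strip().replace("/", "-")
    return PurePosixPath(*date_parts, camera, filename)
```

Files without metadata (screenshots, social media exports) end up under `unknown-date/unknown-camera`, which keeps them easy to review separately.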

Content Verification

Confirm original source of viral images
Check whether an image may have been edited (ModifyDate later than DateTimeOriginal, Software tag)
Extract copyright and author information

Research & Analysis

Study camera settings across professional photographers
Analyze GPS patterns in wildlife photography
Extract text from scanned documents and signs

📜 License

This project is open-sourced under the MIT License.

🔗 Related Modules

Scene Classifier - Classify image scenes
Image Search - Find similar images
Privacy Anonymizer - Remove identifying information
