A high-performance Python tool for analyzing large image datasets to identify images whose width or height matches a target dimension.
This tool provides efficient parallel processing and comprehensive reporting capabilities for dimension-based image filtering across large datasets (tested with 30+ GB of image data).
- Parallel Processing: Multi-threaded analysis using configurable worker pools
- Dual Output Mode: Generate both complete dataset analysis and filtered results
- Real-time Progress: Visual progress tracking with tqdm
- Flexible Matching: Support for exact or less-than-or-equal dimension matching
- Comprehensive Logging: Both console and file logging for debugging
- Multi-format Support: Handles JPG, JPEG, PNG, TIFF, TIF, BMP, GIF, WEBP
- Memory Efficient: Processes images individually to handle large datasets
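The multi-format support listed above is typically a matter of filtering files by extension. The following is a minimal sketch, not the tool's actual code: `find_image_files` is a hypothetical helper, and the recursive scan and case-insensitive comparison are assumptions. Only the `SUPPORTED_FORMATS` set itself comes from this README.

```python
from pathlib import Path

# Extension set as documented in this README.
SUPPORTED_FORMATS = {'.jpg', '.jpeg', '.png', '.tiff', '.tif', '.bmp', '.gif', '.webp'}


def find_image_files(root):
    """Hypothetical helper: recursively collect files with a supported extension."""
    root = Path(root)
    return sorted(
        path for path in root.rglob('*')
        # Lower-casing the suffix is an assumption so that 'photo.JPG' is included.
        if path.is_file() and path.suffix.lower() in SUPPORTED_FORMATS
    )
```

Adding or removing an entry in the set changes which files such a scan would pick up.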
# Clone the repository
git clone https://github.com/ecpantalone/image-dimension-analyzer.git
cd image-dimension-analyzer
# Install dependencies
pip install -r requirements.txt

- Pillow: Image processing
- tqdm: Progress bar visualization
- Flask: Web UI framework (optional, for web interface)
You can use this tool either via command line or through a web interface.
# Analyze a directory of images (default: 330px)
python analyze_images.py /path/to/images
# Search for images with a dimension of 500px or less (default lte mode)
python analyze_images.py /path/to/images --dimension 500
# Search for images with exactly 200px dimension
python analyze_images.py /path/to/images --dimension 200 --mode exact
# Specify custom dimension and output directory
python analyze_images.py /path/to/images --dimension 800 --output-dir ./results
# Use exact dimension matching (e.g., exactly 1024px)
python analyze_images.py /path/to/images --dimension 1024 --mode exact
# Adjust number of parallel workers
python analyze_images.py /path/to/images --workers 8
# Disable filtered output (only generate complete analysis)
python analyze_images.py /path/to/images --no-filtered
# Combine multiple options
python analyze_images.py /path/to/images --dimension 150 --mode lte --workers 12

| Argument | Description | Default |
|---|---|---|
| directory | Path to directory containing images | Required |
| --dimension, -d | Target dimension to search for (in pixels) | 330 |
| --output-dir | Directory for output CSV files | Current directory |
| --workers, -w | Number of parallel workers | 4 |
| --mode | Matching mode: 'lte' (less than or equal) or 'exact' (exact match) | 'lte' |
| --no-filtered | Skip creating filtered CSV file | False |
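To make the two `--mode` values concrete, here is a small sketch of how the comparison might work. The function name and the exact semantics are assumptions: the sketch treats an image as a match when either its width or its height satisfies the condition, which follows the "either width or height" wording in the introduction.

```python
def has_target_dimension(width, height, target, mode='lte'):
    """Return True if either side of the image matches the target.

    Assumed semantics: 'exact' means a side equals the target;
    'lte' means a side is less than or equal to the target.
    """
    if mode == 'exact':
        return width == target or height == target
    if mode == 'lte':
        return width <= target or height <= target
    raise ValueError(f"unknown mode: {mode!r}")
```

Under these assumptions, a 329x1000 image matches a 330px target in `lte` mode but not in `exact` mode.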
The tool generates two CSV files with timestamps:
- image_analysis_all_[YYYYMMDD_HHMMSS].csv: Complete dataset analysis
- image_analysis_[dimension]px_[YYYYMMDD_HHMMSS].csv: Only images matching dimension criteria
For example, when searching for 500px images, the filtered file would be named image_analysis_500px_20231115_143022.csv.
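The naming scheme above can be reproduced with `datetime.strftime`. This is an illustrative sketch only; `output_filenames` is a hypothetical helper and not necessarily how the tool builds its file names.

```python
from datetime import datetime


def output_filenames(dimension, now=None):
    """Hypothetical helper: build the two CSV names for one run."""
    # Both files share one timestamp in YYYYMMDD_HHMMSS form.
    stamp = (now or datetime.now()).strftime('%Y%m%d_%H%M%S')
    return (
        f'image_analysis_all_{stamp}.csv',
        f'image_analysis_{dimension}px_{stamp}.csv',
    )
```

Passing a fixed `now` value makes the names predictable, which is convenient when a script needs to locate the files afterwards.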
- file_path: Full path to the image file
- filename: Name of the file
- width: Image width in pixels
- height: Image height in pixels
- has_target_dimension: Boolean flag for target dimension match
- file_size_mb: File size in megabytes
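For downstream scripts, the columns above can be read back with the standard `csv` module. The sketch below is one possible approach; `load_matches` is a hypothetical helper, and it assumes the boolean column is written as the text `True` or `False`, which this README does not specify.

```python
import csv


def load_matches(csv_file):
    """Return rows flagged as matching, with numeric fields converted.

    Expects the columns file_path, filename, width, height,
    has_target_dimension and file_size_mb.
    """
    matches = []
    for row in csv.DictReader(csv_file):
        # Assumption: the flag is serialized as 'True' / 'False'.
        if row['has_target_dimension'].strip().lower() == 'true':
            row['width'] = int(row['width'])
            row['height'] = int(row['height'])
            row['file_size_mb'] = float(row['file_size_mb'])
            matches.append(row)
    return matches
```

A typical call would open the complete analysis file with `open(path, newline='')` and pass the file object to `load_matches`.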
- Default configuration uses 4 parallel workers
- Optimized for I/O-bound operations using ThreadPoolExecutor
- Memory-efficient single-image processing
- Handles corrupted/unreadable images gracefully
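The points above (a thread pool, one image at a time, tolerant of unreadable files) can be sketched roughly as follows. This is not the tool's implementation: `read_dimensions` and `analyze_paths` are hypothetical names, and the `reader` parameter is an assumption added so the sketch can be exercised without real image files. `Image.open` from Pillow identifies the file lazily, so reading `size` does not require decoding the pixel data.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def read_dimensions(path):
    """Read (width, height) with Pillow without decoding the full image."""
    from PIL import Image  # imported lazily so the sketch loads without Pillow
    with Image.open(path) as img:
        return img.size


def analyze_paths(paths, workers=4, reader=read_dimensions):
    """Return (results, errors): sizes for readable files, messages for the rest."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(reader, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # corrupted or unreadable image
                errors[path] = str(exc)
    return results, errors
```

Threads suit this workload because most of the time is spent waiting on disk reads rather than on computation, and a failure in one file is recorded without stopping the rest of the batch.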
The target dimension is configurable via command-line argument. The default is 330px:
# Use default 330px
python analyze_images.py /path/to/images
# Use custom dimension
python analyze_images.py /path/to/images --dimension 768

Modify SUPPORTED_FORMATS to add or remove image formats:
SUPPORTED_FORMATS = {'.jpg', '.jpeg', '.png', '.tiff', '.tif', '.bmp', '.gif', '.webp'}

Logs are written to multiple destinations:
- Console output (INFO level and above)
- image_analysis.log file (all logs: INFO, WARNING, ERROR)
- image_analysis_errors.log file (ERROR level only, for quick error review)
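A three-destination setup like the one described can be built with the standard `logging` module by giving each handler its own level. The sketch below is an assumption about how this might be wired up; `configure_logging`, the logger name, and the message format are all hypothetical.

```python
import logging
from pathlib import Path


def configure_logging(log_dir='.'):
    """Hypothetical setup: console at INFO, one file for all records, one for errors."""
    log_dir = Path(log_dir)
    logger = logging.getLogger('image_analysis')
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    logger.propagate = False

    fmt = logging.Formatter('%(asctime)s %(levelname)s %(message)s')

    console = logging.StreamHandler()
    console.setLevel(logging.INFO)

    all_file = logging.FileHandler(log_dir / 'image_analysis.log')
    all_file.setLevel(logging.INFO)

    # Only ERROR and above reach this file.
    error_file = logging.FileHandler(log_dir / 'image_analysis_errors.log')
    error_file.setLevel(logging.ERROR)

    for handler in (console, all_file, error_file):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger
```

With this arrangement an error message appears in all three destinations, while informational messages stay out of the errors-only file.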
Run the test suite:
python test_analyze_images.py

The test suite includes tests for:
- Different target dimensions (200px, 330px, 500px, etc.)
- Both matching modes (lte and exact)
- Edge cases and error handling
image-dimension-analyzer/
├── analyze_images.py # Main analysis script
├── test_analyze_images.py # Test suite
├── requirements.txt # Python dependencies
├── README.md # This file
├── PROJECT_STATUS.md # Development status tracking
├── .gitignore # Git ignore file
└── image_analysis.log # Generated log file (after first run)
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
python analyze_images.py ./media --dimension 150 --mode lte
python analyze_images.py ./photos --dimension 1920 --mode exact
python analyze_images.py ./website/images --dimension 768 --workers 8

The tool includes a web-based user interface for easier interaction.
# Start the Flask web server
python app.py
# The web interface will be available at http://localhost:5001
# Note: Port 5001 is used to avoid conflicts with AirPlay Receiver on macOS

- Visual Interface: User-friendly form for configuring analysis parameters
- Directory Browser: Browse and select directories directly from the UI
- Real-time Progress: Live progress updates during analysis
- Results Dashboard: View analysis statistics and download CSV reports
- Recent Analyses: Track history of recent analysis jobs
- Background Processing: Run multiple analyses without blocking the interface
The web interface provides:
- A clean form to input analysis parameters
- Real-time progress tracking with percentage and statistics
- Results summary with download options for CSV files
- History of recent analyses with their status
If you want to integrate with the web service programmatically:
- POST /analyze: Start a new analysis job
- GET /status/<job_id>: Get status of an analysis job
- GET /download/<job_id>/<type>: Download results (type: 'all' or 'matching')
- GET /recent: Get list of recent analysis jobs
- GET /browse?path=<path>: Browse directories on the server
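As a starting point for a client, the GET endpoints can be addressed with the standard library alone. The sketch below only builds URLs and fetches them; every helper name is hypothetical. It assumes the server runs at the default `http://localhost:5001` and that responses are JSON, which this README does not state. The request body for `POST /analyze` is not documented here, so it is left out.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = 'http://localhost:5001'  # default address of the web interface


def status_url(job_id):
    job = quote(str(job_id), safe='')
    return f'{BASE_URL}/status/{job}'


def download_url(job_id, kind):
    if kind not in ('all', 'matching'):
        raise ValueError("kind must be 'all' or 'matching'")
    job = quote(str(job_id), safe='')
    return f'{BASE_URL}/download/{job}/{kind}'


def browse_url(path):
    query = urlencode({'path': path})
    return f'{BASE_URL}/browse?{query}'


def get_json(url, timeout=10):
    """Assumption: the endpoint answers with a JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, `get_json(status_url(job_id))` could be polled until the job reports completion, after which `download_url(job_id, 'matching')` points at the filtered CSV.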
For issues or questions, please open an issue in the repository.