A powerful, standalone JavaScript library for detecting similar and duplicate images in web browsers. Uses multiple advanced algorithms including perceptual hashing, color analysis, and structural comparison.
-
Multiple Detection Algorithms: Combines 5+ algorithms for robust similarity detection
- Average Hash (aHash) - Fast exact duplicate detection
- Difference Hash (dHash) - Detects crops and transformations
- Perceptual Hash (pHash) - Advanced similarity with DCT
- Color Histogram Comparison - Color distribution analysis
- Edge Detection - Structural similarity
- Dominant Color Extraction - K-means clustering
-
Browser Compatible: Works in all modern browsers, no server required
-
Chrome Extension Ready: No eval(), fully CSP compliant
-
Progressive Processing: Real-time progress updates
-
Customizable Thresholds: Adjustable similarity percentage (10%-100%)
-
Performance Optimized: Handles hundreds of images efficiently
-
Responsive UI: Mobile-friendly interface
# Build the test page
node build.js
# Or build and serve locally
node build.js --serve
# Custom port
node build.js --serve --port 3000Open test.html in your browser or visit http://localhost:8080 if using the server.
# Install dependencies
npm install
# Run automated tests
npm test-
Perceptual Hash (pHash)
- Converts images to 32x32 grayscale
- Applies 2D Discrete Cosine Transform (DCT)
- Extracts low-frequency components
- Generates 64-bit hash for comparison
-
Average Hash (aHash)
- Resizes to 8x8 grayscale
- Compares pixels to average brightness
- Fast duplicate detection
-
Difference Hash (dHash)
- Uses 9x8 grid for gradient comparison
- Detects cropped/transformed images
- Good for minor modifications
-
Color Analysis
- RGB histogram comparison using correlation
- K-means clustering for dominant colors
- Color distribution similarity
-
Edge Detection
- Gradient-based edge detection
- Structural similarity comparison
- Robust to lighting changes
The library combines all algorithms using weighted scoring:
- pHash: 30% (best for similar images)
- aHash: 20% (exact duplicates)
- dHash: 20% (transformations)
- Color Histogram: 15% (color similarity)
- Edge Hash: 10% (structure)
- Aspect Ratio: 5% (basic metadata)
const matcher = new ImageMatcher();
// Process single image
const fingerprint = await matcher.processImage(imageElement, 'image-id');
// Find similar images
const groups = await matcher.findSimilarImages(
images, // Array of {id, src} objects
0.8, // Similarity threshold (0.1-1.0)
progressCallback // Optional progress function
);
// Compare two images directly
const similarity = matcher.compareImages(fingerprint1, fingerprint2);
// Clear cache
matcher.clearCache();
// Get stats
const stats = matcher.getStats();{
id: 'image-id',
width: 1920,
height: 1080,
aspectRatio: 1.777,
aHash: '1010110100110011...',
dHash: '0110100110011010...',
pHash: '1100101110011010...',
colorHistogram: { r: [...], g: [...], b: [...] },
dominantColors: [{ r: 255, g: 0, b: 0 }, ...],
edgeHash: '1001101001101001...',
processedAt: 1640995200000
}{
overall: 0.85, // Combined similarity score
details: {
aHash: 0.90,
dHash: 0.82,
pHash: 0.88,
edgeHash: 0.75,
histogram: 0.93,
aspectRatio: 0.95
}
}<!DOCTYPE html>
<html>
<head>
<script src="image-matcher.js"></script>
</head>
<body>
<script>
const matcher = new ImageMatcher();
async function findDuplicates() {
const images = [
{ id: 'img1', src: 'image1.jpg' },
{ id: 'img2', src: 'image2.jpg' }
];
const groups = await matcher.findSimilarImages(images, 0.8);
console.log('Similar groups:', groups);
}
</script>
</body>
</html>// content.js
class ImageDuplicateDetector {
constructor() {
this.matcher = new ImageMatcher();
}
async scanPageImages() {
const images = Array.from(document.images)
.map((img, i) => ({ id: `img-${i}`, src: img.src }));
const duplicates = await this.matcher.findSimilarImages(images, 0.9);
return duplicates;
}
}import React, { useState } from 'react';
function ImageDeduplicator() {
const [matcher] = useState(() => new ImageMatcher());
const [results, setResults] = useState([]);
const handleFileUpload = async (files) => {
const images = Array.from(files).map((file, i) => ({
id: `file-${i}`,
src: URL.createObjectURL(file)
}));
const groups = await matcher.findSimilarImages(images, 0.8);
setResults(groups);
};
return (
<div>
<input type="file" multiple onChange={e => handleFileUpload(e.target.files)} />
{/* Render results */}
</div>
);
}- Processing: ~2-3 images/second
- Memory Usage: ~50KB per image fingerprint
- Browser Support: Chrome 60+, Firefox 55+, Safari 12+
- Mobile Performance: Optimized for mobile browsers
- Batch Processing: Process images in chunks
- Caching: Fingerprints are automatically cached
- Progressive Loading: Use lazy loading for large sets
- Threshold Tuning: Higher thresholds = faster comparison
- Worker Threads: Consider Web Workers for large datasets
βββ image-matcher.js # Core library
βββ index.html # UI template
βββ build.js # Build script
βββ test-puppeteer.js # Automated tests
βββ package.json # Dependencies
βββ images/ # Test images
npm run build # Generate test.html
npm run serve # Build and serve on :8080
npm run dev # Serve on :3000
npm test # Run Puppeteer testsThe library is tested for:
- β Chrome Extension compatibility (CSP compliant)
- β No eval() usage
- β Standalone operation (no external dependencies)
- β Mobile browser support
- β Large image set handling
The test suite includes 419 diverse images and validates:
- Algorithm Accuracy: Multiple similarity thresholds
- Performance: Processing speed benchmarks
- Memory Usage: Efficient fingerprint storage
- Browser Compatibility: Cross-browser testing
- Extension Readiness: CSP and security validation
| Threshold | Precision | Recall | F1-Score |
|---|---|---|---|
| 90% | 0.95 | 0.72 | 0.82 |
| 80% | 0.89 | 0.85 | 0.87 |
| 70% | 0.82 | 0.91 | 0.86 |
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
MIT License - see LICENSE file for details
- DCT implementation inspired by JPEG compression
- Color quantization based on k-means clustering
- Perceptual hashing algorithms from academic research
- Browser Canvas API for efficient image processing