Extend Duplicate Image Detection to Include Near-Duplicates #1

@CodingWithCard

Description

The current duplicate image detection function identifies only exact duplicates. The goal is to extend it to also find near-duplicates: images that are visually similar but have been slightly modified (e.g., cropped, resized, or color-altered).

Task

Update the duplicate detection function to use perceptual hashing (pHash). A perceptual hash is an image 'fingerprint' that changes very little under minor image modifications. Near-duplicates can then be found by comparing fingerprints, typically by counting the bits in which they differ (Hamming distance) and checking that count against a threshold, rather than by testing for exact equality.
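To make the comparison step concrete, here is a minimal sketch of how two fingerprints might be compared. It assumes each fingerprint is stored as a 64-bit integer; the names `hamming_distance` and `is_near_duplicate` and the default threshold of 5 bits are illustrative assumptions, not part of the existing code.

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(hash_a ^ hash_b).count("1")


def is_near_duplicate(hash_a: int, hash_b: int, max_distance: int = 5) -> bool:
    """Treat two images as near-duplicates if their fingerprints differ
    in at most max_distance bits (assumed threshold; tune on real data)."""
    return hamming_distance(hash_a, hash_b) <= max_distance


original = 0xF0E1D2C3B4A59687
slightly_changed = original ^ 0b101  # same fingerprint with two bits flipped

print(hamming_distance(original, slightly_changed))   # 2
print(is_near_duplicate(original, slightly_changed))  # True
```

A distance of 0 means the fingerprints are identical, so exact duplicates are still caught by the same check.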

Steps

  1. Research Perceptual Hashing (pHash) to understand its implementation.
  2. Refactor the get_hash function to calculate the pHash of an image instead of its SHA-256 hash.
  3. Validate the updated function to confirm its ability to detect near-duplicate images.
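As a starting point for steps 1 and 2, the following is a rough sketch of one common pHash formulation: take a 2D DCT of a small grayscale image, keep the low-frequency block, and set each bit by comparing a coefficient to the block's median. It is not the project's get_hash. It assumes the image has already been decoded, converted to grayscale, and resized to 32×32 by an imaging library such as Pillow, and `phash_from_gray`, `SIZE`, and `LOW_FREQ` are hypothetical names. In practice, a maintained library (for example ImageHash, which offers a `phash` function) may be the simpler route.

```python
import math
from statistics import median

SIZE = 32      # assumed side length of the grayscale input grid
LOW_FREQ = 8   # side length of the low-frequency DCT block that is kept


def phash_from_gray(pixels):
    """Return a 64-bit perceptual hash for a SIZE x SIZE grid of gray values.

    pixels is a list of SIZE rows, each a list of SIZE numbers.
    """
    if len(pixels) != SIZE or any(len(row) != SIZE for row in pixels):
        raise ValueError(f"expected a {SIZE}x{SIZE} grid")

    # Cosine basis for the first LOW_FREQ frequencies of an unscaled DCT-II.
    basis = [
        [math.cos(math.pi * (2 * i + 1) * u / (2 * SIZE)) for i in range(SIZE)]
        for u in range(LOW_FREQ)
    ]

    # 1-D DCT along every row, keeping only the low horizontal frequencies.
    row_dct = [
        [sum(p * b for p, b in zip(row, basis[u])) for u in range(LOW_FREQ)]
        for row in pixels
    ]

    # 1-D DCT along the columns of that result gives the LOW_FREQ x LOW_FREQ block.
    coeffs = [
        sum(row_dct[y][u] * basis[v][y] for y in range(SIZE))
        for v in range(LOW_FREQ)
        for u in range(LOW_FREQ)
    ]

    # One bit per coefficient: 1 if it lies above the median, else 0.
    med = median(coeffs)
    fingerprint = 0
    for c in coeffs:
        fingerprint = (fingerprint << 1) | int(c > med)
    return fingerprint


# Usage with a synthetic grid standing in for a decoded image.
demo = [[(x * 37 + y * 91) % 256 for x in range(SIZE)] for y in range(SIZE)]
print(f"{phash_from_gray(demo):016x}")
```

Because a uniform brightness change mostly affects the lowest-frequency (DC) coefficient, the resulting bits tend to stay stable under that kind of edit, which is the property step 3 should validate against real images.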

Acceptance Criteria

  • The refactored function must be capable of detecting and moving near-duplicate images to the "duplicates" folder.
  • The function should retain its ability to detect and move exact duplicate images to the "duplicates" folder.
  • The function should not move images that are neither near-duplicates nor exact duplicates.
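The three criteria above can be expressed as a single selection pass over the computed fingerprints. The sketch below is only one possible approach: `select_duplicates` is a hypothetical helper, fingerprints are assumed to be 64-bit integers keyed by file name, and the 5-bit threshold is an assumed value that would need tuning. Actually moving the files to the "duplicates" folder is left to the existing code.

```python
def select_duplicates(hashes, max_distance=5):
    """Return the file names that should be moved to the duplicates folder.

    hashes maps file name -> integer fingerprint. The first image seen in
    each group of similar images is kept; later ones are reported.
    """
    kept = []      # fingerprints of images that stay in place
    to_move = []   # names of exact or near-duplicates

    for name, fingerprint in hashes.items():
        # Distance 0 covers exact duplicates; small distances cover near-duplicates.
        if any(bin(fingerprint ^ k).count("1") <= max_distance for k in kept):
            to_move.append(name)
        else:
            kept.append(fingerprint)
    return to_move


example = {
    "a.jpg": 0b11110000,
    "a_copy.jpg": 0b11110000,         # exact duplicate of a.jpg
    "a_resized.jpg": 0b11110001,      # differs from a.jpg in one bit
    "b.jpg": 0xFFFFFFFF00000000,      # unrelated image
}
print(select_duplicates(example))  # ['a_copy.jpg', 'a_resized.jpg']
```

This pairwise scan is quadratic in the number of kept images, which is likely fine for a single folder; larger collections might need an index structure instead.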

Metadata

Labels

enhancement (New feature or request)
