
An OpenCV project that automatically detects a document in an image and finds its corners for perspective correction. This notebook demonstrates a practical computer vision pipeline using morphological operations, Canny edge detection, and contour analysis to prepare for a "bird's-eye view" transformation.


imehranasgari/Auto-Document-Scanner-OpenCV


📄 Automatic Perspective Correction for Document Scanning

🎯 Problem Statement and Goal of Project

Photos of documents taken with a phone or camera are often skewed by the camera angle. This project implements a computer vision pipeline that automatically finds the corners of a document in an image. The primary goal is to identify the four-point contour of the paper, which is the necessary first step before applying a perspective transform (a "bird's-eye view") to create a flat, top-down scanned image.

💡 Solution Approach

This project demonstrates a classic and effective pipeline for quadrilateral detection using OpenCV.

  1. Image Preprocessing:

    • The image is first loaded and resized to a maximum dimension of 1080px to ensure consistent processing speed while maintaining its aspect ratio.
    • A Morphological Closing operation is applied. This step is key to removing small details like text and noise, effectively creating a solid white blob of the paper against its background.
  2. Edge Detection:

    • The preprocessed image is converted to grayscale and a Gaussian blur is applied to further reduce noise.
    • Canny Edge Detection is used to identify the sharp outlines of the document blob.
    • A Dilation is performed to connect any small breaks in the detected edge lines, ensuring a single, solid contour.
  3. Contour & Corner Detection:

    • cv2.findContours is used to find all closed shapes in the Canny edge image.
    • The contours are sorted by area, and only the top 5 largest are considered (as the document is assumed to be the main object).
    • The script iterates through these top contours and uses cv2.approxPolyDP to find the simplest approximation of the shape.
    • The loop breaks at the first contour whose approximation is a quadrilateral (4-sided polygon); that contour is taken to be the document.
  4. Corner Sorting:

    • A helper function, order_points, is used to sort the four detected corners into a consistent (top-left, top-right, bottom-right, bottom-left) order. This prepares the coordinates for the final perspective transformation step (which would be cv2.warpPerspective).

🛠️ Technologies & Libraries

  • OpenCV (cv2): Used for all core computer vision tasks (image loading, resizing, morphological operations, Canny edge detection, and contour finding).
  • NumPy: For numerical operations and array manipulation.
  • Matplotlib: Used within the Jupyter Notebook to visualize the output of each processing step.

💾 Description about Dataset

The project uses a single sample image, images/scan.jpg, which is a clear photo of a text document taken at an angle against a contrasting background.

⚙️ Installation & Execution Guide

  1. Clone the repository:

    git clone https://github.com/imehranasgari/Auto-Document-Scanner-OpenCV.git
    cd Auto-Document-Scanner-OpenCV
  2. Install the required libraries:

    pip install opencv-python numpy matplotlib
  3. Run the Jupyter Notebook mini_scan_project.ipynb cell by cell to see the step-by-step image transformation.

🖼️ Sample Output

The notebook visualizes each key step of the pipeline:

Step                        Description
1. Original Image           The input image of the document taken at an angle.
2. Morphological Closing    Text is removed, leaving a solid blob of the paper.
3. Canny Edge Detection     The clear outline of the document is detected.
4. Final Corners            The 4-sided contour is found and its corners are identified.

🎓 Additional Learnings / Reflections

This project also includes an initial, experimental attempt at segmentation using cv2.grabCut (cells 8-9). While GrabCut is a powerful segmentation tool, it proved less effective for this specific task than the Canny edge detection pipeline. The edge-based approach was more robust for isolating a simple, high-contrast quadrilateral shape.

The notebook successfully completes the most critical part of a document scanner: finding and ordering the corners. The next step would be to pass these corners and the orig_img to cv2.warpPerspective to generate the final top-down image.


🙏 Acknowledgments

This project represents my initial steps into the practical application of computer vision. The foundational knowledge and guidance for this work were derived from the outstanding OpenCV course taught by Alireza Akhavanpour on the Maktabkhooneh platform. His ability to deconstruct complex topics into clear, actionable steps was instrumental in the successful implementation of this project.

👤 Author

Mehran Asgari

 

 

📄 License

  This project is licensed under the Apache 2.0 License – see the LICENSE file for details.


💡 Some interactive outputs (e.g., plots, widgets) may not display correctly on GitHub. If so, please view this notebook via nbviewer.org for full rendering.

