📄 Automatic Perspective Correction for Document Scanning

🎯 Problem Statement and Goal of Project

Photos of documents taken with a phone or camera are often skewed because of the camera angle. This project implements a computer vision pipeline that automatically finds the corners of a document in an image. The primary goal is to identify the precise four-point contour of the paper, which is the necessary first step before applying a perspective transform (a "bird's-eye view") to create a flat, top-down scanned image.

💡 Solution Approach

This project demonstrates a classic and effective pipeline for quadrilateral detection using OpenCV.

Image Preprocessing:
- The image is first loaded and resized, preserving its aspect ratio, so that its largest dimension is 1080px; this keeps processing speed consistent regardless of the input resolution.
- A Morphological Closing operation is applied. This step is key to removing small details like text and noise, effectively creating a solid white blob of the paper against its background.
Edge Detection:
- The preprocessed image is converted to grayscale and a Gaussian blur is applied to further reduce noise.
- Canny Edge Detection is used to identify the sharp outlines of the document blob.
- Dilation is then applied to bridge small breaks in the detected edges, so that the document outline forms a single, solid contour.
Contour & Corner Detection:
- cv2.findContours is used to extract all contours (shape outlines) from the edge image.
- The contours are sorted by area, and only the top 5 largest are considered (as the document is assumed to be the main object).
- The script iterates through these top contours and uses cv2.approxPolyDP to approximate each one with a simpler polygon that has fewer vertices.
- The loop breaks upon finding the first contour that is a quadrilateral (4-sided polygon), which is our document.
Corner Sorting:
- A helper function, order_points, is used to sort the four detected corners into a consistent (top-left, top-right, bottom-right, bottom-left) order. This prepares the coordinates for the final perspective transformation step (which would be cv2.warpPerspective).
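One common way to implement a helper like order_points is the sum/difference heuristic sketched below; the notebook's own implementation may differ. It relies on image coordinates (y grows downward): the top-left corner has the smallest x + y, the bottom-right the largest, the top-right the smallest y - x, and the bottom-left the largest.

```python
import numpy as np


def order_points(pts):
    """Sort 4 (x, y) corners into top-left, top-right, bottom-right, bottom-left order."""
    pts = np.asarray(pts, dtype=np.float32).reshape(4, 2)
    rect = np.zeros((4, 2), dtype=np.float32)
    s = pts.sum(axis=1)               # x + y for each corner
    d = np.diff(pts, axis=1).ravel()  # y - x for each corner
    rect[0] = pts[np.argmin(s)]       # top-left: smallest x + y
    rect[1] = pts[np.argmin(d)]       # top-right: smallest y - x
    rect[2] = pts[np.argmax(s)]       # bottom-right: largest x + y
    rect[3] = pts[np.argmax(d)]       # bottom-left: largest y - x
    return rect
```

For example, order_points([[300, 310], [10, 20], [290, 15], [5, 305]]) yields the rows (10, 20), (290, 15), (300, 310), (5, 305). The heuristic can produce ties when the page is rotated close to 45 degrees; sorting the corners by angle around their centroid is a more robust alternative in that case.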

🛠️ Technologies & Libraries

OpenCV (cv2): Used for all core computer vision tasks (image loading, resizing, morphological operations, Canny edge detection, and contour finding).
NumPy: For numerical operations and array manipulation.
Matplotlib: Used within the Jupyter Notebook to visualize the output of each processing step.

💾 Dataset Description

The project uses a single sample image, images/scan.jpg, which is a clear photo of a text document taken at an angle against a contrasting background.

⚙️ Installation & Execution Guide

Clone the repository:

git clone https://github.com/imehranasgari/Auto-Document-Scanner-OpenCV.git
cd Auto-Document-Scanner-OpenCV

Install the required libraries:

pip install opencv-python numpy matplotlib

Run the Jupyter Notebook mini_scan_project.ipynb cell by cell to see the step-by-step image transformation.

🖼️ Sample Output

The notebook visualizes each key step of the pipeline:

1. Original Image: the input image of the document, taken at an angle.
2. Morphological Closing: text is removed, leaving a solid blob of the paper.
3. Canny Edge Detection: the clear outline of the document is detected.
4. Final Corners: the 4-sided contour is found and its corners are identified.

🎓 Additional Learnings / Reflections

This project also includes an initial, experimental attempt at segmentation using cv2.grabCut (cells 8-9). While GrabCut is a powerful segmentation tool, it proved less effective for this specific task than the Canny edge detection pipeline. The edge-based approach was more robust for isolating a simple, high-contrast quadrilateral shape.

The notebook successfully completes the most critical part of a document scanner: finding and ordering the corners. The next step would be to compute a perspective matrix from these ordered corners with cv2.getPerspectiveTransform and pass it, together with the orig_img, to cv2.warpPerspective to generate the final top-down image.

🙏 Acknowledgments

This project represents my initial steps into the practical application of computer vision. The foundational knowledge and guidance for this work were derived from the outstanding OpenCV course taught by Alireza Akhavanpour on the Maktabkhooneh platform. His ability to deconstruct complex topics into clear, actionable steps was instrumental in the successful implementation of this project.

👤 Author

Mehran Asgari

Email: imehranasgari@gmail.com

GitHub: https://github.com/imehranasgari

📄 License

This project is licensed under the Apache 2.0 License – see the LICENSE file for details.

💡 Some interactive outputs (e.g., plots, widgets) may not display correctly on GitHub. If so, please view this notebook via nbviewer.org for full rendering.
