Add optional rotate_pages and deskew parameters to OCR endpoint #132

Merged
gabriel-piles merged 4 commits into huridocs:main from NataliaVillegasC:feature/ocr-rotate-deskew-options on Mar 2, 2026

NataliaVillegasC (Contributor) commented Feb 26, 2026

Problem

When scanned PDFs contain misrotated or skewed pages, the OCR output is garbled (gibberish). This is a known limitation when pages are not properly oriented before OCR is applied.

Solution

While reviewing the OCRmyPDF documentation (https://github.com/ocrmypdf/OCRmyPDF), I found two flags that address this:

  • --rotate-pages: automatically detects and fixes misrotated pages
  • --deskew: straightens crooked/skewed scanned pages

Both flags have been supported by OCRmyPDF for a long time and are compatible with the version used in this project.

These parameters are optional and default to false to avoid adding latency for users who don't need them.
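
As a rough illustration of how the two flags could be wired in, here is a minimal sketch that builds the OCRmyPDF command line only when the options are requested. The helper name `build_ocrmypdf_command` and the file names are hypothetical and are not taken from this PR; only the `--rotate-pages` and `--deskew` flags come from OCRmyPDF itself.

```python
from typing import List


def build_ocrmypdf_command(
    input_pdf: str,
    output_pdf: str,
    rotate_pages: bool = False,
    deskew: bool = False,
) -> List[str]:
    """Build the argument list for an ocrmypdf run.

    Hypothetical helper for illustration; the adapter in this PR may
    assemble its command differently.
    """
    command = ["ocrmypdf"]
    # Both flags are opt-in, so the default call is unchanged.
    if rotate_pages:
        command.append("--rotate-pages")
    if deskew:
        command.append("--deskew")
    command.extend([input_pdf, output_pdf])
    return command


if __name__ == "__main__":
    # Default: no extra flags, so no extra preprocessing is requested.
    print(build_ocrmypdf_command("scan.pdf", "scan_ocr.pdf"))
    # Opt in to both corrections for a badly scanned document.
    print(build_ocrmypdf_command("scan.pdf", "scan_ocr.pdf", rotate_pages=True, deskew=True))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where OCRmyPDF is installed.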

Changes

  • src/adapters/infrastructure/ocr_service_adapter.py: added rotate_pages and deskew optional parameters to process_pdf_ocr
  • src/ports/services/ocr_service.py: updated abstract method signature
  • src/use_cases/ocr/process_ocr_use_case.py: propagated parameters through the use case
  • src/drivers/rest/app.py: added rotate_pages and deskew form parameters to the /ocr endpoint
  • README.md: updated /ocr endpoint documentation with new parameters and usage examples
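
The layering in the list above (port, adapter, use case) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code from this PR: the class names, the `execute` method, the argument list, and the recording adapter are all hypothetical. Only the method name `process_pdf_ocr` and the `rotate_pages` and `deskew` parameters come from the description.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import List, Tuple


class OCRService(ABC):
    """Port: the abstract OCR interface (signature is an assumption)."""

    @abstractmethod
    def process_pdf_ocr(
        self, pdf_path: Path, rotate_pages: bool = False, deskew: bool = False
    ) -> Path:
        ...


class RecordingOCRAdapter(OCRService):
    """Stand-in adapter that records the flags instead of running OCR."""

    def __init__(self) -> None:
        self.calls: List[Tuple[Path, bool, bool]] = []

    def process_pdf_ocr(
        self, pdf_path: Path, rotate_pages: bool = False, deskew: bool = False
    ) -> Path:
        self.calls.append((pdf_path, rotate_pages, deskew))
        # A real adapter would invoke OCRmyPDF here.
        return pdf_path.with_name(pdf_path.stem + "_ocr.pdf")


class ProcessOCRUseCase:
    """Use case: forwards the optional flags to the service unchanged."""

    def __init__(self, ocr_service: OCRService) -> None:
        self.ocr_service = ocr_service

    def execute(
        self, pdf_path: Path, rotate_pages: bool = False, deskew: bool = False
    ) -> Path:
        return self.ocr_service.process_pdf_ocr(
            pdf_path, rotate_pages=rotate_pages, deskew=deskew
        )


if __name__ == "__main__":
    adapter = RecordingOCRAdapter()
    use_case = ProcessOCRUseCase(adapter)
    print(use_case.execute(Path("scan.pdf"), rotate_pages=True))
    print(adapter.calls)
```

Keeping both parameters keyword arguments with a `False` default means callers that do not pass them behave as before.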

Testing

Tested locally with a scanned PDF containing rotated pages: OCR output went from garbled text to correctly extracted content after enabling rotate_pages=true.

gabriel-piles (Member) commented

Thank you for your contribution.

We will review the code and will likely merge it next week.

Best

gabriel-piles (Member) commented

I'm going to push a few small tweaks to speed this up, hope you don't mind!

gabriel-piles (Member) commented

We are merging it

gabriel-piles merged commit c7b91e1 into huridocs:main on Mar 2, 2026
1 check passed

jsurrea commented Mar 2, 2026

Very cool @NataliaVillegasC !
