OCR for scanned PDFs #10

@NT1906

Add OCR-based text extraction for scanned PDFs using pdf2image and pytesseract: detect scanned pages, then extract their text with OCR.

  • Implement scanned page detection

  • Integrate OCR extraction
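
The two tasks above could be sketched roughly as follows. This is only a minimal illustration, not a proposed final design. It assumes the project already has some way to get the embedded text of each page (passed in here as `embedded_texts`), and it treats a page as scanned when that text is nearly empty. The names `is_scanned_page`, `ocr_page`, `extract_text`, and the `MIN_CHARS` threshold are hypothetical. The pdf2image and pytesseract calls additionally need Poppler and Tesseract installed on the system.

```python
from typing import Callable, List

# Hypothetical threshold: pages with fewer non-whitespace characters than this
# in their embedded text layer are treated as scanned. Tune for real documents.
MIN_CHARS = 20


def is_scanned_page(embedded_text: str, min_chars: int = MIN_CHARS) -> bool:
    """Heuristic: a page with (almost) no embedded text is probably a scan."""
    non_whitespace = "".join(embedded_text.split())
    return len(non_whitespace) < min_chars


def ocr_page(pdf_path: str, page_number: int, dpi: int = 300) -> str:
    """Render one page (1-based) to an image and run Tesseract on it."""
    # Imported lazily so the detection logic works without the OCR stack.
    from pdf2image import convert_from_path
    import pytesseract

    images = convert_from_path(
        pdf_path, dpi=dpi, first_page=page_number, last_page=page_number
    )
    return "\n".join(pytesseract.image_to_string(image) for image in images)


def extract_text(
    pdf_path: str,
    embedded_texts: List[str],
    ocr: Callable[[str, int], str] = ocr_page,
    min_chars: int = MIN_CHARS,
) -> List[str]:
    """Return one string per page, using OCR only for pages that look scanned."""
    result = []
    for page_number, text in enumerate(embedded_texts, start=1):
        if is_scanned_page(text, min_chars):
            text = ocr(pdf_path, page_number)
        result.append(text)
    return result
```

Passing the OCR step in as a callable keeps the detection logic testable without Tesseract, and running OCR only on pages that fail the text-layer check avoids rendering pages that already have usable text.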


Labels: enhancement (New feature or request), help wanted (Extra attention is needed)
