SPIKE: Perform OCR/Text Recognition on All Document Uploads #9744

@cholly75

Description

As a Court, so that I can quickly search and scan within a document, I need all uploaded PDF files to be OCR'd and/or scanned for text.

We can currently extract text from text-ready PDFs using PDF.js.

We cannot currently OCR PDF files that consist only of a flattened image layer.
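One way to frame the gap is to decide, per page, whether the existing text extraction returned anything usable. The sketch below is illustrative only: `decideOcr`, `OcrDecision`, and the `minCharsPerPage` threshold are hypothetical names and values, not part of the codebase. It assumes the per-page strings have already been extracted elsewhere (for example, by joining the `str` fields of the items returned by PDF.js's `getTextContent()`).

```typescript
// Hypothetical helper: decide whether a PDF needs OCR based on the text
// already extracted from each page (for example, via PDF.js).
interface OcrDecision {
  needsOcr: boolean;
  // Zero-based indices of pages with little or no extractable text.
  imageOnlyPages: number[];
}

// Assumption: a page with fewer than `minCharsPerPage` non-whitespace
// characters is treated as image-only. The threshold is illustrative.
function decideOcr(pageTexts: string[], minCharsPerPage = 10): OcrDecision {
  const imageOnlyPages: number[] = [];
  pageTexts.forEach((text, index) => {
    const visibleChars = text.replace(/\s+/g, '').length;
    if (visibleChars < minCharsPerPage) {
      imageOnlyPages.push(index);
    }
  });
  return { needsOcr: imageOnlyPages.length > 0, imageOnlyPages };
}
```

Reporting the affected page indices, rather than a single flag, would let a spike compare OCR of whole documents against OCR of only the image-only pages.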

Pre-Conditions

Acceptance Criteria

  • Non-order/opinion document text should not be incorporated into search indices
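The criterion above could be enforced at the point where a search record is built. This is a minimal sketch under stated assumptions: `UploadedDocument`, `SearchRecord`, `toSearchRecord`, and the `category` values are hypothetical and do not reflect the actual entity or index schema.

```typescript
// Hypothetical shape for an uploaded document; field names are illustrative.
interface UploadedDocument {
  docketEntryId: string;
  category: 'order' | 'opinion' | 'other';
  extractedText: string;
}

// Hypothetical search record; `text` is present only for orders and opinions.
interface SearchRecord {
  docketEntryId: string;
  category: UploadedDocument['category'];
  text?: string;
}

// Builds a search record, omitting recognized text for any document that is
// not an order or an opinion so it never reaches the search index.
function toSearchRecord(doc: UploadedDocument): SearchRecord {
  const record: SearchRecord = {
    docketEntryId: doc.docketEntryId,
    category: doc.category,
  };
  if (doc.category === 'order' || doc.category === 'opinion') {
    record.text = doc.extractedText;
  }
  return record;
}
```

Under this approach every upload can still be OCR'd for in-document search, while the index only receives text for orders and opinions.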

Notes

Tasks

Test Cases

Definition of Done (Updated 2026-01-28)

Product Owner

  • Acceptance criteria have been met and validated on the Court's test environment.
  • Associated test cases defined in TestRail have been updated if necessary.
  • Successful test run is performed in TestRail.
  • User guides are updated if necessary.

UX

  • All new functionality has been verified to work with keyboard navigation and screen reader software.
  • UI should be touch optimized and responsive for external users.

Engineering

  • Automated test scripts have been written, including visual tests for newly added PDFs.
  • Successful test run is performed in TestRail.
  • New screens have been added to the Cypress accessibility (axe) tests.
  • Interactors should validate entities before calling persistence methods.
  • Types have been added to all added and updated functions.
  • Code refactored for clarity and to remove any known technical debt.
  • Acceptance criteria for the story have been met.
  • If there are special deployment instructions, they have been added to the CHANGES.md file and the PR description.
  • Code that resides in the shared folder that only runs on the API or browser has been moved to either /web-client or /web-api.
