Add OCR tutorial for DH2022 #124

@mcollardanuy

Prepare tutorial on using DeezyMatch for OCR: https://dh2022.adho.org/workshops-and-tutorials/wt-13

We will show how to create a DeezyMatch model from token-level alignments of OCRed text and their manual corrections. We will use the aligned tokens generated in [6] from a corpus of OCRed newspaper texts (from the National Library of Australia's Trove digitized newspaper collection) aligned with human corrections made by volunteers [5]. We will train a DeezyMatch model that learns OCR transformations from this newspaper data, and then use it to find the best match for a given OCRed query among a pool of candidates drawn from a specific knowledge base.
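As a starting point for the data-preparation step, here is a minimal sketch of how aligned token pairs could be turned into a string-pair training file. It assumes the tab-separated three-column layout (string 1, string 2, `TRUE`/`FALSE` label) that DeezyMatch datasets are understood to use; please check this against the DeezyMatch README before relying on it. The helper names `build_pairs` and `to_tsv`, the sample tokens, and the negative-sampling strategy (pairing an OCRed token with a randomly chosen different correction) are all hypothetical and are not taken from [6].

```python
import random


def build_pairs(aligned, seed=0):
    """Turn (ocr_token, corrected_token) alignments into labelled string pairs.

    Each alignment yields one positive pair (label "TRUE") and, when another
    corrected token is available, one negative pair (label "FALSE").
    The negative-sampling strategy here is an illustrative assumption.
    """
    rng = random.Random(seed)
    corrected = sorted({gold for _, gold in aligned})
    rows = []
    for ocr, gold in aligned:
        rows.append((ocr, gold, "TRUE"))
        # Pick a different corrected token to form a non-matching pair.
        others = [c for c in corrected if c != gold]
        if others:
            rows.append((ocr, rng.choice(others), "FALSE"))
    return rows


def to_tsv(rows):
    """Serialize rows as tab-separated lines: string1, string2, label."""
    return "\n".join("\t".join(row) for row in rows) + "\n"


# Invented sample alignments, for illustration only.
aligned_tokens = [
    ("Melbourno", "Melbourne"),
    ("tbe", "the"),
    ("Sydnoy", "Sydney"),
]

rows = build_pairs(aligned_tokens)
print(to_tsv(rows), end="")
```

The resulting text could be written to a file and used as the training dataset. In the actual tutorial, the pairs would come from the aligned tokens in [6] rather than from a hand-written list.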
