Adds OCR evaluation script / Fixes WER calculation #39

Merged: p-j-smith merged 7 commits into main from tomr/ocr-analysis on Oct 6, 2025

@tomaroberts
Collaborator

This PR:

  • adds OCR evaluation script, run on 20 records
  • fixes WER metric code

@tomaroberts
Collaborator Author

TODO: this PR was branched off an old branch, before the analysis folder was moved into src/pyonb/analysis, hence the conflicts.

@tomaroberts
Collaborator Author

Re-ran the analysis today; the results look more sensible now.

@p-j-smith p-j-smith marked this pull request as ready for review October 6, 2025 14:45
@p-j-smith p-j-smith requested a review from Copilot October 6, 2025 14:45
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds an OCR evaluation script and fixes the WER (Word Error Rate) calculation method. The changes improve the accuracy of text comparison metrics and make the evaluation functionality more modular.

  • Replaces Levenshtein string-based WER calculation with a proper word-based dynamic programming implementation
  • Refactors OCR evaluation script to be more modular by extracting a run() function
  • Adds disk space management to GitHub Actions workflow to handle build requirements
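
For illustration, the word-based dynamic programming approach named in the first bullet might look like the sketch below. This is not the code from src/pyonb/analysis/metrics.py: the function name `word_error_rate`, the whitespace tokenisation, and the handling of an empty reference are all assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: words are separated by whitespace, with no further normalisation.
    ref_words = reference.split()
    hyp_words = hypothesis.split()

    if not ref_words:
        # Assumption: an empty reference scores 0.0 only if the hypothesis is empty too.
        return 0.0 if not hyp_words else 1.0

    # previous[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1      # reference word missing from hypothesis
            insertion = current[j - 1] + 1  # extra word in hypothesis
            current[j] = min(substitution, deletion, insertion)
        previous = current

    # Total word-level edits, normalised by the reference length.
    return previous[-1] / len(ref_words)
```

Because the edit operations here act on whole words, one misrecognised word counts as a single error regardless of how many characters differ. A character-level Levenshtein distance over the same strings would instead count edited characters.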

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/pyonb/analysis/metrics.py | Fixes WER calculation with proper dynamic programming and updates metric documentation |
| src/pyonb/analysis/eval_ocr.py | Refactors evaluation script to be more modular and adds file encoding parameter |
| .github/workflows/tests.yml | Adds disk space cleanup step to prevent build failures |

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Collaborator

@p-j-smith p-j-smith left a comment

thanks for fixing this @tomaroberts!

@p-j-smith p-j-smith merged commit cf4a848 into main Oct 6, 2025
3 checks passed
@tomaroberts tomaroberts deleted the tomr/ocr-analysis branch October 14, 2025 08:52

3 participants