Handle low confidence spacing and spelling #249

@dairefagan

Great project, thanks for your work.

Reducing the window size to 1000 has already improved things. In the chunk files generated by CORRECT_TEXT, most confidence levels are 1.0, but some are still between 0.90 and 0.99, and in those chunks words are missing spaces between them or are misspelled.

What would be the best way to automatically handle this?

After CORRECT_TEXT runs on a chunk, if the confidence level is below 1.0, should I rerun it with progressively smaller window sizes (1000, 800, 600, 400, 200), or is there a better way?
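
A minimal sketch of the retry idea, to make the question concrete. It assumes a hypothetical wrapper `correct_fn(chunk, window_size)` that returns `(corrected_text, confidence)`; that name and signature are illustrative and not the project's actual API. The loop stops at the first window size that reaches the threshold and otherwise keeps the highest-confidence attempt.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical signature (an assumption, not the project's API):
# takes (chunk_text, window_size) and returns (corrected_text, confidence).
CorrectFn = Callable[[str, int], Tuple[str, float]]


def correct_with_fallback(
    chunk: str,
    correct_fn: CorrectFn,
    window_sizes: Sequence[int] = (1000, 800, 600, 400, 200),
    threshold: float = 1.0,
) -> Tuple[str, float, int]:
    """Try each window size in order and stop once the threshold is met.

    Returns (text, confidence, window_size) for the best attempt seen.
    """
    if not window_sizes:
        raise ValueError("window_sizes must not be empty")

    best = None
    for size in window_sizes:
        text, confidence = correct_fn(chunk, size)
        # Keep the highest-confidence attempt; ties favor the larger window,
        # because larger windows are tried first.
        if best is None or confidence > best[1]:
            best = (text, confidence, size)
        if confidence >= threshold:
            break
    return best
```

Keeping the best attempt rather than the last one matters here: a smaller window is not guaranteed to score higher, so the final 200-size pass could be worse than an earlier one.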
