Recognition Confidence Scores #95

@sneakybatman

Description

Hey team,

When computing word-level confidence in recognition, the current implementation appears to take the arithmetic mean of per-character confidences. This can hide single-character mistakes: one very low-confidence (and incorrect) character can be washed out by many high-confidence characters, yielding an over-optimistic word score.

I propose making the aggregation strategy configurable, with options such as:

  • mean (current behavior, for backward compatibility)

  • min (pessimistic; flags single-char issues)

  • geomean (product^(1/N); reflects sequence probability)

  • avg_log_prob (exp(mean(log p_i)); numerically stable version of geomean)

  • (optional) harmonic or trimmed_mean (robust alternatives)
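A minimal sketch of what a configurable aggregator could look like. The function name, signature, and strategy keys here are assumptions for illustration, not the project's actual API:

```python
import math

def aggregate(confidences, strategy="mean"):
    """Combine per-character confidences into a word-level score.

    Hypothetical helper; names and signature are illustrative only.
    """
    n = len(confidences)
    if strategy == "mean":                       # current behavior
        return sum(confidences) / n
    if strategy == "min":                        # pessimistic; flags single-char issues
        return min(confidences)
    if strategy in ("geomean", "avg_log_prob"):  # exp(mean(log p_i))
        # Log-space form avoids underflow for long words; zero
        # confidences would need clamping to a small epsilon first.
        return math.exp(sum(math.log(p) for p in confidences) / n)
    if strategy == "harmonic":                   # robust alternative
        return n / sum(1.0 / p for p in confidences)
    raise ValueError(f"unknown strategy: {strategy!r}")
```

Keeping `mean` as the default preserves backward compatibility while letting callers opt in to stricter aggregation per field type.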

Downstream systems often gate automation vs. human review using word-level confidence. For IDs, names, dates, etc., a single wrong character can invalidate the field. An aggregator that better penalizes single-char errors yields more reliable thresholds and fewer false passes.
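To make the gating point concrete, here is a hypothetical downstream gate; the threshold value and function names are illustrative only, not part of any existing system:

```python
# Words at or above the threshold flow to automation; the rest go to
# human review. Threshold is an arbitrary illustrative value.
REVIEW_THRESHOLD = 0.90

def route(word_text, word_confidence, threshold=REVIEW_THRESHOLD):
    """Return the processing lane for a recognized word."""
    if word_confidence >= threshold:
        return ("auto", word_text)
    return ("human_review", word_text)
```

With a mean-based score of 0.77 for a word containing one near-certainly wrong character, such a gate would still send the word to review only if the threshold were set above 0.77; a min- or geomean-based score trips the gate at far more natural thresholds.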

Example

Per-character confidences for a 4-char word: [0.99, 0.98, 0.12, 0.99]

Aggregator        Result
Arithmetic mean   0.770
Geometric mean    0.583
Avg log-prob      0.583
Harmonic mean     0.352
Min               0.120

The arithmetic mean remains high (0.77) despite one very low-confidence character.
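The table values can be reproduced with a few lines of standard-library Python (no project code assumed):

```python
import math

scores = [0.99, 0.98, 0.12, 0.99]
n = len(scores)

# Each aggregate, rounded to three decimals as in the table above.
mean     = round(sum(scores) / n, 3)               # 0.770
geomean  = round(math.prod(scores) ** (1 / n), 3)  # 0.583 (== avg log-prob)
harmonic = round(n / sum(1 / p for p in scores), 3)  # 0.352
minimum  = round(min(scores), 3)                   # 0.120
```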
