Recognition Confidence Scores #95

@sneakybatman

Description

Hey team,

When computing word-level confidence in recognition, the current implementation appears to take the arithmetic mean of per-character confidences. This can hide single-character mistakes: one very low-confidence (and incorrect) character can be washed out by many high-confidence characters, yielding an over-optimistic word score.

I propose making the aggregation strategy configurable, with options such as:

  • mean (current behavior, for backward compatibility)

  • min (pessimistic; flags single-char issues)

  • geomean (product^(1/N); reflects sequence probability)

  • avg_log_prob (exp(mean(log p_i)); numerically stable version of geomean)

  • (optional) harmonic or trimmed_mean (robust alternatives)
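A minimal sketch of what a configurable aggregator could look like. The function name, signature, and strategy keys here are assumptions for illustration, not the project's actual API:

```python
import math

def aggregate(confidences, strategy="mean"):
    """Combine per-character confidences into a word-level score.

    Hypothetical helper; names and signature are illustrative only.
    """
    n = len(confidences)
    if strategy == "mean":                       # current behavior
        return sum(confidences) / n
    if strategy == "min":                        # pessimistic; flags single-char issues
        return min(confidences)
    if strategy in ("geomean", "avg_log_prob"):  # exp(mean(log p_i))
        # Log-space form avoids underflow for long words; zero
        # confidences would need clamping to a small epsilon first.
        return math.exp(sum(math.log(p) for p in confidences) / n)
    if strategy == "harmonic":                   # robust alternative
        return n / sum(1.0 / p for p in confidences)
    raise ValueError(f"unknown strategy: {strategy!r}")
```

Keeping `mean` as the default preserves backward compatibility while letting callers opt in to stricter aggregation per field type.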

Downstream systems often gate automation vs. human review using word-level confidence. For IDs, names, dates, etc., a single wrong character can invalidate the field. An aggregator that better penalizes single-char errors yields more reliable thresholds and fewer false passes.
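To make the gating point concrete, here is a hypothetical downstream gate; the threshold value and function names are illustrative only, not part of any existing system:

```python
# Words at or above the threshold flow to automation; the rest go to
# human review. Threshold is an arbitrary illustrative value.
REVIEW_THRESHOLD = 0.90

def route(word_text, word_confidence, threshold=REVIEW_THRESHOLD):
    """Return the processing lane for a recognized word."""
    if word_confidence >= threshold:
        return ("auto", word_text)
    return ("human_review", word_text)
```

With a mean-based score of 0.77 for a word containing one near-certainly wrong character, such a gate would still send the word to review only if the threshold were set above 0.77; a min- or geomean-based score trips the gate at far more natural thresholds.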

Example

Per-character confidences for a 4-char word: [0.99, 0.98, 0.12, 0.99]

Aggregator        Result
Arithmetic mean   0.770
Geometric mean    0.583
Avg log-prob      0.583
Harmonic mean     0.352
Min               0.120

The arithmetic mean remains high (0.77) despite one very low-confidence character.
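The table values can be reproduced with a few lines of standard-library Python (no project code assumed):

```python
import math

scores = [0.99, 0.98, 0.12, 0.99]
n = len(scores)

# Each aggregate, rounded to three decimals as in the table above.
mean     = round(sum(scores) / n, 3)               # 0.770
geomean  = round(math.prod(scores) ** (1 / n), 3)  # 0.583 (== avg log-prob)
harmonic = round(n / sum(1 / p for p in scores), 3)  # 0.352
minimum  = round(min(scores), 3)                   # 0.120
```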
