Hey team,
When computing word-level confidence in recognition, the current implementation appears to take the arithmetic mean of per-character confidences. This can hide single-character mistakes: one very low-confidence (and incorrect) character can be washed out by many high-confidence characters, yielding an over-optimistic word score.
I propose making the aggregation strategy configurable, with options such as:
- `mean` (current behavior, for backward compatibility)
- `min` (pessimistic; flags single-character issues)
- `geomean` (product^(1/N); reflects sequence probability)
- `avg_log_prob` (exp(mean(log p_i)); numerically stable version of geomean)
- (optional) `harmonic` or `trimmed_mean` (robust alternatives)
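To make the proposal concrete, here is a minimal sketch of what a configurable aggregator could look like. The registry keys, the `word_confidence` name, and its signature are all hypothetical, not the project's actual API:

```python
import math
from statistics import fmean, geometric_mean, harmonic_mean

# Hypothetical registry of aggregation strategies, keyed by name.
AGGREGATORS = {
    "mean": fmean,                 # current behavior
    "min": min,                    # pessimistic; flags single-char issues
    "geomean": geometric_mean,     # product^(1/N)
    # exp(mean(log p)) is mathematically equal to the geometric mean,
    # but summing logs avoids underflow for long words.
    "avg_log_prob": lambda ps: math.exp(fmean(math.log(p) for p in ps)),
    "harmonic": harmonic_mean,
}

def word_confidence(char_confidences, strategy="mean"):
    """Aggregate per-character confidences into one word-level score."""
    return AGGREGATORS[strategy](char_confidences)
```

Defaulting `strategy` to `"mean"` keeps existing callers unchanged, so the new behavior is strictly opt-in.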
Downstream systems often gate automation vs. human review using word-level confidence. For IDs, names, dates, etc., a single wrong character can invalidate the field. An aggregator that better penalizes single-char errors yields more reliable thresholds and fewer false passes.
Example
Per-character confidences for a 4-char word: [0.99, 0.98, 0.12, 0.99]
| Aggregator | Result |
|---|---|
| Arithmetic mean | 0.770 |
| Geometric mean | 0.583 |
| Avg log-prob | 0.583 |
| Harmonic mean | 0.352 |
| Min | 0.120 |
The arithmetic mean remains high (0.77) despite one very low-confidence character.
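For anyone who wants to check the table, the values can be reproduced with Python's standard `statistics` and `math` modules:

```python
import math
from statistics import fmean, geometric_mean, harmonic_mean

# Per-character confidences for the 4-char example word.
ps = [0.99, 0.98, 0.12, 0.99]

print(f"arithmetic mean: {fmean(ps):.3f}")                           # 0.770
print(f"geometric mean:  {geometric_mean(ps):.3f}")                  # 0.583
print(f"avg log-prob:    {math.exp(fmean(map(math.log, ps))):.3f}")  # 0.583
print(f"harmonic mean:   {harmonic_mean(ps):.3f}")                   # 0.352
print(f"min:             {min(ps):.3f}")                             # 0.120
```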