
fix(engine): replace full-vocab percentile with top-N rank scoring#74

Merged
FlorentPoinsaut merged 8 commits into main from fix/67-top-n-scoring on Apr 28, 2026

Conversation

@FlorentPoinsaut
Member

Summary

Fixes #67 — percentile compression where ranks 1–1500 all display as 99%.

Root cause

The old formula, (effective_vocab - rank) / effective_vocab, spread scores across the full ~150 000-word vocabulary, making each rank step worth only ~0.00067%, so every semantically meaningful guess was truncated to 99% by Math.floor(score * 100).

Fix

Replace with a top-N neighbourhood score:

# rank within the top-N window
if rank > self._top_n:
    return 0.0
return (self._top_n - rank) / self._top_n
# rank 1 → 0.999 (99%), rank 1000 → 0.0 (0%), exact match → 1.0 (100%)

This restores a continuous, visible gradient across the full 0–99% range, with 100% reserved exclusively for exact matches.

Changes

File / Change
game/engine.py: New scoring formula; top_n constructor param (default from env); ValueError guard; updated docstrings
config.py: Add SCORING_TOP_N env var (default 1000)
overlay/static/index.html: Recalibrate gauge gradient: blue→green at 60%, green→gold at 90%
tests/test_engine.py: Inject _top_n in test helper; add test_word_beyond_top_n_returns_zero

Score mapping (TOP_N = 1000)

Rank Score Display Colour
exact match 1.0 100% — (game ends)
1 0.999 99% 🟡 gold
100 0.9 90% 🟡 gold
101 0.899 89% 🟢 green
400 0.6 60% 🟢 green
401 0.599 59% 🔵 blue
1000 0.001 0% 🔵 blue
> 1000 0.0 0% 🔵 blue
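The mapping in the table can be reproduced with a short sketch (top_n_score is an illustrative stand-in for the engine method, not its actual API):

```python
import math

def top_n_score(rank: int, top_n: int = 1000) -> float:
    """Linear neighbourhood score: rank 1 -> 0.999, rank >= top_n -> 0.0."""
    if top_n <= 0:
        raise ValueError("top_n must be a positive integer")
    if rank > top_n:
        return 0.0
    return (top_n - rank) / top_n

# Display value is floor(score * 100), matching the table above.
for rank in (1, 100, 101, 400, 401, 1000):
    print(rank, math.floor(top_n_score(rank) * 100))
```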

Tests

20 tests pass, including the new test_word_beyond_top_n_returns_zero.

FlorentPoinsaut and others added 2 commits April 26, 2026 06:13
Fixes #67

The previous formula mapped rank across the entire ~150 000-word
vocabulary, compressing ranks 1-1500 into 99% and destroying the
score gradient that makes the game engaging.

Replace with a top-N neighbourhood score:
  rank <= top_n → (top_n - rank) / top_n   (rank 1 → 0.999, rank top_n → 0)
  rank >  top_n → 0.0

This restores a continuous, visible gradient from 0% (outside the
neighbourhood) to 99% (closest non-exact word), with 100% reserved
for exact matches only.

Changes:
- game/engine.py: new formula + top_n constructor param + ValueError guard
- config.py: add SCORING_TOP_N env var (default 1000)
- overlay/static/index.html: recalibrate gauge gradient to match thresholds
  (blue 0%, green 60%, gold 90%, red 100%)
- tests/test_engine.py: inject _top_n in helper, add beyond-top-N test
The linear formula (top_n - rank) / top_n with top_n=1000 assigned 0%
to any word ranked beyond the 1000th nearest neighbour in frWac. Since
the vocabulary contains ~150 000 entries, even loosely related words
easily exceed rank 1000, causing every manual guess to display 0%.

Replace with a logarithmic formula over a larger top-N window (100 000
by default):

    score = 1 - log(rank + 1) / log(top_n + 1)

This gives a visible, continuous gradient with no compression:
  rank      1 →  94 %   (very close synonyms)
  rank     10 →  79 %
  rank    100 →  61 %
  rank  1 000 →  42 %
  rank 10 000 →  22 %
  rank 100 000 →  0 %  (hard cutoff)

The SCORING_TOP_N default is updated from 1 000 to 100 000 in both places it appears (config.py and the engine fallback).
Tests updated to reflect the new formula and _top_n=4 in the mock helper.
@FlorentPoinsaut
Member Author

Fix: 0% scores on manual tests

Diagnosis

The linear formula (top_n - rank) / top_n with top_n = 1000 was too restrictive for the frWac model (≈ 150 000 words). Any word ranked beyond 1 000 automatically scored 0%, including semantically related words, which routinely fall past that threshold in such a large vocabulary.

Fix

Replaced with a logarithmic formula over a window of 100 000 neighbours:

score = 1.0 - math.log(rank + 1) / math.log(self._top_n + 1)

Score distribution (top_n = 100 000)

Rank Score Display Colour
exact match 1.0 100 % 🏆 win
1 0.942 94 % 🟡 gold
10 0.799 79 % 🟢 green
100 0.613 61 % 🟢 green
1 000 0.421 42 % 🔵 blue
10 000 0.227 22 % 🔵 blue
100 000 0.0 0 % 🔵 blue
> 100 000 0.0 0 % 🔵 blue

This distribution fixes both problems at once: close guesses get a visible gradient instead of a flat 99%, and related-but-distant words no longer collapse to 0%.

20/20 tests pass.
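The logarithmic window can be sketched and sanity-checked as follows (log_score is an illustrative helper, assuming the top_n = 100 000 default named above):

```python
import math

def log_score(rank: int, top_n: int = 100_000) -> float:
    """score = 1 - log(rank + 1) / log(top_n + 1), hard cutoff beyond top_n."""
    if rank > top_n:
        return 0.0
    return 1.0 - math.log(rank + 1) / math.log(top_n + 1)

# Unlike the linear formula, a rank-1000 word still scores well above zero,
# and the gradient is strictly decreasing all the way down to the cutoff.
samples = [log_score(r) for r in (1, 10, 100, 1_000, 10_000, 100_000)]
assert all(a > b for a, b in zip(samples, samples[1:]))
assert log_score(1_000) > 0.3
assert log_score(100_000) == 0.0
```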

…wer agents

NLP/Data:
- Lower SCORING_TOP_N default from 100 000 to 10 000 so that gold (≥90%)
  is reachable for a true synonym and the blue zone stays informative
- Fix score_guess docstring: remove incorrect '0.99 = top 1%' claim;
  describe the logarithmic scale accurately

Reviewer:
- Import MODEL_PATH and SCORING_TOP_N from config instead of calling
  os.getenv() directly (violates project convention)
- Fix stale class docstring: 'or 1000 when unset' → 'or 10 000 when unset'
- Wrap __init__ signature to comply with PEP 8 E501 (88 chars max)
- Avoid redundant clean_word() calls in score_guess (computed once)

Tester:
- Rename test_score_guess_raises_when_not_loaded →
  test_similarity_raises_when_not_loaded (it tested similarity())
- Add test_score_guess_raises_when_not_loaded (line 109 was uncovered)
- Add test_invalid_top_n_raises_value_error (top_n=0, line 36 uncovered)
- Add test_negative_top_n_raises_value_error (top_n=-1)
- Add test_similarity_unknown_word_returns_none (line 85 uncovered)
- Add test_score_at_exactly_top_n_returns_zero (boundary condition)

Coverage: game/engine.py 91% → 97% — 25/25 tests pass
…rror

Importing config at module level triggered _require('TWITCH_CHANNEL')
during pytest collection, causing an ERROR in CI environments without
a .env file.

Moving 'import config' inside __init__ defers execution until actual
instantiation. This also removes the duplicated os.getenv() defaults
(_DEFAULT_MODEL_PATH, _DEFAULT_TOP_N): config.py remains the single
source of truth for both values.
Executing _require('TWITCH_CHANNEL') at module scope caused pytest to
crash during collection/instantiation in CI environments without a .env.

Introduce config.validate() which must be called once at application
startup (main.py). TWITCH_CHANNEL defaults to '' at import time; the
guard fires at startup as before, keeping production fail-fast behaviour.

game/engine.py can now import config at module level cleanly, with
config.py as the single source of truth for SCORING_TOP_N and
MODEL_PATH (no duplication).
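The deferred-validation pattern this commit describes can be sketched as follows (names mirror the commit message; the defaults shown are placeholders, not the project's real values):

```python
# config.py (sketch): env vars are read with safe defaults at import time;
# hard requirements are only enforced when validate() runs at startup.
import os

TWITCH_CHANNEL = os.getenv("TWITCH_CHANNEL", "")  # '' at import time
SCORING_TOP_N = int(os.getenv("SCORING_TOP_N", "10000"))

def validate() -> None:
    """Called once from main.py at startup: production still fails fast,
    but pytest collection no longer crashes when no .env is present."""
    if not TWITCH_CHANNEL:
        raise RuntimeError("TWITCH_CHANNEL is required")
```

With this split, game/engine.py can do a plain module-level import of config, and config.py stays the single source of truth for the defaults.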
Every valid guess now scores strictly > 0, with no configurable cutoff.

Formula: score = 1 - log(rank+1) / log(vocab_size+1)
  where vocab_size = len(model.key_to_index) set at load() time.

Because rank <= vocab_size - 1 < vocab_size for any in-vocab word,
the score is always positive. No hard cutoff, no SCORING_TOP_N config.

Score distribution (frWac ~150 000 words):
  rank      1 →  94 %
  rank     10 →  80 %
  rank    100 →  61 %
  rank  1 000 →  42 %
  rank 10 000 →  23 %
  rank 149 999 →  0.003 %

Remove: top_n param, _top_n/_top_n_override attrs, SCORING_TOP_N config,
        _DEFAULT_TOP_N module constant, and related tests.
Add: _vocab_size attr set in load(), test_all_vocab_words_score_above_zero.
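The positivity argument can be checked directly (vocab_log_score is an illustrative helper; V ≈ 150 000 per the commit message):

```python
import math

def vocab_log_score(rank: int, vocab_size: int) -> float:
    """score = 1 - log(rank + 1) / log(vocab_size + 1)."""
    return 1.0 - math.log(rank + 1) / math.log(vocab_size + 1)

V = 150_000
# The worst in-vocab rank is V - 1, so rank + 1 = V < V + 1 and the log
# ratio is strictly below 1: every in-vocabulary word scores > 0.
assert vocab_log_score(V - 1, V) > 0.0
assert vocab_log_score(1, V) > vocab_log_score(V - 1, V)
```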
… match

Cache _max_score = 1 - log(2)/log(V+1) at load() time and rescale:

    score = score_raw * 0.99 / _max_score

This maps rank 1 exactly to 0.99 (99%) while preserving the logarithmic
gradient. 1.0 (100%) remains exclusive to exact matches.

Score distribution (frWac ~150 000 words):
  rank      1 →  99 %   (closest neighbour)
  rank     10 →  85 %
  rank    100 →  65 %
  rank  1 000 →  44 %
  rank 10 000 →  24 %
  rank 149 999 →   0.003 %  (always > 0)

Update tests: replace absolute 0.5 thresholds with relative comparisons
(unrelated < close), add _max_score=None to _make_engine() helper.
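A sketch of the rescaling step (names are illustrative; V assumed ≈ 150 000 as above):

```python
import math

V = 150_000  # assumed frWac vocabulary size
MAX_SCORE = 1.0 - math.log(2) / math.log(V + 1)  # raw score at rank 1, cached at load()

def rescaled_score(rank: int) -> float:
    """Rescale the raw log score so rank 1 lands exactly on 0.99."""
    raw = 1.0 - math.log(rank + 1) / math.log(V + 1)
    return raw * 0.99 / MAX_SCORE

assert abs(rescaled_score(1) - 0.99) < 1e-12  # 100% stays exclusive to exact matches
assert rescaled_score(1) > rescaled_score(10) > rescaled_score(100) > 0.0
```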
Replace rescaled log formula with formula E (offset k=9) recommended
by NLP/Data agent after analysis of cemantix.certitudes.org approach:

    score = 0.99 * log((V+9) / (rank+9)) / log((V+9) / 10)

Mathematical guarantees (V = 150 000, frWac):
  - rank 1 → exactly 99% (100% reserved for exact match)
  - step rank 1→2 = 0.98 pp ≤ 1 pp → no integer % gaps (1–99 all reachable)
  - score > 0 for every in-vocabulary word
  - strictly monotone decreasing

Score distribution:
  rank      1 →  99 %
  rank      2 →  98 %
  rank      3 →  97 %
  rank     10 →  92 %
  rank    100 →  74 %
  rank  1 000 →  51 %
  rank 10 000 →  27 %
  rank 149 999 →   0.0001 %

Remove _max_score attr (no longer needed). Update test docstring.
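The guarantees listed above can be verified numerically with a small sketch (V = 150 000 and offset k = 9 as in the commit message; formula_e is an illustrative name):

```python
import math

V = 150_000  # frWac vocabulary size used in the commit message
K = 9        # offset k from formula E

def formula_e(rank: int) -> float:
    """score = 0.99 * log((V + k) / (rank + k)) / log((V + k) / 10)."""
    return 0.99 * math.log((V + K) / (rank + K)) / math.log((V + K) / 10)

# rank 1 lands exactly on 0.99 because rank + 9 == 10 matches the denominator
assert abs(formula_e(1) - 0.99) < 1e-12
# the rank 1 -> 2 step is under one percentage point, so 1–99% are all reachable
assert formula_e(1) - formula_e(2) < 0.01
# strictly decreasing and positive across sampled ranks
scores = [formula_e(r) for r in (1, 2, 10, 100, 1_000, 10_000, V - 1)]
assert all(a > b > 0 for a, b in zip(scores, scores[1:]))
```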
@FlorentPoinsaut FlorentPoinsaut merged commit d61393b into main Apr 28, 2026
2 checks passed
@FlorentPoinsaut FlorentPoinsaut deleted the fix/67-top-n-scoring branch April 28, 2026 12:13

Development

Successfully merging this pull request may close these issues.

[P0] Percentile compression: ranks 1–1500 all display as 99%
