fix(engine): replace full-vocab percentile with top-N rank scoring #74
Merged
FlorentPoinsaut merged 8 commits into main on Apr 28, 2026
Conversation
Fixes #67

The previous formula mapped rank across the entire ~150 000-word vocabulary, compressing ranks 1–1500 into 99% and destroying the score gradient that makes the game engaging.

Replace with a top-N neighbourhood score:
- rank <= top_n → (top_n - rank) / top_n (rank 1 → 0.999, rank top_n → 0)
- rank > top_n → 0.0

This restores a continuous, visible gradient from 0% (outside the neighbourhood) to 99% (closest non-exact word), with 100% reserved for exact matches only.

Changes:
- game/engine.py: new formula + top_n constructor param + ValueError guard
- config.py: add SCORING_TOP_N env var (default 1000)
- overlay/static/index.html: recalibrate gauge gradient to match thresholds (blue 0%, green 60%, gold 90%, red 100%)
- tests/test_engine.py: inject _top_n in helper, add beyond-top-N test
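The piecewise formula above can be sketched as a standalone function. This is illustrative only: in the PR the logic is a method on the engine class in game/engine.py, with top_n injected via the constructor.

```python
def score_guess(rank: int, top_n: int = 1000) -> float:
    """Linear top-N neighbourhood score.

    rank 1 -> 0.999, rank == top_n -> 0.0, rank > top_n -> 0.0.
    """
    if top_n <= 0:
        raise ValueError("top_n must be a positive integer")
    if rank > top_n:
        return 0.0  # outside the neighbourhood: no partial credit
    return (top_n - rank) / top_n
```

Note that 1.0 is never returned here: exact matches are scored before ranking, so 100% stays exclusive to them.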
The linear formula (top_n - rank) / top_n with top_n=1000 assigned 0%
to any word ranked beyond the 1000th nearest neighbour in frWac. Since
the vocabulary contains ~150 000 entries, even loosely related words
easily exceed rank 1000, causing every manual guess to display 0%.
Replace with a logarithmic formula over a larger top-N window (100 000
by default):
score = 1 - log(rank + 1) / log(top_n + 1)
This gives a visible, continuous gradient with no compression:
rank 1 → 94 % (very close synonyms)
rank 10 → 79 %
rank 100 → 61 %
rank 1 000 → 42 %
rank 10 000 → 22 %
rank 100 000 → 0 % (hard cutoff)
Both SCORING_TOP_N defaults (config.py and the engine's fallback) are updated from 1 000 to 100 000.
Tests updated to reflect the new formula and _top_n=4 in the mock helper.
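The logarithmic variant can be sketched as a standalone function (illustrative; in the PR the window size comes from the SCORING_TOP_N env var and is held in the engine's _top_n attribute):

```python
import math

def score_guess(rank: int, top_n: int = 100_000) -> float:
    """Logarithmic score: 1 - log(rank+1)/log(top_n+1).

    rank 1 -> ~94%, rank top_n -> 0% (hard cutoff), strictly decreasing.
    """
    return max(0.0, 1.0 - math.log(rank + 1) / math.log(top_n + 1))
```

The max() clamp covers ranks at or beyond the window edge; inside the window the formula is already positive.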
Fix: 0% scores on manual tests

Diagnosis: the linear formula assigned 0% to any word ranked beyond the 1000th nearest neighbour.
Correction: replaced with a logarithmic formula over a window of 100 000 neighbours:
score = 1.0 - math.log(rank + 1) / math.log(self._top_n + 1)
Score distribution (top_n = 100 000):
This distribution resolves both issues simultaneously.
20/20 tests pass.
…wer agents

NLP/Data:
- Lower SCORING_TOP_N default from 100 000 to 10 000 so that gold (≥90%) is reachable for a true synonym and the blue zone stays informative
- Fix score_guess docstring: remove incorrect '0.99 = top 1%' claim; describe the logarithmic scale accurately

Reviewer:
- Import MODEL_PATH and SCORING_TOP_N from config instead of calling os.getenv() directly (violates project convention)
- Fix stale class docstring: 'or 1000 when unset' → 'or 10 000 when unset'
- Wrap __init__ signature to comply with PEP 8 E501 (88 chars max)
- Avoid redundant clean_word() calls in score_guess (computed once)

Tester:
- Rename test_score_guess_raises_when_not_loaded → test_similarity_raises_when_not_loaded (it tested similarity())
- Add test_score_guess_raises_when_not_loaded (line 109 was uncovered)
- Add test_invalid_top_n_raises_value_error (top_n=0, line 36 uncovered)
- Add test_negative_top_n_raises_value_error (top_n=-1)
- Add test_similarity_unknown_word_returns_none (line 85 uncovered)
- Add test_score_at_exactly_top_n_returns_zero (boundary condition)

Coverage: game/engine.py 91% → 97% — 25/25 tests pass
…rror
Importing config at module level triggered _require('TWITCH_CHANNEL')
during pytest collection, causing an ERROR in CI environments without
a .env file.
Moving 'import config' inside __init__ defers execution until actual
instantiation. This also removes the duplicated os.getenv() defaults
(_DEFAULT_MODEL_PATH, _DEFAULT_TOP_N): config.py remains the single
source of truth for both values.
Executing _require('TWITCH_CHANNEL') at module scope caused pytest to
crash during collection/instantiation in CI environments without a .env.
Introduce config.validate() which must be called once at application
startup (main.py). TWITCH_CHANNEL defaults to '' at import time; the
guard fires at startup as before, keeping production fail-fast behaviour.
game/engine.py can now import config at module level cleanly, with
config.py as the single source of truth for SCORING_TOP_N and
MODEL_PATH (no duplication).
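A minimal sketch of the deferred-validation pattern described above. The exception type and message are assumptions; the PR only specifies that the guard fires once at startup rather than at import time:

```python
import os

# Read at import time with a safe default: importing this module never raises,
# so pytest can collect tests without a .env file.
TWITCH_CHANNEL = os.getenv("TWITCH_CHANNEL", "")

def validate() -> None:
    """Fail-fast guard, called exactly once at application startup (main.py)."""
    if not TWITCH_CHANNEL:
        raise RuntimeError("TWITCH_CHANNEL is required; set it in .env")
```

With this split, game/engine.py can do a plain module-level `import config` and read SCORING_TOP_N / MODEL_PATH from one place, while production still fails fast at startup.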
Every valid guess now scores strictly > 0, with no configurable cutoff.
Formula: score = 1 - log(rank+1) / log(vocab_size+1)
where vocab_size = len(model.key_to_index) set at load() time.
Because rank <= vocab_size - 1 < vocab_size for any in-vocab word,
the score is always positive. No hard cutoff, no SCORING_TOP_N config.
Score distribution (frWac ~150 000 words):
rank 1 → 94 %
rank 10 → 80 %
rank 100 → 61 %
rank 1 000 → 42 %
rank 10 000 → 23 %
rank 149 999 → 0.003 %
Remove: top_n param, _top_n/_top_n_override attrs, SCORING_TOP_N config,
_DEFAULT_TOP_N module constant, and related tests.
Add: _vocab_size attr set in load(), test_all_vocab_words_score_above_zero.
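The cutoff-free formula as a standalone sketch; vocab_size is passed as a parameter here for illustration, whereas the PR stores it as the _vocab_size attribute set in load():

```python
import math

def score_guess(rank: int, vocab_size: int) -> float:
    """1 - log(rank+1)/log(vocab_size+1).

    Strictly positive for every in-vocab word, since rank <= vocab_size - 1.
    """
    return 1.0 - math.log(rank + 1) / math.log(vocab_size + 1)
```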
… match
Cache _max_score = 1 - log(2)/log(V+1) at load() time and rescale:
score = score_raw * 0.99 / _max_score
This maps rank 1 exactly to 0.99 (99%) while preserving the logarithmic
gradient. 1.0 (100%) remains exclusive to exact matches.
Score distribution (frWac ~150 000 words):
rank 1 → 99 % (closest neighbour)
rank 10 → 85 %
rank 100 → 65 %
rank 1 000 → 44 %
rank 10 000 → 24 %
rank 149 999 → 0.003 % (always > 0)
Update tests: replace absolute 0.5 thresholds with relative comparisons
(unrelated < close), add _max_score=None to _make_engine() helper.
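The rescaling step as a standalone sketch; the module-level constants stand in for the attributes the PR caches at load() time:

```python
import math

VOCAB_SIZE = 150_000  # stand-in for len(model.key_to_index), set in load()
_MAX_SCORE = 1.0 - math.log(2) / math.log(VOCAB_SIZE + 1)  # raw score at rank 1

def score_guess(rank: int) -> float:
    """Log score rescaled so rank 1 maps exactly to 0.99 (100% = exact match)."""
    raw = 1.0 - math.log(rank + 1) / math.log(VOCAB_SIZE + 1)
    return raw * 0.99 / _MAX_SCORE
```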
Replace rescaled log formula with formula E (offset k=9) recommended
by NLP/Data agent after analysis of cemantix.certitudes.org approach:
score = 0.99 * log((V+9) / (rank+9)) / log((V+9) / 10)
Mathematical guarantees (V = 150 000, frWac):
- rank 1 → exactly 99% (100% reserved for exact match)
- step rank 1→2 = 0.98 pp ≤ 1 pp → no integer % gaps (1–99 all reachable)
- score > 0 for every in-vocabulary word
- strictly monotone decreasing
Score distribution:
rank 1 → 99 %
rank 2 → 98 %
rank 3 → 97 %
rank 10 → 92 %
rank 100 → 74 %
rank 1 000 → 51 %
rank 10 000 → 27 %
rank 149 999 → 0.0001 %
Remove _max_score attr (no longer needed). Update test docstring.
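Formula E as a standalone sketch (constant names are illustrative; the PR keeps V as the vocabulary size read at load() time):

```python
import math

V = 150_000  # frWac vocabulary size
K = 9        # "formula E" offset

def score_guess(rank: int) -> float:
    """0.99 * log((V+K)/(rank+K)) / log((V+K)/(K+1)).

    rank 1 -> exactly 0.99; strictly decreasing; positive for all in-vocab ranks;
    the rank 1 -> 2 step stays under 1 percentage point, so 1-99% are all reachable.
    """
    return 0.99 * math.log((V + K) / (rank + K)) / math.log((V + K) / (K + 1))
```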
Summary
Fixes #67 — percentile compression where ranks 1–1500 all display as 99%.
Root cause
The old formula
(effective_vocab - rank) / effective_vocab divided by ~150 000, making each rank step worth only 0.00067%, so all semantically meaningful guesses were rounded to 99% by Math.floor(score * 100).
Fix
Replace with a top-N neighbourhood score:
This restores a continuous, visible gradient across the full 0–99% range, with 100% reserved exclusively for exact matches.
Changes
- game/engine.py: top_n constructor param (default from env); ValueError guard; updated docstrings
- config.py: SCORING_TOP_N env var (default 1000)
- overlay/static/index.html: recalibrated gauge gradient
- tests/test_engine.py: inject _top_n in test helper; add test_word_beyond_top_n_returns_zero

Score mapping (TOP_N = 1000)
Tests
20 tests pass, including the new test_word_beyond_top_n_returns_zero.