Fix RRF tiebreak precision bug and assorted small cleanups #63
Merged
- Replace f32::EPSILON tolerance with exact equality comparison in RRF sorting, fixing ordering for scores that differ by less than EPSILON
- Fix zero_token_skipped to use zero_token_row_ids.len() for consistency
- Add comment explaining why L2 normalization is omitted after mean pooling
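A minimal sketch of the comparator change. It assumes a hypothetical `FusedHit` struct with `score`, `best_rank`, `best_method`, and `row_id` fields (the PR does not show the real type) and treats a lower `best_method` value as higher priority. The two scores below differ by roughly 5e-8, which is below `f32::EPSILON` (about 1.19e-7) but still distinct in `f32` at that magnitude. The tolerance comparator treats them as tied and lets `best_rank` decide; the exact comparator keeps the higher-scored hit first.

```rust
use std::cmp::Ordering;

// Hypothetical hit type for illustration; field names are assumptions.
#[derive(Debug, Clone)]
struct FusedHit {
    row_id: u64,
    score: f32,      // fused RRF score; higher ranks first
    best_rank: u32,  // best (lowest) 1-indexed rank among contributing methods
    best_method: u8, // priority of the method that produced best_rank; lower wins (assumed)
}

// Lower-priority tiebreaks: best rank, then best-rank method, then row_id.
fn tiebreak(a: &FusedHit, b: &FusedHit) -> Ordering {
    a.best_rank
        .cmp(&b.best_rank)
        .then_with(|| a.best_method.cmp(&b.best_method))
        .then_with(|| a.row_id.cmp(&b.row_id))
}

// Old behaviour: any two scores within f32::EPSILON are treated as tied.
fn compare_epsilon(a: &FusedHit, b: &FusedHit) -> Ordering {
    if (a.score - b.score).abs() < f32::EPSILON {
        tiebreak(a, b)
    } else {
        b.score.partial_cmp(&a.score).unwrap_or(Ordering::Equal)
    }
}

// New behaviour: only exactly equal scores fall through to the tiebreaks.
// Assumes scores are never NaN (partial_cmp would return None).
fn compare_exact(a: &FusedHit, b: &FusedHit) -> Ordering {
    b.score
        .partial_cmp(&a.score)
        .unwrap_or(Ordering::Equal)
        .then_with(|| tiebreak(a, b))
}

fn main() {
    // Row 2 has the higher score, but the gap (~5e-8) is below f32::EPSILON.
    let a = FusedHit { row_id: 1, score: 0.03, best_rank: 1, best_method: 0 };
    let b = FusedHit { row_id: 2, score: 0.03000005, best_rank: 5, best_method: 0 };

    let mut tolerant = vec![a.clone(), b.clone()];
    tolerant.sort_by(compare_epsilon);
    let mut exact = vec![a, b];
    exact.sort_by(compare_exact);

    println!("epsilon order: {:?}", tolerant.iter().map(|h| h.row_id).collect::<Vec<_>>());
    println!("exact order:   {:?}", exact.iter().map(|h| h.row_id).collect::<Vec<_>>());
}
```

With the tolerance comparator row 1 sorts first; with the exact comparator row 2 does. A tolerance comparison is also not transitive (a ≈ b and b ≈ c do not imply a ≈ c), which is a further reason to keep it out of a sort comparator.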
… rank
- Emit warning when embedding input is truncated to 8192 tokens
- Gate test-only PackageIndex accessors with #[cfg(test)]
- Remove stale #[allow(dead_code)] from insert_package_name
- Pass real rank to make_single_method_fused_hit instead of hardcoded 1
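A sketch of threading the real rank through, assuming a hypothetical `make_single_method_fused_hit(row_id, rank)` signature and, purely for illustration, the standard RRF score `1 / (k + rank)` with `k = 60`; neither the signature nor the constant is confirmed by the PR. The detail that matters is that `enumerate()` is 0-indexed, so the rank passed along is `i + 1`.

```rust
// Hypothetical hit type and constructor; the real signature is not shown in the PR.
#[derive(Debug, PartialEq)]
struct FusedHit {
    row_id: u64,
    rank: u32,  // 1-indexed position in the single method's result list
    score: f32, // illustrative RRF score 1 / (k + rank)
}

// Assumed RRF constant; 60 is the value commonly used with RRF.
const RRF_K: f32 = 60.0;

fn make_single_method_fused_hit(row_id: u64, rank: u32) -> FusedHit {
    FusedHit { row_id, rank, score: 1.0 / (RRF_K + rank as f32) }
}

// Builds fused hits when only one retrieval method contributed results.
fn fuse_single_method(row_ids: &[u64]) -> Vec<FusedHit> {
    row_ids
        .iter()
        .enumerate()
        // enumerate() is 0-indexed; RRF ranks are 1-indexed.
        .map(|(i, &row_id)| make_single_method_fused_hit(row_id, i as u32 + 1))
        .collect()
}

fn main() {
    for hit in fuse_single_method(&[42, 7, 19]) {
        println!("{:?}", hit);
    }
}
```

With a hardcoded `rank: 1`, every entry would carry the same rank (and, under this sketch's scoring, the same score), so any downstream tiebreak on rank would see no difference between the first and last result.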
- Replace unreachable early-return with debug_assert! in find_shallowest_split_scope (split_search.rs)
- Use HashSet for O(n) deduplication in split_identifier (tokenizer.rs)
The early-return in find_shallowest_split_scope handles a real edge case where the node doesn't span the requested range (e.g. node 2-551 for range 1-550). The task description was incorrect.
Addresses a batch of small bugs and inconsistencies surfaced by a code review. The most substantive fix is in the RRF fusion ranking: an `f32::EPSILON` tolerance in the score comparison was coarse enough to swallow real score gaps between candidates within the fusion window. Two candidates with distinct fused scores could be treated as tied and reordered by the tiebreaks, silently reversing the intended order. The tolerance is replaced with an exact-equality fall-through; the lower-priority tiebreaks (best contributing rank, best-rank method, `row_id`) are unchanged.

Other fixes:
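- `split_identifier` (tokenizer.rs): deduplication over a `Vec` now uses a parallel `HashSet`, making it O(n) while preserving insertion order.
- `make_single_method_fused_hit`: previously hardcoded `rank: 1` for every entry; now threads the real 1-indexed rank through from the call site.
- `find_shallowest_split_scope` (split_search.rs): the early-return is replaced with a `debug_assert!` of the invariant the callers already maintain.
- Test-only `PackageIndex` accessors that carried `#[allow(dead_code)]` are now properly `#[cfg(test)]`-gated; the annotation on `insert_package_name`, which is actually used in production, is removed.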
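A minimal sketch of order-preserving deduplication with a parallel `HashSet`, as described for `split_identifier`. The splitting rule here (split on non-alphanumeric characters, then lowercase) is a hypothetical stand-in; the real tokenizer's rules are not shown in the PR.

```rust
use std::collections::HashSet;

// Order-preserving dedup: the Vec keeps first-seen order, the parallel
// HashSet answers "seen before?" in O(1) on average.
fn dedup_preserving_order<I>(tokens: I) -> Vec<String>
where
    I: IntoIterator<Item = String>,
{
    let mut seen: HashSet<String> = HashSet::new();
    let mut out = Vec::new();
    for tok in tokens {
        // HashSet::insert returns false if the value was already present.
        if seen.insert(tok.clone()) {
            out.push(tok);
        }
    }
    out
}

// Hypothetical stand-in for split_identifier: split on non-alphanumeric
// characters, lowercase each part, then dedup.
fn split_identifier_sketch(ident: &str) -> Vec<String> {
    let parts = ident
        .split(|c: char| !c.is_alphanumeric())
        .filter(|s| !s.is_empty())
        .map(|s| s.to_lowercase());
    dedup_preserving_order(parts)
}

fn main() {
    println!("{:?}", split_identifier_sketch("get_user__GET_id"));
}
```

`Vec::dedup` would not be enough here because it only removes consecutive duplicates. Checking membership by scanning the output `Vec` instead would make the pass quadratic; the `HashSet` keeps it linear on average at the cost of one extra allocation per unique token.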