Skip to content

Conversation

@meiravgri
Copy link
Collaborator

@meiravgri meiravgri commented Oct 16, 2025

Problem description

Accessing top() on empty priority queues is considered UB.
Errors were detected by AddressSanitizer on MacOS in the following tests:
HNSWTest/0.resizeNAlignIndex and hnsw_delete_entry_point_Test.

Fixes in this PR

  1. removeExtraLinks: getNeighborsByHeuristic2 might return fewer candidates than input (orig_candidates). In this fix we first iterate the modified candidates returned by getNeighborsByHeuristic2, then handle the remaining orig_candidates elements rejected by heuristic.

  2. processCandidate: Validate that candidate_set is not empty when calling prefetch.

@meiravgri meiravgri changed the title fix invalid reads Fix HNSW AddressSanitizer errors Oct 16, 2025
@meiravgri meiravgri changed the title Fix HNSW AddressSanitizer errors Fix HNSW undefined behavior in container access Oct 16, 2025
@codecov
Copy link

codecov bot commented Oct 16, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.08%. Comparing base (166ddca) to head (0c3b4e5).
⚠️ Report is 1 commits behind head on backport-803-to-0.6.

Additional details and impacted files
@@                   Coverage Diff                   @@
##           backport-803-to-0.6     #806      +/-   ##
=======================================================
+ Coverage                95.07%   95.08%   +0.01%     
=======================================================
  Files                       60       60              
  Lines                     3451     3460       +9     
=======================================================
+ Hits                      3281     3290       +9     
  Misses                     170      170              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@meiravgri meiravgri changed the title Fix HNSW undefined behavior in container access [MOD-11881] Fix HNSW undefined behavior in container access Oct 16, 2025
@meiravgri meiravgri force-pushed the meiravg_fix_invaid_reads branch from 2d0233e to 0c3b4e5 Compare October 16, 2025 10:03
@meiravgri meiravgri changed the base branch from 0.6 to backport-803-to-0.6 October 16, 2025 10:04
@meiravgri meiravgri requested a review from alonre24 October 16, 2025 10:11
@meiravgri meiravgri merged commit 5fcf6ee into backport-803-to-0.6 Oct 16, 2025
33 checks passed
@meiravgri meiravgri deleted the meiravg_fix_invaid_reads branch October 16, 2025 12:11
github-merge-queue bot pushed a commit that referenced this pull request Oct 16, 2025
* [0.8] Replace Valgrind with address-sanitizer [MOD-10409] (#803)

* Replace Valgrind with address-sanitizer [MOD-10409] (#734)

* add asan using augment - wip (code with svs cannot compile with llvm clang currently)

* rename run-valgrind to asan

* fix buffer overflow in test

* fix bad testSizeEstimation in int8 and uint8 + cleanup

* fix cmake install for macos

* ran sanitizer as part of every flow + remove from non latest OSs + only run debian11 with gcc11 (alignment with search)

* revert 18 from clang format

* fix syntax errors

* * rename  dataSize -> storedDataSize

*refine distinction between storedDataSize and inputBlobSize: inputBlobSize not necessarly dim *sizeof(type)

* vecsimAbstract: move data members to private when possible and not too complicated

* remove effective_input_size check from VecSimIndexAbstract<DataType, DistType>::preprocessQuery: That's logically wrong.

The previous logic incorrectly used force_copy to determine blob size:
- force_copy only indicates whether to copy the blob for memory management
- It doesn't determine the actual size of the input blob

In tiered indexes, when the backend creates a batch iterator, it receives
the query blob from the frontend's batch iterator (already preprocessed).
This preprocessed blob has the processed size (e.g., with cosine norm
appended), not the original dim * sizeof(type) size.

The old logic accidentally worked when inputBlobSize == dim * sizeof(type)
because using dataSize (processed size) happened to match the preprocessed
blob size from the frontend. However, this was a coincidence that broke
when inputBlobSize was updated to reflect actual input blob sizes.

Now preprocessQuery correctly uses inputBlobSize, which represents the
actual size of the input blob being processed.

BruteForce Factory Changes:
• Templated  NewAbstractInitParams to be reusable across different algorithm parameter types

All factories changes:
 storedDataSize: Size after preprocessing (e.g., with cosine norm appended)
 inputBlobSize: Size of input blobs the index expects to receive (depends on whether vectors are pre-normalized)
• Added  inputBlobSize calculation based on normalization status: uses  storedDataSize if normalized, otherwise dim * sizeof(type)

*svs tests testSizeEstimation: using constexpr to eliminate compiler warnings about potential division by zero when  quantBits could be 0 (VecSimSvsQuant_NONE)

* fix merge

* another fix

* have leak

* run first san

* build with debug

* upload san srtifacts

* fix condition

* fix path

* clean up

* run wuth san macos

* real leak

* fix pah

* fix security issue

* remove san from event pr

run cov with san
run mac with san

* trigger again

* unit test flow name

* add space

* split sanitizer in intel machine

* fix cpu info

* fix alpine pre deps

* add to needs

* unify codecov and sanitizer again

* disable flow temp

add cov and sanitize to nightly

* revert unnecessary changes

* remove bin before building sanitizer in codecov to use clanmg

* install clang

* install clang 18

* instsll clang 18

* remove tools

* install llvm on cov

* fix deps

---------

Co-authored-by: meiravgri <[email protected]>
(cherry picked from commit 2fcc669)

* fix debian and nightly

* remove slack notify on nightly

* change cov machine

* revert cov changes - no sanitizer

* no pre deps

* remove

---------

Co-authored-by: alonre24 <[email protected]>
(cherry picked from commit 44df9db)

* [MOD-11881] Fix HNSW undefined behavior in container access (#806)

fix invalid reads

---------

Co-authored-by: alonre24 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants