@kaivalnp
Contributor

Description

While merging vectors across multiple segments, we can re-use information from earlier HNSW graphs -- either an entire graph as a starting point, or previous connections as seeds for insertion!

These optimizations are used only if the segment has no deleted documents (#15003 is a nice PR that allows re-using some information even with deletes, which is how I came to look at this class).

However, should the gating flag be "no deleted vectors" instead of "no deleted documents"?

Gating on "no deleted vectors" would have two benefits:

  • Optimizations kick in more frequently (when a segment has deleted documents, but none of those had vectors)
  • Checking whether the optimizations can be applied is sped up too (only need to check for "liveness" of documents with vectors, instead of "liveness" of all documents)
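
To make the difference between the two gates concrete, here is a minimal sketch. It is not Lucene's actual merge code: the class `DeletedVectorsGate`, both method names, and the use of `java.util.BitSet` for live documents plus an `int[]` of vector-bearing doc IDs are assumptions made purely for illustration.

```java
import java.util.BitSet;

/** Hypothetical illustration only; names and data structures are assumptions, not Lucene APIs. */
public class DeletedVectorsGate {

    /** Coarse gate: true only if every document in [0, maxDoc) is live. */
    public static boolean noDeletedDocs(BitSet liveDocs, int maxDoc) {
        // nextClearBit returns the first unset index; if it is >= maxDoc, nothing was deleted.
        return liveDocs.nextClearBit(0) >= maxDoc;
    }

    /** Finer gate: true if every document that carries a vector is live. */
    public static boolean noDeletedVectors(BitSet liveDocs, int[] docsWithVectors) {
        // Only documents with vectors are inspected, so this scans fewer entries
        // whenever the vector field is sparse.
        for (int doc : docsWithVectors) {
            if (!liveDocs.get(doc)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int maxDoc = 6;
        BitSet liveDocs = new BitSet(maxDoc);
        liveDocs.set(0, maxDoc);   // all documents start live
        liveDocs.clear(3);         // doc 3 is deleted, but it has no vector
        int[] docsWithVectors = {0, 2, 5};

        System.out.println(noDeletedDocs(liveDocs, maxDoc));             // false
        System.out.println(noDeletedVectors(liveDocs, docsWithVectors)); // true
    }
}
```

In this example the segment has a deleted document, so the coarse gate would disable graph re-use, while the finer gate still allows it because no vector-bearing document was deleted.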

@github-actions bot added this to the 10.4.0 milestone Nov 14, 2025
@benwtrent
Member

I am not sure this is actually required with: #15003

let's move forward with that work instead?

@kaivalnp
Contributor Author

@benwtrent makes sense. This PR makes the current boolean optimizations (gated by whether there are deleted vectors) kick in more appropriately, whereas that PR makes the decision continuous by allowing re-use of information from segments with >= DELETE_PCT_THRESHOLD live vectors, so these changes can probably be rolled into that one.

Not sure when that PR will be merged, or if the threshold is going to stay -- but if it's near completion, I can close this PR.

@benwtrent
Member

@kaivalnp I think that PR is near completion. I would expect it to land for 10.4. All benchmarking and validation work is pretty much complete, just some edges to smooth out.

@kaivalnp
Contributor Author

Thanks @benwtrent @Pulkitg64

@kaivalnp closed this Nov 14, 2025
@kaivalnp deleted the deleted-vectors branch November 14, 2025 19:32