improvement: performance of update vector #1289

Merged
ShawnShawnYou merged 11 commits into main from improve-performance-of-update-vector
Oct 30, 2025

@ShawnShawnYou ShawnShawnYou (Collaborator) commented Oct 28, 2025

close #1288

Summary by Sourcery

Enhancements:

  • Replace KNN-based neighbor search with direct neighbor list retrieval in update_vector for both HNSW and HGraph to reduce data copy and search overhead

Signed-off-by: zhongxiaoyao.zxy <zhongxiaoyao.zxy@antgroup.com>

sourcery-ai bot commented Oct 28, 2025

Reviewer's Guide

Replaces the dataset copy and k-NN search in update_vector (HNSW and HGraph) with direct neighbor-retrieval APIs, so each update reads the stored neighbor list instead of allocating a temporary dataset and running a search.

Sequence diagram for optimized neighbor retrieval in update_vector

sequenceDiagram
participant HNSW
participant HNSWLib
HNSW->>HNSWLib: get_linklist0(id)
HNSW->>HNSWLib: getListCount(linklist)
loop For each neighbor
    HNSW->>HNSWLib: getDistanceByLabel(neighbor_id, new_base_vec)
end
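
The first two calls in the diagram above can be illustrated with a toy level-0 link list. This is a minimal sketch, not the HNSWLib implementation: the names `ToyLevel0Graph`, `LinkList0`, `ListCount`, `SetNeighbors`, and `DirectNeighbors` are hypothetical, and the layout (a count stored in the first slot, followed by the neighbor ids) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy layout (not the real HNSWLib structures): each node owns a
// fixed slot of kMaxDegree + 1 integers at level 0. slot[0] holds the neighbor
// count and slot[1..count] hold the neighbor ids.
constexpr std::size_t kMaxDegree = 4;

struct ToyLevel0Graph {
    std::vector<uint32_t> links;  // num_nodes * (kMaxDegree + 1) entries

    explicit ToyLevel0Graph(std::size_t num_nodes)
        : links(num_nodes * (kMaxDegree + 1), 0) {}

    // Counterpart of "get_linklist0(id)" in the diagram: pointer to the slot.
    const uint32_t* LinkList0(uint32_t id) const {
        return links.data() + static_cast<std::size_t>(id) * (kMaxDegree + 1);
    }

    // Counterpart of "getListCount(linklist)": read the count header.
    static uint32_t ListCount(const uint32_t* list) { return list[0]; }

    // Stores `nbrs` as the level-0 neighbors of `id`.
    void SetNeighbors(uint32_t id, const std::vector<uint32_t>& nbrs) {
        assert(nbrs.size() <= kMaxDegree);
        uint32_t* slot = links.data() + static_cast<std::size_t>(id) * (kMaxDegree + 1);
        slot[0] = static_cast<uint32_t>(nbrs.size());
        for (std::size_t i = 0; i < nbrs.size(); ++i) {
            slot[i + 1] = nbrs[i];
        }
    }
};

// Copies the stored neighbor ids of `id` without running any search.
std::vector<uint32_t> DirectNeighbors(const ToyLevel0Graph& g, uint32_t id) {
    const uint32_t* list = g.LinkList0(id);
    const uint32_t count = ToyLevel0Graph::ListCount(list);
    return std::vector<uint32_t>(list + 1, list + 1 + count);
}
```

Reading the list this way costs one pointer offset and a copy of at most `kMaxDegree` ids, which is the kind of work the diagram suggests replaces the earlier search.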

Sequence diagram for optimized neighbor retrieval in HGraph::UpdateVector

sequenceDiagram
participant HGraph
participant BottomGraph
HGraph->>BottomGraph: GetNeighbors(inner_id, neighbors)
loop For each neighbor
    HGraph->>HGraph: CalcDistanceById(new_base_vec, neighbor_id)
end
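
The HGraph flow above (fetch the neighbor list, then compute one distance per neighbor) might look roughly like the following. This is a sketch under stated assumptions: `ToyBottomGraph`, `SquaredL2`, and `NeighborDistances` are hypothetical stand-ins rather than the real vsag classes, and squared L2 is used only as an example metric.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using InnerIdType = uint32_t;    // hypothetical alias for this sketch
using Vec = std::vector<float>;  // one stored vector

// Hypothetical stand-in for the bottom-layer graph; not the real vsag class.
class ToyBottomGraph {
public:
    explicit ToyBottomGraph(std::size_t num_nodes) : adjacency_(num_nodes) {}

    void AddEdge(InnerIdType from, InnerIdType to) {
        adjacency_.at(from).push_back(to);
    }

    // Mirrors the "GetNeighbors(inner_id, neighbors)" call in the diagram:
    // overwrites `neighbors` with the stored adjacency list of `inner_id`.
    void GetNeighbors(InnerIdType inner_id, std::vector<InnerIdType>& neighbors) const {
        neighbors = adjacency_.at(inner_id);
    }

private:
    std::vector<std::vector<InnerIdType>> adjacency_;
};

// Squared L2 distance; stands in for "CalcDistanceById" in the diagram.
float SquaredL2(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    float sum = 0.0F;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float diff = a[i] - b[i];
        sum += diff * diff;
    }
    return sum;
}

// Distance from `new_vec` to each stored neighbor of `inner_id`, in list order.
std::vector<float> NeighborDistances(const ToyBottomGraph& graph,
                                     const std::vector<Vec>& base,
                                     InnerIdType inner_id,
                                     const Vec& new_vec) {
    std::vector<InnerIdType> neighbors;
    graph.GetNeighbors(inner_id, neighbors);
    std::vector<float> distances;
    distances.reserve(neighbors.size());
    for (InnerIdType neighbor_id : neighbors) {
        distances.push_back(SquaredL2(new_vec, base.at(neighbor_id)));
    }
    return distances;
}
```

The number of distance computations is bounded by the node's degree, independent of the index size.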

Class diagram for updated neighbor retrieval in HNSW and HGraph

classDiagram
class HNSW {
  +update_vector(id, new_base, force_update)
}
class HNSWLib {
  +get_linklist0(id)
  +getListCount(linklist)
  +getDistanceByLabel(id, vec)
}
HNSW --> HNSWLib

class HGraph {
  +UpdateVector(id, new_base, force_update)
  -bottom_graph_: BottomGraph
}
class BottomGraph {
  +GetNeighbors(inner_id, neighbors)
}
HGraph --> BottomGraph

File-Level Changes

Change: Optimize HNSW update_vector by eliminating the dataset copy and knn search in favor of direct neighbor link retrieval
Details:
  • Removed base_data allocation and dataset construction
  • Removed knn_search and neighbor result handling
  • Used get_linklist0 and getListCount to fetch neighbors
  • Iterated direct neighbor IDs for distance comparison
Files: src/index/hnsw.cpp

Change: Optimize HGraph UpdateVector by replacing knn search with direct bottom_graph neighbor retrieval
Details:
  • Removed base_data allocation and dataset setup
  • Removed KnnSearch invocation
  • Called bottom_graph_->GetNeighbors for neighbor list
  • Iterated over neighbors vector for distance checks
Files: src/algorithm/hgraph.cpp
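
The "distance comparison" and "distance checks" steps above are not spelled out in this conversation. One plausible reading, sketched below, is that an update is accepted only when the new vector stays at least as close to the old vector as to every current neighbor, and that force_update skips the check. Both the rule and the helper name `ShouldApplyUpdate` are assumptions made for illustration, not the confirmed behavior of the merged code.

```cpp
#include <cassert>
#include <vector>

// ASSUMPTION (not confirmed by the PR text): the update is accepted only if no
// current neighbor is strictly closer to the new vector than the old vector is.
//   self_dist      = distance(new vector, old vector stored under the same id)
//   neighbor_dists = distance(new vector, each current neighbor)
bool ShouldApplyUpdate(float self_dist,
                       const std::vector<float>& neighbor_dists,
                       bool force_update) {
    if (force_update) {
        return true;  // assumed meaning: caller explicitly skips the check
    }
    for (float neighbor_dist : neighbor_dists) {
        if (neighbor_dist < self_dist) {
            return false;  // new vector has drifted closer to a neighbor
        }
    }
    return true;
}
```

Under this reading the neighbor distances can come straight from the stored neighbor list, so no search result is needed to make the decision.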

Assessment against linked issues

Issue: #1288
Objective: Improve the performance of the update_vector function in the codebase.

@gemini-code-assist

Summary of Changes

Hello @ShawnShawnYou, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on enhancing the performance of vector update operations within the HGraph and HNSW indexing algorithms. By streamlining the process of identifying existing neighbors, it significantly reduces computational overhead and improves the efficiency of updating vectors in the index.

Highlights

  • Performance Improvement: Optimized the UpdateVector function in both HGraph and HNSW by replacing k-NN searches for neighbor retrieval with direct access to the graph's existing neighborhood information.
  • Reduced Computational Overhead: Eliminated the need to create temporary Dataset objects and perform potentially costly search operations during vector updates, leading to more efficient processing.

@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes and they look great!


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request aims to improve the performance of UpdateVector by replacing KNN search with direct neighbor retrieval from the graph. This is a good optimization.

My review has identified a critical issue in hnsw.cpp and a high-severity issue in hgraph.cpp. Both are related to incorrect handling of internal vs. external IDs, which could lead to incorrect behavior or crashes. Please see the detailed comments for suggestions on how to fix them.
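
The detailed review comments are not included in this conversation, so the exact defects are not shown here. As a general illustration of the internal versus external ID distinction, the sketch below keeps adjacency lists keyed by internal id and translates at the boundary in both directions. `ToyIndex`, `LabelType`, `InnerIdType`, and `NeighborLabels` are hypothetical names, not the vsag API.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

using LabelType = int64_t;     // external id chosen by the user (hypothetical alias)
using InnerIdType = uint32_t;  // dense internal slot number (hypothetical alias)

// Hypothetical toy index; adjacency lists are keyed by internal id only.
struct ToyIndex {
    std::unordered_map<LabelType, InnerIdType> label_to_inner;
    std::vector<LabelType> inner_to_label;
    std::vector<std::vector<InnerIdType>> adjacency;

    // Registers a new label and returns the internal id assigned to it.
    InnerIdType Add(LabelType label) {
        const InnerIdType inner = static_cast<InnerIdType>(inner_to_label.size());
        label_to_inner[label] = inner;
        inner_to_label.push_back(label);
        adjacency.emplace_back();
        return inner;
    }

    void Link(InnerIdType from, InnerIdType to) {
        adjacency.at(from).push_back(to);
    }

    // Translates label -> internal id before touching the graph, and
    // internal id -> label before returning. Returns false for unknown labels.
    bool NeighborLabels(LabelType label, std::vector<LabelType>& out) const {
        const auto it = label_to_inner.find(label);
        if (it == label_to_inner.end()) {
            return false;
        }
        out.clear();
        for (InnerIdType neighbor : adjacency.at(it->second)) {
            out.push_back(inner_to_label.at(neighbor));
        }
        return true;
    }
};
```

Passing a label where an internal id is expected (or the reverse) compiles without complaint when both are integers, which is why an explicit lookup at the boundary is worth the extra line.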

Signed-off-by: zhongxiaoyao.zxy <zhongxiaoyao.zxy@antgroup.com>
Signed-off-by: zhongxiaoyao.zxy <zhongxiaoyao.zxy@antgroup.com>
Signed-off-by: zhongxiaoyao.zxy <zhongxiaoyao.zxy@antgroup.com>
@pull-request-size pull-request-size bot added size/L and removed size/M labels Oct 28, 2025
Signed-off-by: zhongxiaoyao.zxy <zhongxiaoyao.zxy@antgroup.com>
@ShawnShawnYou ShawnShawnYou self-assigned this Oct 28, 2025
Signed-off-by: zhongxiaoyao.zxy <zhongxiaoyao.zxy@antgroup.com>
@wxyucs wxyucs added the version/0.18 and kind/improvement labels Oct 28, 2025
@wxyucs wxyucs (Collaborator) left a comment

lgtm

Signed-off-by: zhongxiaoyao.zxy <zhongxiaoyao.zxy@antgroup.com>

codecov bot commented Oct 28, 2025

Codecov Report

❌ Patch coverage is 84.90566% with 8 lines in your changes missing coverage. Please review.
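
As a quick consistency check on the patch figure: 8 uncovered lines at 84.90566% fits a patch of 53 changed lines with 45 covered. The counts 53 and 45 are inferred here from the two reported numbers, not quoted from the report, and `PatchCoveragePercent` is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>

// Patch coverage in percent for `covered` out of `total` changed lines.
double PatchCoveragePercent(int covered, int total) {
    assert(total > 0 && covered >= 0 && covered <= total);
    return 100.0 * covered / total;
}
```

With `covered = 45` and `total = 53` this evaluates to roughly 84.90566, matching the reported value to the digits shown.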

@@            Coverage Diff             @@
##             main    #1289      +/-   ##
==========================================
+ Coverage   91.42%   92.02%   +0.60%     
==========================================
  Files         318      320       +2     
  Lines       17622    17687      +65     
==========================================
+ Hits        16111    16277     +166     
+ Misses       1511     1410     -101     
Flag   Coverage Δ
cpp    92.02% <84.90%> (+0.60%) ⬆️

Flags with carried forward coverage won't be shown.

Components   Coverage Δ
common       91.15% <ø> (-0.63%) ⬇️
datacell     93.27% <ø> (+0.29%) ⬆️
index        91.04% <84.61%> (+0.59%) ⬆️
simd         100.00% <ø> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ae9220f...f28cb9c.

Signed-off-by: zhongxiaoyao.zxy <zhongxiaoyao.zxy@antgroup.com>
@inabao inabao (Collaborator) left a comment

LGTM

Signed-off-by: zhongxiaoyao.zxy <zhongxiaoyao.zxy@antgroup.com>
Signed-off-by: zhongxiaoyao.zxy <zhongxiaoyao.zxy@antgroup.com>
Signed-off-by: zhongxiaoyao.zxy <zhongxiaoyao.zxy@antgroup.com>
@ShawnShawnYou ShawnShawnYou merged commit 149d7a3 into main Oct 30, 2025
51 of 54 checks passed
@ShawnShawnYou ShawnShawnYou deleted the improve-performance-of-update-vector branch October 30, 2025 01:33
@wxyucs wxyucs (Collaborator) commented Oct 30, 2025

@ShawnShawnYou this pull request cannot be cherry-picked to branch 0.16 (CONFLICT); please create a new pull request against branch 0.16.

Roxanne0321 pushed a commit to Roxanne0321/vsag that referenced this pull request Nov 9, 2025
Signed-off-by: zhongxiaoyao.zxy <zhongxiaoyao.zxy@antgroup.com>

Labels

kind/improvement (Code improvements: variable/function renaming, refactoring, etc.), module/testing, size/L, version/0.18

Development

Successfully merging this pull request may close these issues.

improve performance of update vector

3 participants