Skip to content

Conversation

@LHT129
Copy link
Collaborator

@LHT129 LHT129 commented Jan 28, 2026

closed: #1541

  • add and refactor new interfaces
  • implement if mark remove for HGraph, IVF

@gemini-code-assist
Copy link

Summary of Changes

Hello @LHT129, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the vector management capabilities within the indexing system by introducing explicit modes for adding and removing vectors. The core change revolves around a 'mark remove' feature, which allows vectors to be logically deleted without immediate physical removal, providing greater flexibility for index maintenance and recovery. This refactoring impacts the fundamental Index interface and its various concrete implementations, ensuring consistent behavior and improved control over index state.

Highlights

  • New Enums for Add and Remove Operations: Introduced AddMode and RemoveMode enums in include/vsag/index.h to provide explicit control over how vectors are added (e.g., DEFAULT, KEEP_TOMBSTONE) and removed (e.g., MARK_REMOVE, REMOVE_AND_REPAIR).
  • Refactored Index Interface for Removal: The Index::Remove method has been refactored to accept a std::vector<int64_t> of IDs and a RemoveMode, returning uint32_t for the count of successfully removed vectors. A new ShrinkAndRepair method was also added to the Index interface.
  • Enhanced LabelTable with Mark Remove Functionality: The LabelTable implementation now supports 'mark remove' operations, where vectors are logically marked as deleted using a deleted_ids_ set and a GetDeletedIdsFilter. This allows for soft deletions without immediate physical removal.
  • Integrated Combined Filter for Search Operations: A new CombinedFilter class was introduced and integrated into HGraph and IVF search methods. This filter automatically includes the deleted_ids_filter from the LabelTable, ensuring that marked-for-removal vectors are excluded from search results.
  • Updated Index Implementations: Various index implementations including SimpleFlat, BruteForce, HGraph, IVF, Pyramid, SINDI, SparseIndex, and HNSW have been updated to conform to the new Add and Remove method signatures and leverage the new removal modes.
  • Comprehensive Unit and Integration Tests: New unit tests for the LabelTable and integration tests for HGraph and IVF have been added to validate the new 'mark remove' functionality and ensure correctness across different index types.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a significant refactoring of the Remove functionality to support a "mark for removal" (soft delete) mode. This is a well-executed change that improves the flexibility of the index. The API for Remove has been updated to handle batch operations and return a count of removed items, which is a good improvement. The implementation of soft deletes in HGraph and IVF through LabelTable and CombinedFilter is solid. I've included a few suggestions to improve performance and correctness in specific implementations.

I am having trouble creating individual review comments. Click here to see my feedback.

mockimpl/vsag/simpleflat.cpp (432-451)

high

The current implementation of Remove is inefficient for batch operations. It iterates through the IDs to be removed and, for each ID, it finds the element and resizes the underlying vectors. This results in O(M*N) complexity where M is the number of IDs to remove and N is the total number of elements. The repeated resizing of ids_ and data_ inside the loop is particularly expensive.

I suggest a more efficient implementation with O(N) complexity using a two-pointer approach to remove elements in-place in a single pass.

tl::expected<uint32_t, Error>
SimpleFlat::Remove(const std::vector<int64_t>& ids, RemoveMode mode) {
    if (ids.empty()) {
        return 0;
    }
    std::unordered_set<int64_t> ids_to_remove(ids.begin(), ids.end());

    int64_t original_num_elements = num_elements_;
    int64_t write_idx = 0;
    for (int64_t read_idx = 0; read_idx < original_num_elements; ++read_idx) {
        if (ids_to_remove.find(ids_[read_idx]) == ids_to_remove.end()) {
            if (read_idx != write_idx) {
                ids_[write_idx] = ids_[read_idx];
                std::memcpy(data_.data() + write_idx * dim_,
                            data_.data() + read_idx * dim_,
                            dim_ * sizeof(float));
            }
            write_idx++;
        }
    }

    uint32_t removed_count = original_num_elements - write_idx;
    if (removed_count > 0) {
        num_elements_ = write_idx;
        ids_.resize(num_elements_);
        data_.resize(num_elements_ * dim_);
    }

    return removed_count;
}

src/algorithm/brute_force.cpp (150-179)

high

The Remove function incorrectly returns 1 regardless of the number of IDs successfully removed. The function signature indicates it should return uint32_t, which is the count of removed items. This is a bug. The function should count the number of successful removals and return that count.

uint32_t
BruteForce::Remove(const std::vector<int64_t>& ids, RemoveMode mode) {
    CHECK_ARGUMENT(not use_attribute_filter_,
                   "remove is not supported when use_attribute_filter is true");

    std::scoped_lock lock(this->add_mutex_, this->label_lookup_mutex_);
    uint32_t removed_count = 0;
    for (auto label : ids) {
        const auto last_inner_id = static_cast<InnerIdType>(this->total_count_ - 1);
        const auto inner_id = this->label_table_->GetIdByLabel(label);

        CHECK_ARGUMENT(inner_id <= last_inner_id, "the element to be remove is invalid");

        const auto last_label = this->label_table_->GetLabelById(last_inner_id);
        this->label_table_->MarkRemove(label);
        --this->label_table_->total_count_;

        if (inner_id < last_inner_id) {
            Vector<float> data(dim_, allocator_);
            GetVectorByInnerId(last_inner_id, data.data());

            this->label_table_->MarkRemove(last_label);
            --this->label_table_->total_count_;

            this->inner_codes_->InsertVector(data.data(), inner_id);
            this->label_table_->Insert(inner_id, last_label);
        }

        this->total_count_--;
        removed_count++;
    }
    return removed_count;
}

src/algorithm/ivf.cpp (542-551)

medium

The Remove implementation only handles RemoveMode::MARK_REMOVE. If another mode like REMOVE_AND_REPAIR is passed, the function does nothing and returns 0. This could be misleading. It would be better to explicitly handle unsupported modes, for example by throwing an exception, to make the behavior clear.

uint32_t
IVF::Remove(const std::vector<int64_t>& ids, RemoveMode mode) {
    if (mode != RemoveMode::MARK_REMOVE) {
        throw VsagException(ErrorType::UNSUPPORTED_INDEX_OPERATION, "IVF only supports MARK_REMOVE mode.");
    }
    uint32_t delete_count = 0;
    std::scoped_lock label_lock(this->label_lookup_mutex_);
    delete_count = this->label_table_->MarkRemove(ids);
    delete_count_ += delete_count;
    return delete_count;
}

@LHT129 LHT129 force-pushed the remove branch 2 times, most recently from 74b36e9 to f2c6246 Compare January 28, 2026 09:17
return true;
this->total_count_--;
}
return 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove should return the number of successfully deleted

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only support mark remove, the repair remove will be supported soon

GetNumElements() const override;

[[nodiscard]] int64_t
GetNumberRemoved() const override {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename GetNumberRemoved to GetNumRemoved for short


using CombinedFilterPtr = std::shared_ptr<CombinedFilter>;

} // namespace vsag No newline at end of file
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a blank line at EOF

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}
for (auto id : ids) {
try {
std::unique_lock lock(rw_mutex_);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use std::scoped_lock instead

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@LHT129 LHT129 force-pushed the remove branch 4 times, most recently from f4726c0 to cb0be73 Compare January 29, 2026 07:10
@codecov
Copy link

codecov bot commented Jan 29, 2026

Codecov Report

❌ Patch coverage is 78.98833% with 54 lines in your changes missing coverage. Please review.

❌ Your patch check has failed because the patch coverage (78.98%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

@@            Coverage Diff             @@
##             main    #1551      +/-   ##
==========================================
- Coverage   91.57%   91.29%   -0.29%     
==========================================
  Files         328      329       +1     
  Lines       19263    19396     +133     
==========================================
+ Hits        17641    17707      +66     
- Misses       1622     1689      +67     
Flag Coverage Δ
cpp 91.29% <78.98%> (-0.29%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
common 85.81% <0.00%> (ø)
datacell 93.74% <ø> (+0.27%) ⬆️
index 90.72% <76.32%> (-0.75%) ⬇️
simd 100.00% <ø> (ø)

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a33e9c2...2c02b23. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@LHT129 LHT129 force-pushed the remove branch 2 times, most recently from 1c1f44a to 0534b39 Compare January 30, 2026 03:39
- add and refactor new interfaces
- implement if mark remove for HGraph, IVF

Signed-off-by: LHT129 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support Remove with RemoveMode:MARKREMOVE

2 participants