
feat(index): add write function for Serialize #1291

Merged
LHT129 merged 1 commit into antgroup:main from LHT129:writer on Nov 6, 2025

LHT129 (Collaborator) commented Oct 28, 2025

closed: #1282

Summary by Sourcery

Add support for serializing indexes via a custom write function

New Features:

  • Introduce WriteFuncType and offset/size typedefs in index.h
  • Add Serialize(write_func) method to Index interface with default implementation
  • Implement write-function-based serialization in InnerIndexInterface and IndexImpl
  • Enable SUPPORT_SERIALIZE_WRITE_FUNC feature for algorithms (BruteForce, HGraph, IVF)

Tests:

  • Add TestSerializeWriteFunc unit test to validate serialization with a write function and ensure round-trip deserialization preserves search results
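To make the new API concrete, here is a minimal sketch of what a caller-side write function could look like: it collects serialized bytes into a growable in-memory buffer at whatever offset each chunk is written to. It assumes only the `WriteFuncType` signature given in the class diagram for this PR (`std::function<void(OffsetType, SizeType, const void*)>`). `MakeBufferWriteFunc` and `FakeSerialize` are hypothetical helpers for illustration and are not part of vsag.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <functional>
#include <string>
#include <vector>

// Aliases mirroring the ones this PR adds to include/vsag/index.h (assumed).
using OffsetType = uint64_t;
using SizeType = uint64_t;
using WriteFuncType = std::function<void(OffsetType, SizeType, const void*)>;

// Hypothetical helper: returns a write function that copies each chunk into
// `buffer` at the requested offset, growing the buffer when needed.
// The returned function keeps a reference to `buffer`.
WriteFuncType
MakeBufferWriteFunc(std::vector<char>& buffer) {
    return [&buffer](OffsetType offset, SizeType size, const void* data) {
        if (offset + size > buffer.size()) {
            buffer.resize(offset + size);
        }
        std::memcpy(buffer.data() + offset, data, size);
    };
}

// Hypothetical stand-in for Index::Serialize(write_func): emits a header and
// a payload as two chunks at consecutive offsets.
void
FakeSerialize(const WriteFuncType& write_func) {
    const std::string header = "HDR";
    const std::string payload = "vectors";
    write_func(0, header.size(), header.data());
    write_func(header.size(), payload.size(), payload.data());
}
```

Because each call carries an explicit offset, the sink does not have to assume chunks arrive strictly in order; a buffer, a file, or a remote object store can all be targeted the same way.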

LHT129 self-assigned this on Oct 28, 2025
LHT129 requested a review from inabao as a code owner on October 28, 2025 06:46
LHT129 added the kind/feature (New feature or request) label on Oct 28, 2025

sourcery-ai bot commented Oct 28, 2025

Reviewer's Guide

This PR adds a new serialization API that accepts a user‐provided write function. It extends the Index interface, provides implementations in InnerIndexInterface and IndexImpl, updates algorithm feature flags to advertise support, and adds a test to verify write‐function–based serialization and round‐trip deserialization.

Sequence diagram for serialization using a user-provided write function

```mermaid
sequenceDiagram
    participant User
    participant IndexImpl
    participant InnerIndexInterface
    participant WriteFuncStreamWriter
    User->>IndexImpl: Serialize(write_func)
    IndexImpl->>InnerIndexInterface: Serialize(write_func)
    InnerIndexInterface->>WriteFuncStreamWriter: Create writer with write_func
    InnerIndexInterface->>WriteFuncStreamWriter: Serialize(writer)
    WriteFuncStreamWriter->>User: write_func(offset, size, data)
    Note over User,WriteFuncStreamWriter: Data is written via user-provided function
```
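The diagram shows the writer forwarding each chunk as `write_func(offset, size, data)`. The sketch below illustrates one way such an adapter could work: callers write sequentially and the adapter keeps a running offset. It is loosely modeled on the `WriteFuncStreamWriter(write_func, 0)` construction seen in the diff, but `OffsetTrackingWriter` and its methods are hypothetical and not vsag's actual class.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

using OffsetType = uint64_t;
using SizeType = uint64_t;
using WriteFuncType = std::function<void(OffsetType, SizeType, const void*)>;

// Hypothetical adapter: turns sequential Write() calls into offset-addressed
// write_func calls, starting from `start_offset`.
class OffsetTrackingWriter {
public:
    OffsetTrackingWriter(WriteFuncType write_func, OffsetType start_offset)
        : write_func_(std::move(write_func)), offset_(start_offset) {
    }

    // Forwards the chunk at the current offset, then advances past it.
    void
    Write(const void* data, SizeType size) {
        write_func_(offset_, size, data);
        offset_ += size;
    }

    // Total position reached so far (start offset plus bytes written).
    OffsetType
    Offset() const {
        return offset_;
    }

private:
    WriteFuncType write_func_;
    OffsetType offset_;
};
```

With this shape, existing serialization code that writes field by field does not need to know about offsets at all; only the adapter does.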

ER diagram for updated IndexFeature enum

```mermaid
erDiagram
    INDEX_FEATURE {
        SUPPORT_METRIC_TYPE_COSINE string
        SUPPORT_SERIALIZE_FILE string
        SUPPORT_SERIALIZE_BINARY_SET string
        SUPPORT_SERIALIZE_WRITE_FUNC string
        SUPPORT_DESERIALIZE_FILE string
        SUPPORT_DESERIALIZE_BINARY_SET string
        SUPPORT_DESERIALIZE_READER_SET string
    }
    ALGORITHM ||--o{ INDEX_FEATURE : supports
```
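The new `SUPPORT_SERIALIZE_WRITE_FUNC` flag lets callers decide which serialization path to use before calling it. The following is a self-contained model of that decision, not vsag code: the enum reproduces only three names from the diagram (their values are assumed), and `FeatureSet` and `ChooseSerializePath` are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>

// Subset of the IndexFeature names from the diagram (values assumed).
enum class IndexFeature {
    SUPPORT_SERIALIZE_FILE,
    SUPPORT_SERIALIZE_BINARY_SET,
    SUPPORT_SERIALIZE_WRITE_FUNC,
};

// Hypothetical feature registry, similar in spirit to an algorithm's
// InitFeatures advertising what it supports.
class FeatureSet {
public:
    void
    Set(IndexFeature f) {
        features_.insert(f);
    }

    bool
    Has(IndexFeature f) const {
        return features_.count(f) > 0;
    }

private:
    std::set<IndexFeature> features_;
};

// Hypothetical caller-side dispatch: prefer the write-function path when the
// index advertises it, otherwise fall back to a BinarySet.
std::string
ChooseSerializePath(const FeatureSet& features) {
    if (features.Has(IndexFeature::SUPPORT_SERIALIZE_WRITE_FUNC)) {
        return "write_func";
    }
    if (features.Has(IndexFeature::SUPPORT_SERIALIZE_BINARY_SET)) {
        return "binary_set";
    }
    return "unsupported";
}
```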

Class diagram for new and updated serialization methods in Index and related classes

```mermaid
classDiagram
    class Index {
        <<abstract>>
        +Serialize() tl::expected<BinarySet, Error>
        +Serialize(WriteFuncType write_func) tl::expected<void, Error>
    }
    class IndexImpl {
        +Serialize() tl::expected<BinarySet, Error>
        +Serialize(WriteFuncType write_func) tl::expected<void, Error>
        +Serialize(std::ostream& out_stream) tl::expected<void, Error>
    }
    class InnerIndexInterface {
        +Serialize() tl::expected<BinarySet, Error>
        +Serialize(WriteFuncType write_func) void
        +Serialize(std::ostream& out_stream) void
        +Serialize(StreamWriter& writer) void
    }
    Index <|-- IndexImpl
    IndexImpl o-- InnerIndexInterface
    class WriteFuncType {
        std::function<void(OffsetType, SizeType, const void*)>
    }
    class OffsetType {
        uint64_t
    }
    class SizeType {
        uint64_t
    }
```

File-Level Changes

Change: Define write‐function–based serialization API in Index interface
  • Add OffsetType, SizeType, WriteFuncType aliases
  • Declare virtual Serialize(WriteFuncType) with default throw
  Files: include/vsag/index.h

Change: Implement write‐function serialization in InnerIndexInterface
  • Declare overload Serialize(WriteFuncType) in header
  • Define method to compute size and drive StreamWriter with write_func
  Files: src/algorithm/inner_index_interface.h, src/algorithm/inner_index_interface.cpp

Change: Forward new Serialize API in IndexImpl
  • Override Serialize(WriteFuncType) to call inner_index_->Serialize(write_func)
  Files: src/index/index_impl.h

Change: Advertise write‐function support in feature flags
  • Add SUPPORT_SERIALIZE_WRITE_FUNC enum value
  • Include this feature in InitFeatures for brute_force, hgraph, ivf
  Files: include/vsag/index_features.h, src/algorithm/brute_force.cpp, src/algorithm/hgraph.cpp, src/algorithm/ivf.cpp

Change: Add TestSerializeWriteFunc to validate write‐function serialization
  • Implement test logic writing to temp file via write_func and round‐trip deserializing
  • Declare test in TestIndex class
  Files: tests/test_index.cpp, tests/test_index.h
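The test change writes to a temporary file through `write_func` and then deserializes. The sketch below mirrors that round trip without vsag. A hypothetical `SerializeFloats` emits a count followed by raw floats through the write function (this layout is an assumption for illustration, not vsag's format), a file-backed write function places each chunk with `seekp`, and `DeserializeFloats` reads the file back.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <functional>
#include <string>
#include <vector>

using OffsetType = uint64_t;
using SizeType = uint64_t;
using WriteFuncType = std::function<void(OffsetType, SizeType, const void*)>;

// Hypothetical "index" payload: a flat list of floats.
// Assumed layout: [uint64 count][count floats].
void
SerializeFloats(const std::vector<float>& data, const WriteFuncType& write_func) {
    const uint64_t count = data.size();
    write_func(0, sizeof(count), &count);
    write_func(sizeof(count), count * sizeof(float), data.data());
}

std::vector<float>
DeserializeFloats(std::ifstream& in) {
    uint64_t count = 0;
    in.read(reinterpret_cast<char*>(&count), sizeof(count));
    std::vector<float> data(count);
    in.read(reinterpret_cast<char*>(data.data()),
            static_cast<std::streamsize>(count * sizeof(float)));
    return data;
}

// Serializes through a file-backed write function, then reads the file back.
std::vector<float>
RoundTripThroughFile(const std::vector<float>& data, const std::string& path) {
    {
        std::ofstream out(path, std::ios::binary | std::ios::trunc);
        SerializeFloats(data, [&out](OffsetType offset, SizeType size, const void* bytes) {
            out.seekp(static_cast<std::streamoff>(offset));
            out.write(static_cast<const char*>(bytes), static_cast<std::streamsize>(size));
        });
    }  // `out` is destroyed here, which flushes and closes the file
    std::ifstream in(path, std::ios::binary);
    return DeserializeFloats(in);
}
```

The real test additionally checks that search results match after deserialization; comparing the raw values is the simplest analogue here.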

gemini-code-assist bot commented

Summary of Changes

Hello @LHT129, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the Index serialization capabilities by introducing a new method that accepts a custom write function. This allows users to define how the serialized data is written, offering greater control and flexibility over the storage mechanism. The change involves updating the core Index interface, implementing the new serialization logic, and ensuring various index algorithms correctly declare and support this feature, along with a dedicated test case to verify its functionality.

Highlights

  • New Serialization Method: Introduced an overloaded Serialize method in the Index class that accepts a WriteFuncType, allowing for custom write operations during serialization.
  • Custom Write Function Type: Defined WriteFuncType as a std::function signature for writing data, providing flexibility for how serialized index data is handled.
  • Feature Flag: Added a new SUPPORT_SERIALIZE_WRITE_FUNC feature flag to indicate which index implementations support this new serialization mechanism.
  • Implementation and Support: Implemented the new Serialize(WriteFuncType) in InnerIndexInterface and updated BruteForce, HGraph, and IVF algorithms to declare support for this feature.
  • New Test Case: Added TestSerializeWriteFunc to validate the correctness and functionality of the new write-function-based serialization.
sourcery-ai bot left a comment

Hey there - I've reviewed your changes - here's some feedback:

  • Align the WriteFuncType Serialize overload’s error handling with the tl::expected return pattern instead of throwing runtime exceptions for unsupported indexes.
  • Extract common serialization logic between the ostream and WriteFuncType overloads into a shared helper to avoid duplication in InnerIndexInterface.
  • Verify that all index implementations supporting file-based serialization also advertise SUPPORT_SERIALIZE_WRITE_FUNC to keep feature flags consistent.
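For the second point, one way to avoid duplicated logic is to make the write-function overload the only real implementation and have the stream overload adapt `std::ostream` into a write function. This is a hedged sketch of that idea, not the merged code: `SketchIndex` is hypothetical, and offsets are treated as relative to the stream position at the time of the call.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <ostream>
#include <sstream>
#include <string>

using OffsetType = uint64_t;
using SizeType = uint64_t;
using WriteFuncType = std::function<void(OffsetType, SizeType, const void*)>;

// Hypothetical index with a single "real" serialization routine.
class SketchIndex {
public:
    // Core path: everything is emitted through the write function.
    void
    Serialize(const WriteFuncType& write_func) const {
        write_func(0, header_.size(), header_.data());
        write_func(header_.size(), payload_.size(), payload_.data());
    }

    // Stream overload only adapts std::ostream to a write function, so the
    // serialization logic itself lives in one place.
    void
    Serialize(std::ostream& out) const {
        const auto base = out.tellp();  // offsets are relative to this position
        Serialize([&out, base](OffsetType offset, SizeType size, const void* data) {
            out.seekp(base + static_cast<std::streamoff>(offset));
            out.write(static_cast<const char*>(data), static_cast<std::streamsize>(size));
        });
    }

private:
    std::string header_ = "HDR";
    std::string payload_ = "vectors";
};
```

A side effect of this design is that any new serialization format change only has to be made in the write-function path.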
## Individual Comments

### Comment 1
<location> `src/algorithm/inner_index_interface.cpp:147-153` </location>
<code_context>
 }

+void
+InnerIndexInterface::Serialize(WriteFuncType write_func) const {
+    std::string time_record_name = this->GetName() + " Serialize";
+    SlowTaskTimer t(time_record_name);
+
+    uint64_t num_bytes = this->CalSerializeSize();
+    WriteFuncStreamWriter writer(write_func, 0);
+    this->Serialize(writer);
+}
+
</code_context>

<issue_to_address>
**issue:** Missing error handling for WriteFuncType serialization.

This method should return a tl::expected or similar error type to match the error handling approach used in other Serialize methods.
</issue_to_address>

### Comment 2
<location> `src/index/index_impl.h:380-381` </location>
<code_context>
     }

+    [[nodiscard]] tl::expected<void, Error>
+    Serialize(WriteFuncType write_func) const override {
+        SAFE_CALL(this->inner_index_->Serialize(write_func));
+    }
+
</code_context>

<issue_to_address>
**issue (bug_risk):** Return value from SAFE_CALL is not propagated.

Please return the result of inner_index_->Serialize to ensure error information is not lost.
</issue_to_address>
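Whether vsag's `SAFE_CALL` already returns from the enclosing function is not visible in this thread. As a hedged illustration of the behavior this comment asks for, the sketch below converts an exception thrown by a `void` call into an error value that is returned to the caller. `Result` is a minimal stand-in for `tl::expected<void, Error>`, and `CallAndCapture` is a hypothetical helper, not vsag's macro.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>

// Minimal stand-in for tl::expected<void, Error>: no error means success.
struct Result {
    std::optional<std::string> error;

    bool
    ok() const {
        return !error.has_value();
    }
};

// Hypothetical helper: runs a void callable that may throw and converts any
// std::exception into an error value that the caller receives.
template <typename Fn>
Result
CallAndCapture(Fn&& fn) {
    try {
        fn();
        return Result{};
    } catch (const std::exception& e) {
        return Result{std::string(e.what())};
    }
}
```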

### Comment 3
<location> `src/algorithm/inner_index_interface.h:329-333` </location>
<code_context>
     virtual void
     Serialize(std::ostream& out_stream) const;

+    virtual void
+    Serialize(WriteFuncType write_func) const;
+
     virtual void
</code_context>

<issue_to_address>
**issue:** Inconsistent return type for Serialize with WriteFuncType.

Standardize the return type of Serialize(WriteFuncType) to match other implementations and improve error handling.
</issue_to_address>

### Comment 4
<location> `include/vsag/index.h:649-652` </location>
<code_context>
+      *
+      * @param write_func is a function to write serialized index
+      */
+    [[nodiscard]] virtual tl::expected<void, Error>
+    Serialize(WriteFuncType write_func) const {
+        throw std::runtime_error("Index doesn't support Serialize with write function");
+    }
</code_context>

<issue_to_address>
**suggestion:** Default implementation throws runtime_error but returns tl::expected.

Instead of throwing, return tl::unexpected(Error) to align with tl::expected error handling.

```suggestion
    [[nodiscard]] virtual tl::expected<void, Error>
    Serialize(WriteFuncType write_func) const {
        return tl::unexpected(Error("Index doesn't support Serialize with write function"));
    }
```
</issue_to_address>

gemini-code-assist bot left a comment

Code Review

This pull request introduces a new serialization method using a write function, which is a great addition for flexibility. The implementation is generally clean and includes a good test case. I have two suggestions for improvement: one is to remove an unused variable, and the other is to align the default implementation of the new virtual function with its return type to avoid throwing exceptions, which would improve the API's robustness.

Comment on lines +650 to +652

```cpp
    Serialize(WriteFuncType write_func) const {
        throw std::runtime_error("Index doesn't support Serialize with write function");
    }
```

medium

The default implementation of this function throws a std::runtime_error, but its signature indicates it should return a tl::expected<void, Error>. While this seems to be a pattern in this class, it's generally better to return an error value that matches the function signature, like tl::unexpected, to avoid surprising callers who might not expect an exception from a function returning tl::expected. This would make the interface safer to use directly.

Suggested change:

```suggestion
    Serialize(WriteFuncType write_func) const {
        return tl::unexpected(vsag::Error(vsag::ErrorType::UNSUPPORTED_INDEX_OPERATION, "Index doesn't support Serialize with write function"));
    }
```
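A minimal model of the suggested default, assuming only the `WriteFuncType` signature from this PR: the base class reports "unsupported" through its return value instead of throwing, and an overriding index succeeds. `std::optional<std::string>` stands in for `tl::expected<void, Error>`, and `BaseIndex` and `SupportingIndex` are hypothetical classes.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>

using OffsetType = uint64_t;
using SizeType = uint64_t;
using WriteFuncType = std::function<void(OffsetType, SizeType, const void*)>;

// Stand-in for tl::expected<void, Error>: an engaged optional is an error.
using MaybeError = std::optional<std::string>;

class BaseIndex {
public:
    virtual ~BaseIndex() = default;

    // Default reports "unsupported" as a value instead of throwing, so callers
    // only need to inspect the return value.
    virtual MaybeError
    Serialize(const WriteFuncType& /*write_func*/) const {
        return std::string("Index doesn't support Serialize with write function");
    }
};

// Hypothetical index that overrides the default and writes one byte.
class SupportingIndex : public BaseIndex {
public:
    MaybeError
    Serialize(const WriteFuncType& write_func) const override {
        const char marker = 'x';
        write_func(0, 1, &marker);
        return std::nullopt;
    }
};
```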

```cpp
    std::string time_record_name = this->GetName() + " Serialize";
    SlowTaskTimer t(time_record_name);

    uint64_t num_bytes = this->CalSerializeSize();
```

medium

The variable num_bytes is calculated but its value is never used. This unused variable should be removed to improve code clarity.

LHT129 force-pushed the writer branch 2 times, most recently from 1e9fbdf to 39aac7b on October 28, 2025 09:15

codecov bot commented Oct 28, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

```diff
@@            Coverage Diff             @@
##             main    #1291      +/-   ##
==========================================
+ Coverage   91.28%   91.99%   +0.71%     
==========================================
  Files         320      320              
  Lines       17693    17702       +9     
==========================================
+ Hits        16151    16285     +134     
+ Misses       1542     1417     -125     
```

| Flag | Coverage Δ |
|------|------------|
| cpp | 91.99% <100.00%> (+0.71%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Components | Coverage Δ |
|------------|------------|
| common | 91.15% <ø> (ø) |
| datacell | 93.37% <ø> (+0.65%) ⬆️ |
| index | 90.91% <100.00%> (+0.55%) ⬆️ |
| simd | 100.00% <ø> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fa2bd21...8189173.

ShawnShawnYou (Collaborator) left a comment

lgtm

wxyucs (Collaborator) left a comment

lgtm

LHT129 force-pushed the writer branch 2 times, most recently from 3807d74 to 2d355a8 on November 6, 2025 06:22
Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
inabao (Collaborator) left a comment

LGTM

LHT129 merged commit 99906c9 into antgroup:main on Nov 6, 2025
24 checks passed
Roxanne0321 pushed a commit to Roxanne0321/vsag that referenced this pull request Nov 9, 2025
Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
Development

Successfully merging this pull request may close these issues.

Support Write Function for Index serialize (like reader for deserialize)

4 participants