enh: enable insertion of documents using map merge sql approach #69

Open
VarinThakur01 wants to merge 3 commits into main from map_merge

@VarinThakur01
Collaborator

This PR adds a use_map_merge flag to the from_texts and add_texts functions.
The flag only works when internal embeddings are used; otherwise an error is raised.

When the flag is set, documents are inserted into the vectorstore using the map merge method.

Test cases covering the new functionality are included, and the examples are updated to demonstrate it.

This approach should result in a 4-5x speedup when inserting a large number of documents into the vectorstore.
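
For readers who have not opened the diff, the flag check described above could look roughly like the sketch below. It is not the code in this PR: SketchVectorStore, uses_internal_embeddings, and the choice of ValueError are assumptions made for illustration, and the two insertion paths are only recorded as labels.

```python
class SketchVectorStore:
    """Hypothetical stand-in for the vectorstore; not the class changed in this PR."""

    def __init__(self, uses_internal_embeddings: bool) -> None:
        self.uses_internal_embeddings = uses_internal_embeddings
        self.inserted: list[tuple[str, str]] = []  # (text, insertion path)

    def add_texts(self, texts: list[str], use_map_merge: bool = False) -> int:
        # Assumption: the error type is ValueError; the PR only says "raises an error".
        if use_map_merge and not self.uses_internal_embeddings:
            raise ValueError("use_map_merge requires internal embeddings")
        path = "map_merge" if use_map_merge else "row_by_row"
        self.inserted.extend((text, path) for text in texts)
        return len(texts)
```

The check runs before any rows are recorded, so a rejected call leaves the store unchanged.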

@VarinThakur01 VarinThakur01 requested a review from yonarw February 16, 2026 11:37
try:
    cur.executemany(insert_temp_table_sql, [(i, text) for i,text in enumerate(texts)])
except Exception as e:
    raise Exception(f"Error while inserting rows for map merge :{e}")
Contributor

If we fail here or an exception occurs, the temp table above isn't cleaned up.
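
One way to address this is to move the drop into a finally block, so the temp table is removed on both the success and the failure path. The sketch below uses Python's built-in sqlite3 as a stand-in for the HANA connection so that it runs on its own; the table names staging and documents and the helper insert_with_cleanup are hypothetical, not names from this PR.

```python
import sqlite3


def insert_with_cleanup(conn: sqlite3.Connection, texts: list[str]) -> int:
    """Stage texts in a temp table and always drop that table afterwards."""
    cur = conn.cursor()
    cur.execute("CREATE TEMP TABLE staging (id INTEGER PRIMARY KEY, text TEXT NOT NULL)")
    try:
        cur.executemany(
            "INSERT INTO staging (id, text) VALUES (?, ?)",
            [(i, text) for i, text in enumerate(texts)],
        )
        # Stand-in for the map merge step: copy the staged rows to the target table.
        cur.execute("INSERT INTO documents (text) SELECT text FROM staging ORDER BY id")
        conn.commit()
        return len(texts)
    except sqlite3.Error as e:
        conn.rollback()
        raise RuntimeError(f"Error while inserting rows for map merge: {e}") from e
    finally:
        # Runs whether the inserts succeeded or raised.
        cur.execute("DROP TABLE IF EXISTS staging")
        cur.close()
```

The same pattern would apply to the generated function further down: create it, use it inside try, and drop it in finally.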


try:
    cur.execute(create_map_merge_function_sql)
except Exception as e:
Contributor

Same here (and below).

assert number_of_rows == number_of_texts


def test_hanavector_add_texts_with_map_merge(vectorDB) -> None:
Contributor

Does it make sense to parameterise these tests?
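
As a rough illustration of what that could look like, the sketch below runs one test body for both values of the flag using pytest.mark.parametrize. FakeVectorStore is a hypothetical in-memory stand-in so the example runs without a database; the real tests would use the vectorDB fixture instead.

```python
import pytest


class FakeVectorStore:
    """Hypothetical in-memory stand-in; not part of this PR."""

    def __init__(self) -> None:
        self.rows: list[str] = []

    def add_texts(self, texts: list[str], use_map_merge: bool = False) -> None:
        # Both insertion paths are expected to end with the same rows stored.
        self.rows.extend(texts)


@pytest.mark.parametrize("use_map_merge", [False, True])
def test_add_texts_inserts_all_rows(use_map_merge: bool) -> None:
    store = FakeVectorStore()
    texts = ["foo", "bar", "baz"]
    store.add_texts(texts, use_map_merge=use_map_merge)
    assert len(store.rows) == len(texts)
```

This would replace a pair of near-identical tests with one, and a failure report still names which flag value broke.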

)

create_map_merge_function_sql = f"""
CREATE OR REPLACE FUNCTION F_VECTOR_EMBEDDING(
Collaborator Author

Append a universally unique ID (UUID) to the function name to prevent conflicts when it is created in parallel.
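
A minimal sketch of the suffix idea, using Python's standard uuid module. The helper name unique_function_name is hypothetical; only the base name F_VECTOR_EMBEDDING comes from the diff.

```python
import uuid


def unique_function_name(base: str = "F_VECTOR_EMBEDDING") -> str:
    """Return the base name followed by a random 32-character hex suffix."""
    # uuid4().hex contains only hex digits, so the result needs no quoting
    # beyond what the base name already needs.
    return f"{base}_{uuid.uuid4().hex.upper()}"
```

Since every call yields a new name, each generated function would presumably have to be dropped once the insert is finished, or the functions will accumulate in the schema.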

)

create_map_merge_function_sql = f"""
CREATE OR REPLACE FUNCTION F_VECTOR_EMBEDDING(
Collaborator Author

Create the suffix from a hash of all parameters, so that parallel sessions with different embedding models get distinct functions.
Parameters: VECTOR_COLUMN_TYPE, EMBEDDING_ID, REMOTE_SOURCE
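
A sketch of the hash-based variant, assuming the three parameters are available as strings. The helper name, the choice of SHA-256, and the 16-character truncation are assumptions for illustration, not decisions made in this PR.

```python
import hashlib


def function_name_for_params(
    vector_column_type: str,
    embedding_id: str,
    remote_source: str,
    base: str = "F_VECTOR_EMBEDDING",
) -> str:
    """Derive a deterministic function name from the three parameters."""
    # Join with a control character so ("ab", "c") and ("a", "bc") hash differently.
    key = "\x1f".join([vector_column_type, embedding_id, remote_source])
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16].upper()
    return f"{base}_{digest}"
```

Unlike a random suffix, the name is stable for a given configuration, so sessions using the same embedding model share one function while sessions with a different model get their own.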
