enh: enable insertion of documents using map merge sql approach #69
Open
VarinThakur01 wants to merge 3 commits into main
yonarw
reviewed
Feb 17, 2026
try:
    cur.executemany(insert_temp_table_sql, [(i, text) for i,text in enumerate(texts)])
except Exception as e:
    raise Exception(f"Error while inserting rows for map merge :{e}")
Contributor
If we fail here or an exception occurs, the temp table above isn't cleaned up.
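One way to address this is a try/finally around the insert, so the drop runs on both the success and the failure path. The sketch below is only an illustration: it uses the standard-library sqlite3 module as a stand-in for the real connection, and the table name tmp_texts, its columns, and the helper names are hypothetical, not taken from this PR.

```python
import sqlite3


def insert_with_cleanup(conn, texts):
    """Insert texts into a temporary table and always drop it afterwards."""
    cur = conn.cursor()
    # Hypothetical stand-in table; the PR's real temp table SQL is not shown here.
    cur.execute("CREATE TEMP TABLE tmp_texts (id INTEGER, text TEXT NOT NULL)")
    try:
        cur.executemany(
            "INSERT INTO tmp_texts (id, text) VALUES (?, ?)",
            [(i, text) for i, text in enumerate(texts)],
        )
        # The merge step that reads from the temp table would go here.
        cur.execute("SELECT COUNT(*) FROM tmp_texts")
        return cur.fetchone()[0]
    except sqlite3.Error as e:
        raise RuntimeError(f"Error while inserting rows for map merge: {e}") from e
    finally:
        # Runs whether or not the insert succeeded, so the table never leaks.
        cur.execute("DROP TABLE IF EXISTS tmp_texts")
        cur.close()


def temp_table_exists(conn):
    """Return True if the stand-in temp table is still present."""
    row = conn.execute(
        "SELECT COUNT(*) FROM sqlite_temp_master WHERE name = 'tmp_texts'"
    ).fetchone()
    return row[0] > 0
```

Chaining with `from e` keeps the original driver error in the traceback, which the bare `raise Exception(...)` in the hunk above would lose.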
try:
    cur.execute(create_map_merge_function_sql)
except Exception as e:

    assert number_of_rows == number_of_texts


def test_hanavector_add_texts_with_map_merge(vectorDB) -> None:
Contributor
Does it make sense to parameterise these tests?
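For illustration, a parameterised version could run one test body for both values of the flag. This is a rough sketch assuming pytest is in use; FakeVectorStore is a hypothetical in-memory stand-in for the vectorDB fixture, and the test name is made up for the example.

```python
import pytest


class FakeVectorStore:
    """Hypothetical in-memory stand-in for the vectorDB fixture."""

    def __init__(self):
        self.rows = []

    def add_texts(self, texts, use_map_merge=False):
        # A real store would choose a different insert path per flag value;
        # this stand-in only records the rows.
        self.rows.extend(texts)


@pytest.mark.parametrize("use_map_merge", [False, True])
def test_add_texts_row_count(use_map_merge):
    store = FakeVectorStore()
    texts = ["foo", "bar", "baz"]
    store.add_texts(texts, use_map_merge=use_map_merge)
    # Both insert paths should produce one row per input text.
    assert len(store.rows) == len(texts)
```

With this shape, the map merge test and the existing add_texts test would share one body, and pytest would report each flag value as a separate case.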
VarinThakur01
commented
Feb 18, 2026
)

create_map_merge_function_sql = f"""
CREATE OR REPLACE FUNCTION F_VECTOR_EMBEDDING(
Collaborator
Author
Append a universally unique ID (UUID) to the function name to prevent conflicts when the function is created in parallel.
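A minimal sketch of that idea, using the standard-library uuid module. The helper name unique_function_name is hypothetical, and how the name is spliced into the CREATE FUNCTION statement is left out.

```python
import uuid


def unique_function_name(base="F_VECTOR_EMBEDDING"):
    """Return the base name with a random 32-character hex suffix."""
    # uuid4().hex contains only hex digits, so the result stays a plain
    # identifier without quoting.
    return f"{base}_{uuid.uuid4().hex.upper()}"
```

A random suffix means every call creates a new function, so the caller would presumably also need to drop the function once the insert finishes.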
VarinThakur01
commented
Feb 18, 2026
)

create_map_merge_function_sql = f"""
CREATE OR REPLACE FUNCTION F_VECTOR_EMBEDDING(
Collaborator
Author
Create the suffix from a hash of all parameters, so that parallel sessions with different embedding models get distinct functions.
Params: VECTOR_COLUMN_TYPE, EMBEDDING_ID, REMOTE_SOURCE
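As a sketch of the hash-based suffix, the three parameters can be joined and hashed with hashlib. The helper name function_name_for, the 16-character truncation, and the separator are assumptions made for the example, not decisions taken in this PR.

```python
import hashlib


def function_name_for(vector_column_type, embedding_id, remote_source,
                      base="F_VECTOR_EMBEDDING"):
    """Derive a deterministic function name from the three parameters."""
    # A separator that should not occur in the values keeps
    # ("AB", "C") and ("A", "BC") from hashing to the same key.
    key = "\x1f".join([vector_column_type, embedding_id, remote_source or ""])
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16].upper()
    return f"{base}_{digest}"
```

Unlike a random UUID, the same parameter combination always maps to the same name, so sessions using the same embedding model would reuse one function, while sessions with different models would get separate ones.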
This PR adds the use_map_merge flag to the from_texts and add_texts functions. The flag only works when internal embeddings are used and raises an error otherwise.
When the flag is set, documents are inserted into the vectorstore using the map merge method. Test cases covering this functionality are added.
The examples are also updated to show the new functionality.
This approach should give a 4-5x speedup when inserting a large number of documents into the vectorstore.
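The flag's precondition described above could be checked with a guard along these lines. This is only a sketch: the helper name validate_map_merge, the uses_internal_embeddings argument, and the choice of ValueError are assumptions, since the description does not say which error is raised.

```python
def validate_map_merge(use_map_merge: bool, uses_internal_embeddings: bool) -> None:
    """Reject use_map_merge when embeddings are not computed internally."""
    # Assumption: the exception type is illustrative, not the PR's actual one.
    if use_map_merge and not uses_internal_embeddings:
        raise ValueError("use_map_merge requires internal embeddings")
```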