@gianielsevier (Contributor) commented Dec 28, 2023

Use JdbcTemplate batchUpdate to improve the performance of multiple upserts.

Thank you for taking time to contribute this pull request!
You might have already read the [contributor guide][1], but as a reminder, please make sure to:

  • Sign the contributor license agreement
  • Rebase your changes on the latest main branch and squash your commits
  • Add/Update unit tests as needed
  • Run a build and make sure all tests pass prior to submission

@markpollack markpollack added this to the 0.8.0 milestone Jan 2, 2024
@markpollack markpollack requested a review from tzolov January 2, 2024 18:01
@markpollack (Member) commented

Thanks! I fixed one unrelated issue while reviewing this. I do wonder how well batching works as the default strategy, for example when the number of documents gets very large. Having something like Spring Batch manage this in 'chunks' might be useful in the future.
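
The chunking concern above can be illustrated with a minimal sketch. It is not the merged change and not Spring Batch; it only shows the idea of splitting a large document list into bounded batches before each write. The class `ChunkedBatchWriter`, its methods, and the `batchSink` callback are hypothetical names introduced for illustration. In a real store the callback could wrap a single `JdbcTemplate.batchUpdate` call per chunk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of Spring AI or Spring Batch.
class ChunkedBatchWriter {

    /** Splits items into consecutive sublists of at most chunkSize elements. */
    static <T> List<List<T>> partition(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            // Copy the view so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    /**
     * Hands each chunk to batchSink, one call per chunk, and returns the
     * number of batches written. batchSink stands in for whatever performs
     * the actual batched upsert (an assumption of this sketch).
     */
    static <T> int writeInChunks(List<T> items, int chunkSize, Consumer<List<T>> batchSink) {
        List<List<T>> chunks = partition(items, chunkSize);
        for (List<T> chunk : chunks) {
            batchSink.accept(chunk);
        }
        return chunks.size();
    }
}
```

With a bounded chunk size, the memory held per round trip stays roughly constant however many documents are passed in, at the cost of more round trips than one single large batch.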

@markpollack (Member) commented

merged in 2a60426

@markpollack markpollack closed this Jan 2, 2024
@markpollack markpollack removed the request for review from tzolov January 2, 2024 18:22
@markpollack markpollack self-assigned this Jan 2, 2024
@gianielsevier (Contributor, Author) commented Jan 5, 2024

> merged in 2a60426

Thanks for accepting my PR, @markpollack.
I tested the original version with a dataset of 126k records from the medium CSV file here, and it took roughly 17 minutes to ingest the whole file.
With the batch update approach, that dropped to about 11 minutes.

I also compared ingesting the same dataset with the Neo4j vector store, which took around 12 minutes to process the whole file, and with Redis, which took a similar time (about 12 minutes).
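
To put those timings in perspective, here is a small back-of-the-envelope calculation. It assumes exactly 126,000 records and the rounded 17 and 11 minute figures quoted above, so the results are only approximate: roughly 124 versus 191 records per second, a reduction in ingest time of about 35%. The class and method names are made up for this sketch.

```java
// Hypothetical helper for the arithmetic only; inputs are the rounded
// figures quoted in this comment, not precise measurements.
class IngestBenchmark {

    /** Average throughput in records per second for a run lasting the given minutes. */
    static double recordsPerSecond(long records, double minutes) {
        return records / (minutes * 60.0);
    }

    /** Percentage by which the elapsed time dropped from before to after. */
    static double percentReduction(double beforeMinutes, double afterMinutes) {
        return (beforeMinutes - afterMinutes) / beforeMinutes * 100.0;
    }

    public static void main(String[] args) {
        long records = 126_000;        // "126k records" (approximate)
        double originalMinutes = 17.0; // original per-document upsert
        double batchMinutes = 11.0;    // batch update approach

        System.out.println("original rec/s:  " + recordsPerSecond(records, originalMinutes));
        System.out.println("batch rec/s:     " + recordsPerSecond(records, batchMinutes));
        System.out.println("time reduction %: " + percentReduction(originalMinutes, batchMinutes));
    }
}
```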

I can give more implementation details about the benchmark if you like.

Cheers.
