nnethercott reviewed Dec 23, 2025
Force-pushed 02ca250 to 8166cd6
We tested the `force_build` method to rebuild the entire graph at Meilisearch, but we have some performance concerns. Most of the rebuild time is actually spent deleting the just-inserted "updated" items from the database in one massive operation. Before calling the actual `build` method, we mark all items as updated so that we can reuse the build method as before: it iterates over all updated items and deletes those marker entries to determine which items were just touched.

However, in the case of a complete rebuild, we can skip all of that by simply declaring that no items were deleted and every item was inserted. In this PR, I introduce a `relink_all_items` boolean that the build method can honor as an optimization to avoid fetching and deleting the "updated" markers. This specific step consumed more than 63% of the reindexing time (2h20min out of 3h30min).

I'm not sure about the terminology I used here, but the optimization seems essential to me. I am running a benchmark today and will update the Meilisearch PR with the new measurements soon 👀
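To make the shape of the optimization concrete, here is a minimal sketch of the two code paths. This is not the actual arroy implementation: `Store`, `collect_updated`, and the in-memory maps are illustrative stand-ins for the LMDB databases, and only the `relink_all_items` flag name comes from the PR itself.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical in-memory stand-in for the item database and the per-item
// "updated" marker entries that live in LMDB in the real code.
struct Store {
    items: BTreeMap<u32, Vec<f32>>,
    updated_markers: BTreeSet<u32>,
}

impl Store {
    /// Determine which items must be (re)linked into the graph.
    ///
    /// When `relink_all_items` is true (full rebuild), we never touch the
    /// marker entries: no item was deleted, every item was inserted, so we
    /// skip the massive fetch-and-delete pass entirely.
    fn collect_updated(&mut self, relink_all_items: bool) -> Vec<u32> {
        if relink_all_items {
            // Full rebuild: treat every stored item as freshly inserted.
            return self.items.keys().copied().collect();
        }
        // Incremental build: drain the markers, deleting each entry —
        // this is the per-item delete pass that dominates rebuild time.
        let updated: Vec<u32> = self.updated_markers.iter().copied().collect();
        self.updated_markers.clear();
        updated
    }
}
```

The point of the flag is purely to choose the cheap branch: the expensive branch still exists for incremental builds, where the marker set is genuinely smaller than the whole item set.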
We should find a better, possibly breaking, format or way to store the set of updated or deleted items, as LMDB is not performant enough at deleting database entries in place. I would either use a completely separate database to store updates or maintain a bitmap in a single dedicated entry. The upside of this breaking change is that it should not impact Meilisearch: the behavior only changes during indexation, and any changes made to the database during that time are cleaned up afterward.
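The single-entry bitmap idea could look something like the sketch below. A plain `u64` bitset stands in for a real roaring bitmap, and the `UpdatedBitmap` type and its methods are assumptions for illustration, not a proposed API: the point is that marking items mutates one value that is written back once, and clearing it is one write instead of thousands of point deletes.

```rust
// One bitmap stored in a single database entry, instead of one LMDB entry
// per updated item. Read-modify-write of this one value replaces the
// per-item delete pass described above.
struct UpdatedBitmap {
    words: Vec<u64>,
}

impl UpdatedBitmap {
    fn new(capacity: usize) -> Self {
        // One u64 word covers 64 item ids.
        Self { words: vec![0; (capacity + 63) / 64] }
    }

    /// Mark an item id as updated (would be a single entry rewrite in LMDB).
    fn mark(&mut self, id: usize) {
        self.words[id / 64] |= 1 << (id % 64);
    }

    /// Return all marked ids and clear the bitmap in one pass —
    /// one read plus one write instead of a delete per item.
    fn take_all(&mut self) -> Vec<usize> {
        let ids = (0..self.words.len() * 64)
            .filter(|&i| self.words[i / 64] >> (i % 64) & 1 == 1)
            .collect();
        for w in &mut self.words {
            *w = 0;
        }
        ids
    }
}
```

In the real store a RoaringBitmap would be the natural choice over a dense bitset, since item ids can be sparse; either way the cleanup-after-indexation property mentioned above keeps the format change invisible to Meilisearch.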