Skip to content
This repository was archived by the owner on Apr 4, 2023. It is now read-only.

Commit 0a301b5

Browse files
bors[bot]loiclec
andauthored
Merge #723
723: Fix bug in handling of soft deleted documents when updating settings r=Kerollmops a=loiclec # Pull Request ## Related issue Fixes (partially, until merged into meilisearch) meilisearch/meilisearch#3021 ## What does this PR do? This PR fixes the bug where a `missing key in documents database` internal error message could appear when indexing documents. When updating the settings, before clearing the database and before creating the transform output, we now modify the `ExternalDocumentsIds` structure to get rid of all references to soft deleted document ids in its FSTs. It used to be that updating the settings would clear the soft-deleted document ids, but keep the original `ExternalDocumentsIds` structure. As a consequence of this, when processing a future document addition, we could wrongly believe that a document was being replaced when, in fact, it was a completely new document. See the tests `bug_3021_first`, `bug_3021_second`, and `bug_3021` for a minimal test case that would have reproduced the issue. We need to take special care to: - evaluate how users should update to v0.30.1 (containing this fix): dump? reimporting all documents from scratch? - understand IF/HOW this bug could have caused duplicate documents to be returned - and evaluate the correctness of the fix, of course :) Co-authored-by: Loïc Lecrenier <[email protected]>
2 parents 2a846aa + a993b68 commit 0a301b5

File tree

4 files changed

+311
-183
lines changed

4 files changed

+311
-183
lines changed

milli/src/external_documents_ids.rs

Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -71,6 +71,30 @@ impl<'a> ExternalDocumentsIds<'a> {
7171
self.merge_soft_into_hard()
7272
}
7373

74+
/// Rebuild the internal FSTs in the ExternalDocumentsIds structure such that they
75+
/// don't contain any soft deleted document id.
76+
pub fn delete_soft_deleted_documents_ids_from_fsts(&mut self) -> fst::Result<()> {
77+
let mut new_hard_builder = fst::MapBuilder::memory();
78+
79+
let union_op = self.hard.op().add(&self.soft).r#union();
80+
let mut iter = union_op.into_stream();
81+
while let Some((external_id, docids)) = iter.next() {
82+
// prefer selecting the ids from soft, always
83+
let id = indexed_last_value(docids).unwrap();
84+
if id != DELETED_ID && !self.soft_deleted_docids.contains(id as u32) {
85+
new_hard_builder.insert(external_id, id)?;
86+
}
87+
}
88+
drop(iter);
89+
90+
// Delete soft map completely
91+
self.soft = fst::Map::default().map_data(Cow::Owned)?;
92+
// We save the new map as the new hard map.
93+
self.hard = new_hard_builder.into_map().map_data(Cow::Owned)?;
94+
95+
Ok(())
96+
}
97+
7498
pub fn insert_ids<A: AsRef<[u8]>>(&mut self, other: &fst::Map<A>) -> fst::Result<()> {
7599
let union_op = self.soft.op().add(other).r#union();
76100

0 commit comments

Comments
 (0)