Skip to content

Commit 18c3a2b

Browse files
committed
Implement __contains__ for vocab.
Python performs linear search over sequences if __contains__ is not explicitly implemented. This is painfully slow over large vocabularies, therefore implement __contains__ explicitly.
1 parent 3d7531d commit 18c3a2b

File tree

1 file changed

+12
-0
lines changed

1 file changed

+12
-0
lines changed

src/vocab.rs

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -75,4 +75,16 @@ impl PySequenceProtocol for PyVocab {
7575
Ok(words[idx as usize].clone())
7676
}
7777
}
78+
79+
fn __contains__(&self, word: String) -> PyResult<bool> {
80+
let embeds = self.embeddings.borrow();
81+
Ok(embeds
82+
.vocab()
83+
.idx(&word)
84+
.map(|word_idx| match word_idx {
85+
WordIndex::Word(_) => true,
86+
WordIndex::Subword(_) => false,
87+
})
88+
.unwrap_or(false))
89+
}
7890
}

0 commit comments

Comments
 (0)