Releases: Michael-JB/bm25
v2.3.2
Changed
- Bump `rayon` from 0.10.0 to 0.11.0
- Bump `stop-words` from 0.8.1 to 0.9.0
Full Changelog: v2.3.1...v2.3.2
v2.3.1
v2.3.0
Fixed
- Fix negative scoring of high-frequency terms. Scores returned by this version
will differ from the previous version, hence this is a minor version bump
rather than a patch. This closes the bug raised in #20. Thank you to hwiorn
for this contribution!
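The release note does not say how the fix was implemented. As background only, here is a minimal sketch of one well-known way BM25 scores can turn negative for very common terms: the classic inverse document frequency (IDF) formula drops below zero once a term appears in more than half of the documents. The function names are hypothetical, and the non-negative variant shown is the one used by Lucene, not necessarily the one adopted by this crate.

```rust
/// Classic IDF: ln((N - n + 0.5) / (n + 0.5)).
/// Goes negative when the term occurs in more than half of the documents.
fn idf_classic(total_docs: f64, docs_with_term: f64) -> f64 {
    ((total_docs - docs_with_term + 0.5) / (docs_with_term + 0.5)).ln()
}

/// Non-negative variant: ln(1 + (N - n + 0.5) / (n + 0.5)).
/// Adding 1 inside the logarithm keeps the result above zero.
/// Assumption: shown for illustration, not taken from the crate's source.
fn idf_non_negative(total_docs: f64, docs_with_term: f64) -> f64 {
    (1.0 + (total_docs - docs_with_term + 0.5) / (docs_with_term + 0.5)).ln()
}
```

With 10 documents and a term that appears in 9 of them, `idf_classic` is negative while `idf_non_negative` stays positive. Changing the formula changes every score, which is consistent with treating such a fix as more than a patch release.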
Changed
- Bump `deunicode` from 1.6.0 to 1.6.2
Full Changelog: v2.2.1...v2.3.0
v2.2.1
Changed
- Bump `stop-words` from 0.8.0 to 0.8.1
- Bump `whichlang` from 0.1.0 to 0.1.1
- Bump `cached` from 0.54.0 to 0.55.1
Full Changelog: v2.2.0...v2.2.1
v2.2.0
Changed
- Use `unicode-segmentation` for better word splitting. Decimal numbers and words with apostrophes
no longer generate multiple tokens. This is a (minor) breaking change for the default tokenizer.
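To illustrate the kind of over-splitting this change avoids, the sketch below contrasts two hypothetical splitters written with the standard library only. `naive_split` breaks on every non-alphanumeric character. `joined_split` roughly approximates two of the Unicode word-boundary rules (an apostrophe between letters, and a `.` or `,` between digits). Neither is the crate's code, and `joined_split` is not the `unicode-segmentation` implementation.

```rust
/// Hypothetical naive splitter: every non-alphanumeric character is a boundary.
fn naive_split(text: &str) -> Vec<&str> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|piece| !piece.is_empty())
        .collect()
}

/// Hypothetical splitter that keeps an apostrophe between two letters and a
/// '.' or ',' between two digits inside the current token.
fn joined_split(text: &str) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    let mut tokens = Vec::new();
    let mut current = String::new();
    for (i, &c) in chars.iter().enumerate() {
        let prev = i.checked_sub(1).and_then(|j| chars.get(j));
        let next = chars.get(i + 1);
        // Decide whether this punctuation mark joins its two neighbours.
        let joins = match c {
            '\'' => matches!((prev, next), (Some(p), Some(n))
                if p.is_alphabetic() && n.is_alphabetic()),
            '.' | ',' => matches!((prev, next), (Some(p), Some(n))
                if p.is_ascii_digit() && n.is_ascii_digit()),
            _ => false,
        };
        if c.is_alphanumeric() || joins {
            current.push(c);
        } else if !current.is_empty() {
            // Boundary reached: emit the token collected so far.
            tokens.push(std::mem::take(&mut current));
        }
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}
```

For the input `pi is 3.14`, `naive_split` yields `pi`, `is`, `3`, `14`, whereas `joined_split` keeps `3.14` whole. Likewise, `don't` stays a single token instead of becoming `don` and `t`.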
Added
- `DefaultTokenizerBuilder` is now `Default`.
Full Changelog: v2.1.1...v2.2.0
v2.1.1
Added
- `SearchResult` is now `Clone`.
- Add WebAssembly bm25-demo to README.
- Miscellaneous documentation improvements.
Full Changelog: v2.1.0...v2.1.1
v2.1.0
Added
- Customisation of the `DefaultTokenizer`. You can now enable/disable normalization, stemming
and stop word removal via the new `DefaultTokenizer::builder()`.
Changed
- `DefaultTokenizer` now normalizes Unicode. This makes search more lenient for languages with
non-ASCII characters. Note that this is a breaking change for the default tokenizer. If you
require the behaviour of the previous version, you can create your default tokenizer with the
new builder: `DefaultTokenizer::builder().normalization(false).build()`.
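The builder described in this release can be pictured with a self-contained mock. Everything below (`SketchTokenizer`, `SketchTokenizerBuilder`, `fold_accents`, `STOP_WORDS`) is a hypothetical stand-in written for illustration. It is not the crate's implementation, and stemming is left out to keep the sketch short. It only shows how a builder with per-step switches lets callers opt out of normalization.

```rust
/// Tiny illustrative stop word list (assumption, not the crate's list).
const STOP_WORDS: [&str; 3] = ["the", "and", "of"];

/// Hand-rolled accent folding for a few characters; real normalization is far broader.
fn fold_accents(word: &str) -> String {
    word.chars()
        .map(|c| match c {
            'à' | 'á' | 'â' | 'ä' => 'a',
            'è' | 'é' | 'ê' | 'ë' => 'e',
            'ì' | 'í' | 'î' | 'ï' => 'i',
            other => other,
        })
        .collect()
}

/// Hypothetical tokenizer with switchable steps.
struct SketchTokenizer {
    normalization: bool,
    stopwords: bool,
}

/// Hypothetical builder; both steps are enabled unless switched off.
struct SketchTokenizerBuilder {
    normalization: bool,
    stopwords: bool,
}

impl Default for SketchTokenizerBuilder {
    fn default() -> Self {
        Self { normalization: true, stopwords: true }
    }
}

impl SketchTokenizerBuilder {
    fn normalization(mut self, on: bool) -> Self {
        self.normalization = on;
        self
    }

    fn stopwords(mut self, on: bool) -> Self {
        self.stopwords = on;
        self
    }

    fn build(self) -> SketchTokenizer {
        SketchTokenizer { normalization: self.normalization, stopwords: self.stopwords }
    }
}

impl SketchTokenizer {
    fn builder() -> SketchTokenizerBuilder {
        SketchTokenizerBuilder::default()
    }

    /// Lowercase, optionally fold accents, optionally drop stop words.
    fn tokenize(&self, text: &str) -> Vec<String> {
        text.split_whitespace()
            .map(|w| w.to_lowercase())
            .map(|w| if self.normalization { fold_accents(&w) } else { w })
            .filter(|w| !(self.stopwords && STOP_WORDS.contains(&w.as_str())))
            .collect()
    }
}
```

In this mock, `SketchTokenizer::builder().build().tokenize("The Café")` produces `cafe`, while adding `.normalization(false)` before `.build()` produces `café`. That mirrors the opt-out path the release note describes for restoring the earlier behaviour.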
Full Changelog: v2.0.1...v2.1.0
v2.0.1
v2.0.0
Changed
- Introduces `TokenEmbedder::EmbeddingSpace` to decouple the output of `TokenEmbedder` from `Self`.
This lets you customise the output of your `TokenEmbedder` without changing its type.
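The idea of decoupling an embedder's output from `Self` can be sketched with an associated type. The trait and types below (`SketchEmbedder`, `LengthEmbedder`, `FirstCharEmbedder`, `embed_all`) are hypothetical and self-contained. They mirror the concept only and should not be read as the crate's actual trait definition.

```rust
/// Hypothetical trait: the output type is chosen per implementation
/// through an associated type rather than being tied to `Self`.
trait SketchEmbedder {
    type EmbeddingSpace;
    fn embed(token: &str) -> Self::EmbeddingSpace;
}

/// Embeds a token as its length in characters.
struct LengthEmbedder;

impl SketchEmbedder for LengthEmbedder {
    type EmbeddingSpace = usize;
    fn embed(token: &str) -> usize {
        token.chars().count()
    }
}

/// Embeds a token as its first character, if any.
struct FirstCharEmbedder;

impl SketchEmbedder for FirstCharEmbedder {
    type EmbeddingSpace = Option<char>;
    fn embed(token: &str) -> Option<char> {
        token.chars().next()
    }
}

/// Generic code names the output as `E::EmbeddingSpace`
/// without knowing the concrete type.
fn embed_all<E: SketchEmbedder>(tokens: &[&str]) -> Vec<E::EmbeddingSpace> {
    tokens.iter().map(|t| E::embed(t)).collect()
}
```

Here the two marker types produce `usize` and `Option<char>` values respectively, yet `embed_all` works with both because it refers only to the associated type.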
Full Changelog: v1.0.1...v2.0.0
v1.0.1
Fixed
- Correctly embed the README in the crate documentation. docs.rs should now display the README
correctly.
Full Changelog: v1.0.0...v1.0.1