Releases: rapidfuzz/RapidFuzz
Release 1.1.2
Fixed
- Fix reference counting in process.extract (see #81)
Release 1.1.1
Fixed
- Fix result conversion in process.extract (see #79)
Release 1.1.0
Changed
- string_metric.normalized_levenshtein now supports all weights
- when different weights are used for Insertion and Deletion, the strings are no longer swapped inside the Levenshtein implementation, so different weights for Insertion and Deletion are now supported
- replaced the C++ implementation with a Cython implementation. This has the following advantages:
  - The implementation is less error prone, since a lot of the complex things are done by Cython
  - slightly faster than the previous implementation (up to 10% for some parts)
  - about 33% smaller binary size
  - reduced compile time
- Added **kwargs argument to process.extract/extractOne/extract_iter that is passed to the scorer
- Add max argument to hamming distance
- Add support for whole Unicode range to utils.default_process
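To illustrate why the order of the two strings matters once Insertion and Deletion have different weights, here is a minimal pure-Python sketch of a weighted Levenshtein distance. It is not the library's code: the function name `weighted_levenshtein` is hypothetical, and the weight order (insertion, deletion, substitution) is an assumption of this sketch.

```python
def weighted_levenshtein(s1, s2, weights=(1, 1, 1)):
    """Cost of turning s1 into s2.

    weights is assumed to be (insertion, deletion, substitution).
    """
    ins, dele, sub = weights
    # prev[j] holds the cost of turning the processed prefix of s1 into s2[:j]
    prev = [j * ins for j in range(len(s2) + 1)]
    for i, c1 in enumerate(s1, 1):
        cur = [i * dele]  # turning s1[:i] into the empty string needs i deletions
        for j, c2 in enumerate(s2, 1):
            cur.append(min(
                prev[j] + dele,                          # delete c1
                cur[j - 1] + ins,                        # insert c2
                prev[j - 1] + (0 if c1 == c2 else sub),  # match or substitute
            ))
        prev = cur
    return prev[-1]


# With a deletion cost of 3, the direction of the comparison changes the result.
print(weighted_levenshtein("abc", "ab", weights=(1, 3, 1)))  # one deletion
print(weighted_levenshtein("ab", "abc", weights=(1, 3, 1)))  # one insertion
```

Because the two calls give different results, an implementation that swaps its inputs (for example to iterate over the shorter string) would only be correct when Insertion and Deletion cost the same.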
Performance
- replaced the Wagner-Fischer algorithm in the normal Levenshtein distance with a bit-parallel implementation
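For readers unfamiliar with the term, the following is a rough sketch of one well-known bit-parallel formulation of the uniform-weight Levenshtein distance (Myers' algorithm in the variant described by Hyyrö). It is an illustration only and makes no claim about how RapidFuzz implements it; the function name is hypothetical. Python integers are unbounded, so the sketch does not need the block handling a fixed-width 64-bit implementation would.

```python
def levenshtein_bitparallel(s1, s2):
    """Uniform-weight Levenshtein distance using bit vectors over s1."""
    m = len(s1)
    if m == 0:
        return len(s2)
    mask = (1 << m) - 1
    last = 1 << (m - 1)

    # For each character, a bit mask of the positions where it occurs in s1.
    pattern = {}
    for i, ch in enumerate(s1):
        pattern[ch] = pattern.get(ch, 0) | (1 << i)

    vp, vn = mask, 0  # vertical +1 / -1 deltas of the current DP column
    dist = m          # value of the bottom cell of the current column
    for ch in s2:
        pm = pattern.get(ch, 0)
        x = pm | vn
        d0 = ((((x & vp) + vp) ^ vp) | x) & mask
        hp = (vn | ~(d0 | vp)) & mask  # horizontal +1 deltas
        hn = d0 & vp                   # horizontal -1 deltas
        if hp & last:
            dist += 1
        if hn & last:
            dist -= 1
        hp = ((hp << 1) | 1) & mask    # the first DP row grows by 1 per column
        hn = (hn << 1) & mask
        vp = (hn | ~(d0 | hp)) & mask
        vn = hp & d0
    return dist


print(levenshtein_bitparallel("kitten", "sitting"))
```

Each character of `s2` is handled with a constant number of word operations instead of one inner loop iteration per character of `s1`, which is where the speedup over Wagner-Fischer comes from.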
Release 1.0.2
Fixed
- The bitparallel LCS algorithm in fuzz.partial_ratio did not find the longest common substring properly in some cases.
The old algorithm is used again until this bug is fixed.
Release 1.0.1
Changed
- string_metric.normalized_levenshtein now supports the weights (1, 1, N) with N >= 1
Performance
- The Levenshtein distance with the weights (1, 1, >2) now uses the same implementation as the weights (1, 1, 2), since Substitution > Insertion + Deletion has no effect
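The reasoning behind this can be sketched as follows: when Insertion and Deletion both cost 1 and a Substitution costs 2 or more, a substitution is never cheaper than one deletion plus one insertion, so the distance only depends on the longest common subsequence (LCS). The function below is a hypothetical illustration of that identity, not the library's implementation.

```python
def indel_distance(s1, s2):
    """Distance using only insertions and deletions (each with cost 1).

    Equals len(s1) + len(s2) - 2 * LCS(s1, s2), which is also the
    Levenshtein distance for the weights (1, 1, N) with any N >= 2.
    """
    # prev[j] = length of the LCS of the processed prefix of s1 and s2[:j]
    prev = [0] * (len(s2) + 1)
    for c1 in s1:
        cur = [0]
        for j, c2 in enumerate(s2, 1):
            if c1 == c2:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return len(s1) + len(s2) - 2 * prev[-1]


print(indel_distance("lewenstein", "levenshtein"))
```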
Fixed
- fix uninitialized variable in bitparallel Levenshtein distance with the weight (1, 1, 1)
Release 1.0.0
Changed
- all normalized string_metrics can now be used as scorer for process.extract/extractOne
- Implementation of the C++ Wrapper completely refactored to make it easier to add more scorers, processors and string matching algorithms in the future.
- increased test coverage, which already helped to fix some bugs and will help to prevent regressions in the future
- improved docstrings of functions
Performance
- Added bit-parallel implementation of the Levenshtein distance for the weights (1,1,1) and (1,1,2).
- Added specialized implementation of the Levenshtein distance for cases with a small maximum edit distance, which is even faster than the bit-parallel implementation.
- Improved performance of fuzz.partial_ratio. Since fuzz.ratio and fuzz.partial_ratio are used in most scorers, this improves the overall performance.
- Improved performance of process.extract and process.extractOne
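Why a small maximum edit distance allows a faster path can be shown with a simple early-exit sketch. This is only an illustration under stated assumptions: the function name `levenshtein_bounded` and the convention of returning `max_dist + 1` when the limit is exceeded are hypothetical, and the specialized algorithm used by the library may differ substantially.

```python
def levenshtein_bounded(s1, s2, max_dist):
    """Levenshtein distance, or max_dist + 1 if it exceeds max_dist."""
    # The distance is at least the difference in lengths.
    if abs(len(s1) - len(s2)) > max_dist:
        return max_dist + 1
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, 1):
        cur = [i]
        for j, c2 in enumerate(s2, 1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (c1 != c2),  # match or substitution
            ))
        # Row minima never decrease, so the final result cannot drop
        # below the smallest value in the current row.
        if min(cur) > max_dist:
            return max_dist + 1
        prev = cur
    return prev[-1] if prev[-1] <= max_dist else max_dist + 1


print(levenshtein_bounded("kitten", "sitting", 3))
print(levenshtein_bounded("abcdef", "uvwxyz", 2))
```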
Deprecated
- the rapidfuzz.levenshtein module is now deprecated and will be removed in v2.0.0. These functions are now placed in rapidfuzz.string_metric. distance, normalized_distance, weighted_distance and weighted_normalized_distance are combined into levenshtein and normalized_levenshtein.
Added
- added normalized version of the hamming distance in string_metric.normalized_hamming
- process.extract_iter as a generator that yields the similarity of all elements that have a similarity >= score_cutoff
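The two additions above can be sketched in plain Python as follows. Both functions are hypothetical stand-ins written for this illustration: the 0 to 100 scaling of the normalized Hamming score and the `(choice, score, index)` tuple shape are assumptions, not a statement of the library's exact behaviour.

```python
def normalized_hamming(s1, s2):
    """Similarity in [0, 100] based on the share of mismatching positions."""
    if len(s1) != len(s2):
        raise ValueError("strings must have the same length")
    if not s1:
        return 100.0
    mismatches = sum(c1 != c2 for c1, c2 in zip(s1, s2))
    return 100.0 - 100.0 * mismatches / len(s1)


def extract_iter(query, choices, scorer, score_cutoff=0):
    """Lazily yield (choice, score, index) for choices scoring >= score_cutoff."""
    for index, choice in enumerate(choices):
        score = scorer(query, choice)
        if score >= score_cutoff:
            yield choice, score, index


names = ["kathrin", "karolin", "kerstin"]
for match in extract_iter("karolin", names, normalized_hamming, score_cutoff=60):
    print(match)
```

Since the function is a generator, a caller can stop consuming results early without scoring the remaining choices.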
Fixed
- multiple bugs in extractOne when used with a scorer that is not from RapidFuzz
- fixed bug in token_ratio
- fixed bug in result normalization causing zero division
Release 0.14.2
Fixed
- utf8 usage in the copyright header caused problems with python2.7 on some platforms (see #70)
Release 0.14.1
Fixed
- when a custom processor like lambda s: s was used with any of the methods inside fuzz.* it always returned a score of 100. This release fixes this and adds better test coverage to prevent this bug in the future.
Release 0.14.0
Added
- added hamming distance metric in the levenshtein module
Performance
- improved performance of default_process by using lookup table
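As a rough illustration of the lookup-table idea, the sketch below precomputes the replacement for every ASCII code point once and then applies it with `str.translate`. It assumes that the preprocessing step lowercases the string, replaces non-alphanumeric characters with spaces and trims the result. The name `default_process_sketch` and the ASCII-only table are hypothetical simplifications, not the library's implementation.

```python
# Built once at import time: code point -> replacement string.
_ASCII_TABLE = {
    cp: (chr(cp).lower() if chr(cp).isalnum() else " ")
    for cp in range(128)
}


def default_process_sketch(s):
    """Lowercase, blank out non-alphanumeric ASCII characters, then trim."""
    # Characters outside the table (non-ASCII) are left unchanged here.
    return s.translate(_ASCII_TABLE).strip()


print(repr(default_process_sketch("  Hello, World! ")))
```

The per-character decision becomes a single table lookup instead of several character class checks, which is the kind of saving a lookup table is meant to provide.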
Release 0.13.4
Fixed
- Add missing virtual destructor that caused a segmentation fault on macOS