Description
Great package, but I just noticed a bug with the score in certain situations. If I run
damerauLevenshtein('some string', 'another one but longer', deleteWeight=1, insertWeight=3, replaceWeight=6, swapWeight=6, similarity=True)
I get a score of 0.03636... but if I run
damerauLevenshtein('some string', 'another one but longer and longer', deleteWeight=1, insertWeight=3, replaceWeight=6, swapWeight=6, similarity=True)
I get a score of 1.0, implying the two strings are identical.
From what I can see, the issue stems from this line of code:
maxDist = min(len1, len2) * min(replaceWeight, deleteWeight + insertWeight) + (max(len1, len2) - min(len1, len2)) * min(deleteWeight, insertWeight)
which is (assuming I've understood your code) supposed to calculate the maximum distance as the cost of replacing every letter in the shorter string plus the cost of adding or removing the excess letters.
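To make the numbers concrete, here is a rough sketch that evaluates that formula for my two examples. The function name `current_max_dist` is my own, not something from the package; it only copies the expression quoted above and assumes len1 and len2 are the lengths of the two strings.

```python
def current_max_dist(len1, len2, deleteWeight, insertWeight, replaceWeight):
    """Copy of the maxDist expression quoted in this issue (hypothetical wrapper)."""
    shorter, longer = min(len1, len2), max(len1, len2)
    # Cost of changing every character of the shorter string
    per_char = min(replaceWeight, deleteWeight + insertWeight)
    # Cost of the excess characters, using the cheaper of delete/insert
    return shorter * per_char + (longer - shorter) * min(deleteWeight, insertWeight)


s1 = 'some string'
first = current_max_dist(len(s1), len('another one but longer'), 1, 3, 6)
second = current_max_dist(len(s1), len('another one but longer and longer'), 1, 3, 6)
print(first)   # 55
print(second)  # 66
```

For the second pair the target is 22 characters longer, and only insertions make a string longer, so any edit sequence costs at least 22 * 3 = 66. The real distance therefore cannot be below this maxDist of 66, which I suspect is where the score goes wrong.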
But for my example strings, I believe the last term should use insertWeight rather than min(deleteWeight, insertWeight): there's no way to get from string1 to string2 by deletion, it definitely needs insertion. So I think the min() needs to be replaced with an if that checks whether insertions or deletions are required to get from string1 to string2.
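Something along these lines is what I have in mind. It is only a sketch, assuming the distance is measured from string1 to string2 and that len1 and len2 are their lengths; `proposed_max_dist` is a hypothetical name, not part of the package.

```python
def proposed_max_dist(len1, len2, deleteWeight, insertWeight, replaceWeight):
    """Sketch of the suggested fix: pick the excess weight by direction."""
    shorter = min(len1, len2)
    per_char = min(replaceWeight, deleteWeight + insertWeight)
    if len2 > len1:
        # string2 is longer, so the excess characters must be inserted
        excess_weight = insertWeight
    else:
        # string2 is shorter (or equal), so any excess must be deleted
        excess_weight = deleteWeight
    return shorter * per_char + abs(len1 - len2) * excess_weight


# 'some string' (11 chars) -> 'another one but longer and longer' (33 chars)
print(proposed_max_dist(11, 33, 1, 3, 6))  # 110
# Opposite direction, where deletions are what is needed
print(proposed_max_dist(33, 11, 1, 3, 6))  # 66
```

With the weights from my example this gives 110 instead of 66 for the second pair, which leaves room for the insertions that are actually required.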
I'm running Python 3.7.3 and fastDamerauLevenshtein v1.0.7.