Incorrect score for similarity=True

Great package, but I just noticed a bug with the score in certain situations. If I run
`damerauLevenshtein('some string', 'another one but longer', deleteWeight=1, insertWeight=3, replaceWeight=6, swapWeight=6, similarity=True)`
I get a score of 0.03636... but if I run 
`damerauLevenshtein('some string', 'another one but longer and longer', deleteWeight=1, insertWeight=3, replaceWeight=6, swapWeight=6, similarity=True)`
I get a score of 1.0, implying the two strings are identical.

From what I could see, it looks like the issue stems from the line of code
`maxDist = min(len1, len2) * min(replaceWeight, deleteWeight + insertWeight) + (max(len1, len2) - min(len1, len2)) * min(deleteWeight, insertWeight)`
which (assuming I've understood your code) is supposed to calculate the maximum possible distance as the cost of replacing every letter of the shorter string plus the cost of inserting or deleting the excess letters of the longer one.

But for my example strings, I believe the last term should use `insertWeight` rather than `min(deleteWeight, insertWeight)`: string1 is shorter than string2, so there's no way to get from string1 to string2 by deletion alone, it definitely needs insertions. So I think the `min()` needs to be replaced with an `if` that checks whether insertions or deletions are required to get from string1 to string2.
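To make the suggestion concrete, here is a rough sketch of the change I have in mind. The function names and the snake_case parameters are my own and are not taken from the package; I'm only assuming the formula quoted above, and I haven't checked how this interacts with `swapWeight` or the rest of the code.

```python
def max_dist_current(len1, len2, delete_weight, insert_weight, replace_weight):
    """The formula as quoted above (hypothetical wrapper, not the package's API)."""
    shorter, longer = min(len1, len2), max(len1, len2)
    per_char = min(replace_weight, delete_weight + insert_weight)
    return shorter * per_char + (longer - shorter) * min(delete_weight, insert_weight)


def max_dist_proposed(len1, len2, delete_weight, insert_weight, replace_weight):
    """Suggested variant: pick the weight by the direction string1 -> string2."""
    shorter = min(len1, len2)
    per_char = min(replace_weight, delete_weight + insert_weight)
    if len2 > len1:
        # string2 is longer, so the excess letters have to be inserted
        excess = (len2 - len1) * insert_weight
    else:
        # string1 is longer (or equal), so the excess letters have to be deleted
        excess = (len1 - len2) * delete_weight
    return shorter * per_char + excess


# Lengths of the strings from my examples
len_a = len('some string')
len_b = len('another one but longer')
len_c = len('another one but longer and longer')

for other in (len_b, len_c):
    print(max_dist_current(len_a, other, 1, 3, 6),
          max_dist_proposed(len_a, other, 1, 3, 6))
```

With my weights, the current formula charges the cheaper delete cost for the excess letters even though they can only be inserted, so it gives a smaller maximum distance than the proposed version for both example pairs.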

I'm running Python 3.7.3 and fastDamerauLevenshtein v1.0.7.

