Description
I am looking at the token_set_ratio computation in fuzzy_py.py.
When comparing the differences between the two strings, the code does:
dist = indel_distance(diff_ab_joined, diff_ba_joined, score_cutoff=cutoff_distance)
if dist <= cutoff_distance:
    result = _norm_distance(dist, sect_ab_len + sect_ba_len, score_cutoff)
Why is "sect_ab_len + sect_ba_len" used for normalization?
The strings being compared are diff_ab_joined and diff_ba_joined.
So shouldn't the normalization use "ab_len + ba_len" instead of "sect_ab_len + sect_ba_len"?
Using "sect_ab_len + sect_ba_len" divides the same distance by a larger length, which gives more generous scores.
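To make the question concrete, here is a minimal, self-contained sketch that computes both normalizations for one example pair. It does not call the library: the `indel_distance` and `normalized_similarity` helpers below are my own stand-ins, the token handling is simplified, and the definition of sect_ab_len / sect_ba_len (intersection length, plus a separating space, plus the difference length) is an assumption based on the variable names. It also checks that prepending the shared intersection to both difference strings leaves the indel distance unchanged, which may be the reason the longer lengths are used.

```python
def indel_distance(s1: str, s2: str) -> int:
    """Insert/delete-only edit distance: len(s1) + len(s2) - 2 * LCS(s1, s2)."""
    prev = [0] * (len(s2) + 1)
    for ch1 in s1:
        curr = [0]
        for j, ch2 in enumerate(s2, start=1):
            if ch1 == ch2:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return len(s1) + len(s2) - 2 * prev[-1]


def normalized_similarity(dist: int, lensum: int) -> float:
    """Hypothetical stand-in for the normalization step, on a 0-100 scale."""
    return 100.0 if lensum == 0 else 100.0 - 100.0 * dist / lensum


# Example inputs chosen for illustration only.
tokens_a = set("new york mets".split())
tokens_b = set("new york yankees".split())

sect = " ".join(sorted(tokens_a & tokens_b))            # "new york"
diff_ab_joined = " ".join(sorted(tokens_a - tokens_b))  # "mets"
diff_ba_joined = " ".join(sorted(tokens_b - tokens_a))  # "yankees"

ab_len, ba_len = len(diff_ab_joined), len(diff_ba_joined)
sect_len = len(sect)
# Assumed definition: intersection + one separating space + difference.
sect_ab_len = sect_len + (1 if sect_len else 0) + ab_len
sect_ba_len = sect_len + (1 if sect_len else 0) + ba_len

dist = indel_distance(diff_ab_joined, diff_ba_joined)

# The shared prefix contributes nothing to the distance.
dist_with_sect = indel_distance(sect + " " + diff_ab_joined,
                                sect + " " + diff_ba_joined)

score_sect = normalized_similarity(dist, sect_ab_len + sect_ba_len)
score_diff = normalized_similarity(dist, ab_len + ba_len)

print(dist, dist_with_sect)   # 7 7
print(f"{score_sect:.2f}")    # 75.86 (normalized by sect_ab_len + sect_ba_len = 29)
print(f"{score_diff:.2f}")    # 36.36 (normalized by ab_len + ba_len = 11)
```

With these inputs the distance is the same whether or not the intersection is prepended, so the two normalizations differ only in the denominator: 29 versus 11.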