Question regarding token_set_ratio

I am looking at the token_set_ratio computation in fuzzy_py.py.

When comparing the differences between the two strings, the code does the following:

```
    dist = indel_distance(diff_ab_joined, diff_ba_joined, score_cutoff=cutoff_distance)

    if dist <= cutoff_distance:
        result = _norm_distance(dist, sect_ab_len + sect_ba_len, score_cutoff)
```

Why is "sect_ab_len + sect_ba_len" used for normalization?
The strings being compared are diff_ab_joined and diff_ba_joined.
So shouldn't the normalization use "ab_len + ba_len" instead of "sect_ab_len + sect_ba_len"?
Normalizing by "sect_ab_len + sect_ba_len" gives more generous scores.
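
To make the question concrete, here is a minimal sketch that reproduces only the step quoted above and reports the score under both denominators. It is not the library's implementation. It assumes that ab_len and ba_len are the lengths of diff_ab_joined and diff_ba_joined, and that sect_ab_len and sect_ba_len are the lengths of the shared tokens joined with each difference string (the snippet does not show these definitions). The helper names `indel_distance` (a plain LCS-based version) and `compare_normalizations` are hypothetical and exist only for this example.

```python
def indel_distance(a: str, b: str) -> int:
    """Insertions plus deletions needed to turn a into b: len(a) + len(b) - 2 * LCS."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return len(a) + len(b) - 2 * prev[-1]


def compare_normalizations(s1: str, s2: str) -> tuple:
    """Return (score using sect_*_len, score using ab_len + ba_len) for the diff step."""
    tokens_a, tokens_b = set(s1.split()), set(s2.split())
    diff_ab_joined = " ".join(sorted(tokens_a - tokens_b))
    diff_ba_joined = " ".join(sorted(tokens_b - tokens_a))
    ab_len, ba_len = len(diff_ab_joined), len(diff_ba_joined)

    # Assumption: sect_*_len is the length of "<shared tokens> <diff tokens>".
    sect_len = len(" ".join(sorted(tokens_a & tokens_b)))
    sect_ab_len = sect_len + (sect_len != 0) + ab_len
    sect_ba_len = sect_len + (sect_len != 0) + ba_len

    dist = indel_distance(diff_ab_joined, diff_ba_joined)

    def norm(d: int, total: int) -> float:
        # Normalized similarity in [0, 100]; two empty strings count as identical.
        return 100.0 * (1 - d / total) if total else 100.0

    return norm(dist, sect_ab_len + sect_ba_len), norm(dist, ab_len + ba_len)


if __name__ == "__main__":
    with_sect, diff_only = compare_normalizations("new york mets", "new york yankees")
    print(f"normalized by sect_ab_len + sect_ba_len: {with_sect:.2f}")
    print(f"normalized by ab_len + ba_len:           {diff_only:.2f}")
    # Distance between the recombined strings, for comparison with the diff-only distance.
    print(indel_distance("new york mets", "new york yankees"), indel_distance("mets", "yankees"))
```

For this pair the first denominator is 29 and the second is 11, so the first score is much higher. Under the stated assumption, the indel distance between the two recombined strings ("new york mets" and "new york yankees") is the same as the distance between the two difference strings, since a shared prefix adds nothing to the distance. That may be relevant to which denominator is intended.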

Question regarding token_set_ratio #468
