Implement Jaccard blocking #162

@nbgl

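See [1], page 112. Hamming distance-based blocking is already implemented as anonlink.blocking.bit_blocking. Let's also implement Jaccard index-based blocking, since this should (?) do a better job of finding records with high Dice coefficients: the Dice coefficient D is a monotonic function of the Jaccard index J, D = 2J / (1 + J), so records that are close under Jaccard are also close under Dice, whereas Hamming distance also depends on how many bits each record has set.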
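As a rough illustration of what Jaccard-based blocking could look like, here is a minimal MinHash sketch with banding. It is not anonlink's API and not necessarily the scheme described in [1]: the function names (`jaccard_blocks`, `minhash_signature`), the parameters `bands` and `rows`, and the representation of each record as a set of set-bit positions are all assumptions made for this example.

```python
import random
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets of set-bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def dice(a, b):
    """Dice coefficient of two sets of set-bit positions."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


def minhash_signature(bits, permutations):
    """One MinHash value per permutation: the smallest permuted index among the set bits."""
    return tuple(min(perm[i] for i in bits) for perm in permutations)


def jaccard_blocks(records, num_bits, bands=8, rows=2, seed=0):
    """Hypothetical helper: return candidate pairs whose signatures agree on at least one band.

    records: dict mapping record id -> non-empty set of set-bit positions (< num_bits).
    Returns a set of (id_a, id_b) tuples with id_a < id_b.
    """
    rng = random.Random(seed)
    permutations = []
    for _ in range(bands * rows):
        perm = list(range(num_bits))
        rng.shuffle(perm)
        permutations.append(perm)

    # Records that share all `rows` MinHash values of some band land in the same bucket.
    buckets = defaultdict(list)
    for rec_id, bits in records.items():
        sig = minhash_signature(bits, permutations)
        for band in range(bands):
            key = (band, sig[band * rows:(band + 1) * rows])
            buckets[key].append(rec_id)

    candidates = set()
    for ids in buckets.values():
        for a, b in combinations(sorted(ids), 2):
            candidates.add((a, b))
    return candidates


if __name__ == "__main__":
    records = {"a": {1, 5, 9}, "b": {1, 5, 9}, "c": {20, 21, 22}}
    print(jaccard_blocks(records, num_bits=32))
```

For a single permutation, two records get the same MinHash value with probability equal to their Jaccard index, so pairs with high Jaccard (and hence high Dice) are the most likely to share a bucket. Records with disjoint bit sets can never collide, because a permutation maps them to disjoint index sets.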

This is a good opportunity to rename anonlink.blocking.bit_blocking to something that makes it clear that it does Hamming distance-based approximate nearest neighbour (ANN) search, as opposed to using other metrics.

[1] Durham, Elizabeth Ashley. A framework for accurate, efficient private record linkage. Diss. Vanderbilt University, 2012. https://etd.library.vanderbilt.edu/available/etd-03262012-144837/unrestricted/dissertation.pdf

Aha! Link: https://csiro.aha.io/features/ANONLINK-59
