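See [1], page 112. Hamming distance-based blocking is already implemented as anonlink.blocking.bit_blocking. Let's also implement Jaccard index-based blocking. The Dice coefficient is a monotonic function of the Jaccard index (D = 2J / (1 + J)), so blocking on Jaccard similarity should do a better job of finding records with high Dice coefficients, though this has not been verified.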
This is also a good opportunity to rename anonlink.blocking.bit_blocking to something that makes it clear that it performs Hamming distance-based approximate nearest neighbour (ANN) search, as opposed to ANN search under other metrics.
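To make the proposal concrete, here is a minimal sketch of one way Jaccard-based blocking could work, using MinHash with banding. MinHash is a standard locality-sensitive hashing family for the Jaccard index, but it is an assumption that this matches the approach described in [1]. The function names (minhash_blocks, candidate_pairs), the bands and rows parameters, and the representation of each record as a set of set-bit positions are all hypothetical and not part of anonlink.

```python
import random
from collections import defaultdict


def jaccard(a, b):
    """Jaccard index of two sets of set-bit positions."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dice(a, b):
    """Dice coefficient of two sets of set-bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def minhash_blocks(records, num_bits, bands=8, rows=2, seed=0):
    """Hypothetical helper (not anonlink API): MinHash blocking with banding.

    records:  list of sets, each holding the indices of the set bits.
    num_bits: length of the underlying bit vectors.
    Returns a dict mapping (band index, band signature) -> record indices.
    """
    rng = random.Random(seed)
    # One random permutation of the bit positions per hash function.
    perms = []
    for _ in range(bands * rows):
        perm = list(range(num_bits))
        rng.shuffle(perm)
        perms.append(perm)

    blocks = defaultdict(list)
    for idx, bits in enumerate(records):
        if not bits:
            continue  # an all-zero vector has no MinHash value; skip it
        # For one permutation, two records share a MinHash value with
        # probability equal to their Jaccard index.
        sig = [min(perm[b] for b in bits) for perm in perms]
        for band in range(bands):
            key = (band, tuple(sig[band * rows:(band + 1) * rows]))
            blocks[key].append(idx)
    return blocks


def candidate_pairs(blocks):
    """All pairs (i, j) with i < j that share at least one block."""
    pairs = set()
    for members in blocks.values():
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                pairs.add((members[i], members[j]))
    return pairs


if __name__ == "__main__":
    records = [{1, 2, 3, 4}, {1, 2, 3, 4}, {10, 11, 12}]
    blocks = minhash_blocks(records, num_bits=16)
    print(candidate_pairs(blocks))
```

Increasing rows makes each block more selective, while increasing bands gives similar records more chances to collide. Suitable values would need tuning against the Dice threshold used for matching.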
[1] Durham, Elizabeth Ashley. A framework for accurate, efficient private record linkage. Diss. Vanderbilt University, 2012. https://etd.library.vanderbilt.edu/available/etd-03262012-144837/unrestricted/dissertation.pdf
Aha! Link: https://csiro.aha.io/features/ANONLINK-59