Commit fc87424

feat(deduplication): ignore periods when comparing names
This is a small extension to our deduplication logic so that periods in names are ignored when comparing records. For example, a query for `3929 St Marks Avenue, Niagara Falls, ON, Canada` returns two duplicate addresses from OpenAddresses: one sourced from a countrywide dataset, the other from a regional dataset. One has a period after the abbreviation for Saint; the other doesn't. We should probably evaluate ignoring most or all punctuation, but this fixes a somewhat common case for now.
1 parent f0e1b14 commit fc87424
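
To illustrate the effect described in the commit message, here is a minimal sketch of the comparison. `normalizeName` is a hypothetical stand-in for the project's `normalizeString`: the real helper relies on its own `removeAccents` and `unicode` utilities, which are approximated here with the built-in `String.prototype.normalize`. The street strings are shortened versions of the example query.

```javascript
// Hypothetical stand-in for normalizeString; not the project's implementation.
function normalizeName(str) {
  return str
    .normalize('NFD')                 // split accented letters into base + combining mark
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining marks
    .toLowerCase()
    .split(/[ ,.\-]+/)                // runs of spaces, commas, periods, hyphens separate tokens
    .join(' ');
}

// The two variants of the same street name, with and without the period.
const a = normalizeName('3929 St. Marks Avenue');
const b = normalizeName('3929 St Marks Avenue');

console.log(a === b); // true
```

With the period treated as a separator, both variants reduce to the same string, so a name comparison built on this normalization would consider them equal.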

2 files changed: +2 -1 lines changed

helper/diffPlaces.js

Lines changed: 1 addition & 1 deletion
@@ -386,7 +386,7 @@ function layerDependentNormalization(names, layer) {
  * lowercase characters and remove diacritics and some punctuation
  */
 function normalizeString(str){
-  return removeAccents(unicode.normalize(str)).toLowerCase().split(/[ ,-]+/).join(' ');
+  return removeAccents(unicode.normalize(str)).toLowerCase().split(/[ ,-.]+/).join(' ');
 }

 module.exports.isDifferent = isDifferent;
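
One detail of the new pattern is worth spelling out. Inside a character class, `,-.` is parsed as a range from `,` (U+002C) to `.` (U+002E), and that range happens to contain `-` (U+002D), so the committed regex still splits on commas, hyphens, and periods. The sketch below compares it against a hypothetical alternative that escapes the hyphen explicitly; the alternative is not part of this commit.

```javascript
const committed = /[ ,-.]+/;  // as in the diff: `,-.` is the range U+002C..U+002E
const explicit = /[ ,.\-]+/;  // hypothetical alternative listing each character

const samples = ['foo, bar', 'foo-bar', 'foo , - , - bar', 'st. marks'];

// Apply both patterns to each sample and record the joined result.
const results = samples.map((s) => ({
  input: s,
  committed: s.split(committed).join(' '),
  explicit: s.split(explicit).join(' '),
}));

console.log(results.every((r) => r.committed === r.explicit)); // true
```

The two patterns behave identically on these inputs. The explicit form may be easier to extend later, since adding another character next to an unescaped hyphen could silently create a different range.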

test/unit/helper/diffPlaces.js

Lines changed: 1 addition & 0 deletions
@@ -737,6 +737,7 @@ module.exports.tests.normalizeString = function (test, common) {
     t.equal(normalizeString('foo, bar'), 'foo bar');
     t.equal(normalizeString('foo-bar'), 'foo bar');
     t.equal(normalizeString('foo , - , - bar'), 'foo bar');
+    t.equal(normalizeString('St. Marks'), 'st marks');
     t.end();
   });
