HMNI

Fuzzy name matching with machine learning. Perform common fuzzy name matching tasks including similarity scoring, record linkage, deduplication and normalization.

HMNI is trained on a dataset of internationally transliterated Latin first names, with priority given to precision.

Model	Accuracy	Precision	Recall	F1-Score
HMNI-Latin	0.9393	0.9255	0.7548	0.8315
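
As a quick consistency check on the table, F1 is the harmonic mean of precision and recall. The snippet below recomputes it from the two reported values; the variable names are illustrative only.

```python
# Reported HMNI-Latin metrics, copied from the table
precision = 0.9255
recall = 0.7548

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.8315
```

The large gap between precision and recall reflects the stated priority: predicted matches are usually correct, at the cost of missing some true matches.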

Installation

pip install hmni

Quick Usage Guide

Initialize a Matcher Object

import hmni
# Create a matcher backed by the Latin-alphabet model
matcher = hmni.Matcher(model='latin')

Single Pair Similarity

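# Default: returns a match probability
matcher.similarity('Alan', 'Al')
# 0.6838303319889133

# prob=False: returns a binary match label instead of a probability
matcher.similarity('Alan', 'Al', prob=False)
# 1

# Full names, given name first (surname_first=False)
matcher.similarity('Alan Turing', 'Al Turing', surname_first=False)
# 0.6838303319889133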
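
One way to picture the relationship between the two return modes is as a cutoff applied to the probability. The sketch below is only an illustration under that assumption: the helper `to_label` and the 0.5 default threshold are hypothetical and are not taken from HMNI's source.

```python
def to_label(probability: float, threshold: float = 0.5) -> int:
    """Hypothetical helper: map a match probability to a 0/1 label."""
    return int(probability >= threshold)

score = 0.6838303319889133  # probability reported above for ('Alan', 'Al')

print(to_label(score))                 # 1
print(to_label(score, threshold=0.9))  # 0
```

A stricter cutoff trades recall for precision, so the right value depends on how costly a false match is in your data.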

Record Linkage

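import pandas as pd

# Two tables whose name columns agree only approximately ('Al' vs 'Alan')
df1 = pd.DataFrame({'name': ['Al', 'Mark', 'James', 'Harold']})
df2 = pd.DataFrame({'name': ['Mark', 'Alan', 'James', 'Harold']})

# Fuzzy left join on the 'name' column
merged = matcher.fuzzymerge(df1, df2, how='left', on='name')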
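
To make the idea of a fuzzy left join concrete, here is a minimal standard-library sketch. It does not use HMNI: `difflib.SequenceMatcher` stands in for the learned similarity model, and `stand_in_similarity`, `fuzzy_left_join` and the 0.6 threshold are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher

def stand_in_similarity(a: str, b: str) -> float:
    """Character-overlap ratio in [0, 1]; a placeholder for a learned matcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def fuzzy_left_join(left, right, threshold=0.6):
    """Pair each left name with its best-scoring right name, or None."""
    pairs = []
    for name in left:
        best = max(right, key=lambda cand: stand_in_similarity(name, cand),
                   default=None)
        if best is not None and stand_in_similarity(name, best) >= threshold:
            pairs.append((name, best))
        else:
            pairs.append((name, None))  # left join: unmatched rows are kept
    return pairs

left = ['Al', 'Mark', 'James', 'Harold']
right = ['Mark', 'Alan', 'James', 'Harold']

print(fuzzy_left_join(left, right))
# [('Al', 'Alan'), ('Mark', 'Mark'), ('James', 'James'), ('Harold', 'Harold')]
```

An exact join would leave 'Al' unmatched; the fuzzy version pairs it with 'Alan' because their similarity clears the threshold.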

Name Deduplication and Normalization

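names_list = ['Alan', 'Al', 'Al', 'James']

# Deduplicate, keeping the longest name in each group of matches
matcher.dedupe(names_list, keep='longest')
# ['Alan', 'James']

# Deduplicate, keeping the most frequent name in each group
matcher.dedupe(names_list, keep='frequent')
# ['Al', 'James']

# Normalize: keep every entry, replacing each name with its group's kept form
matcher.dedupe(names_list, keep='longest', replace=True)
# ['Alan', 'Alan', 'Alan', 'James']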
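
The `keep` options can be read as two different rules for choosing one representative per group of matching names. The sketch below illustrates those rules in plain Python. The grouping is written by hand and `pick_canonical` is a hypothetical helper; how HMNI actually forms groups is not shown here.

```python
from collections import Counter

def pick_canonical(group, keep='longest'):
    """Choose one representative name from a group of matching names."""
    if keep == 'longest':
        return max(group, key=len)
    if keep == 'frequent':
        return Counter(group).most_common(1)[0][0]
    raise ValueError(f"unsupported keep option: {keep!r}")

# Hypothetical grouping; in practice the matcher decides which names belong together
groups = [['Alan', 'Al', 'Al'], ['James']]
names_list = ['Alan', 'Al', 'Al', 'James']

print([pick_canonical(g, 'longest') for g in groups])   # ['Alan', 'James']
print([pick_canonical(g, 'frequent') for g in groups])  # ['Al', 'James']

# Normalization: map every original name to its group's representative
canonical = {name: pick_canonical(g, 'longest') for g in groups for name in g}
print([canonical[name] for name in names_list])
# ['Alan', 'Alan', 'Alan', 'James']
```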

Requirements

Python >=3.9
numpy
pandas
torch
joblib
unidecode
fuzzywuzzy
editdistance
abydos
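
To see which of these dependencies are present before installing, a small standard-library check such as the following may help. It assumes each package's import name matches the name listed above; `check_environment` is a hypothetical helper, not part of HMNI.

```python
import sys
from importlib.util import find_spec

# Assumed import names, taken from the requirements list
REQUIRED = ['numpy', 'pandas', 'torch', 'joblib', 'unidecode',
            'fuzzywuzzy', 'editdistance', 'abydos']

def check_environment(modules=REQUIRED):
    """Return {module_name: True/False} for whether each module can be found."""
    return {name: find_spec(name) is not None for name in modules}

python_ok = sys.version_info >= (3, 9)
status = check_environment()
missing = [name for name, found in status.items() if not found]
print(f"Python >= 3.9: {python_ok}; missing modules: {missing}")
```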

License

MIT
