Commit 2d267da

improve string format handling of nickname punctuation #41
remove empty quotes and parentheses so they can be included in the formatting string
1 parent 4d7f187 commit 2d267da

4 files changed (+114 / -17 lines changed)

README.rst

Lines changed: 4 additions & 3 deletions
@@ -47,15 +47,16 @@ Quick Start Example
     'Juan de la Vega'


-3 different comma placement variations are supported for the string that you pass.
+3 different comma placement variations are supported:

 * Title Firstname "Nickname" Middle Middle Lastname Suffix
 * Lastname [Suffix], Title Firstname (Nickname) Middle Middle[,] Suffix [, Suffix]
 * Title Firstname M Lastname [Suffix], Suffix [Suffix] [, Suffix]

-The parser does not make any attempt to clean the data. It mostly just splits on white
+The parser does not make any attempt to clean the input. It mostly just splits on white
 space and puts things in buckets based on their position in the string. This also means
-the difference between 'title' and 'suffix' is positional, not semantic. ("Pre-nominal"
+the difference between 'title' and 'suffix' is positional, not semantic. "Dr" is a title
+when it comes before the name and a suffix when it comes after. ("Pre-nominal"
 and "post-nominal" would probably be better names.)

 ::
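The README's point that "title" versus "suffix" is decided by position rather than meaning can be illustrated with a toy classifier. This is a minimal sketch, not nameparser's algorithm: the `AMBIGUOUS` set and the `classify_positional` function are hypothetical, and the real parser also handles commas, conjunctions and prefixes.

```python
# Toy illustration only: NOT nameparser's implementation.
# AMBIGUOUS is a hypothetical set of pieces that may be a title or a suffix.
AMBIGUOUS = {"dr"}


def normalize(piece: str) -> str:
    """Lowercase a piece and drop periods so 'Dr.' and 'dr' compare equal."""
    return piece.lower().replace(".", "")


def classify_positional(name: str) -> dict:
    """Split on whitespace and bucket ambiguous pieces by where they appear."""
    pieces = name.split()
    titles, suffixes = [], []
    # An ambiguous piece at the front is treated as a title (pre-nominal).
    while pieces and normalize(pieces[0]) in AMBIGUOUS:
        titles.append(pieces.pop(0))
    # The same kind of piece at the end is treated as a suffix (post-nominal).
    while pieces and normalize(pieces[-1]) in AMBIGUOUS:
        suffixes.insert(0, pieces.pop())
    return {"title": titles, "rest": pieces, "suffix": suffixes}


print(classify_positional("Dr Juan Vega"))   # 'Dr' ends up under "title"
print(classify_positional("Juan Vega Dr."))  # 'Dr.' ends up under "suffix"
```

The same token lands in a different bucket purely because of where it sits in the string, which is why the README suggests "pre-nominal" and "post-nominal" as more accurate names.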

docs/customize.rst

Lines changed: 43 additions & 13 deletions
@@ -5,7 +5,10 @@ Pre-processing
 Name buckets
 ++++++++++++++

-Each attribute has a corresponding ordered list of name pieces.
+Each attribute has a corresponding ordered list of name pieces. If you're doing
+pre- or post-processing you may wish to manipulate these lists directly.
+The strings returned by the attribute names just join these lists with spaces.
+

 * o.title_list
 * o.first_list
@@ -14,9 +17,6 @@ Each attribute has a corresponding ordered list of name pieces.
 * o.suffix_list
 * o.nickname_list

-If you're doing pre- or post-processing you may wish to manipulate these lists directly.
-The strings returned by the attribute names just join these lists with spaces.
-
 ::

     >>> hn = HumanName("Juan Q. Xavier Velasquez y Garcia, Jr.")
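The docs describe each string attribute as the space-joined form of an ordered list of pieces. A minimal sketch of that relationship, using a hypothetical `NameBuckets` dataclass in place of `HumanName` (only the `*_list` naming comes from the docs; the class and the values are illustrative, not real parser output):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NameBuckets:
    """Hypothetical stand-in for the per-attribute piece lists on HumanName."""

    title_list: List[str] = field(default_factory=list)
    first_list: List[str] = field(default_factory=list)
    middle_list: List[str] = field(default_factory=list)
    last_list: List[str] = field(default_factory=list)

    # Each string attribute is derived on access by joining its list with spaces.
    @property
    def title(self) -> str:
        return " ".join(self.title_list)

    @property
    def first(self) -> str:
        return " ".join(self.first_list)

    @property
    def middle(self) -> str:
        return " ".join(self.middle_list)

    @property
    def last(self) -> str:
        return " ".join(self.last_list)


# Hand-picked values for illustration only.
b = NameBuckets(first_list=["Juan"], middle_list=["Q.", "Xavier"], last_list=["Velasquez"])
b.middle_list.remove("Q.")  # post-processing edits the list directly
print(b.middle)             # the joined string follows the list
```

In this sketch the strings are recomputed on every access, so editing a list is enough and there is no cached string to keep in sync.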
@@ -45,6 +45,33 @@ directly to the attribute.
         nickname: ''
     ]>

+Controlling the string representation with string formatting
+============================================================
+
+You can control which name fields are included in the `str()` representation of a `HumanName` instance by changing its `string_format` attribute. Don't want to include nicknames in your output? No problem.
+
+::
+
+    >>> name = HumanName("Dr. Juan de la Vega (Doc Vega)")
+    >>> str(name)
+    'Dr. Juan de la Vega Doc Vega'
+    >>> name.string_format = "{title} {first} {middle} {last}, {suffix}"
+    >>> str(name)
+    'Dr. Juan de la Vega'
+
+Trailing commas, empty quotes and empty parentheses are automatically removed.
+
+::
+
+    >>> name = HumanName('Robert Johnson')
+    >>> name.string_format = "{title} {first} {middle} {last} {suffix} ({nickname})"
+    >>> str(name)
+    'Robert Johnson'
+    >>> name = HumanName('Robert "Rob" Johnson')
+    >>> name.string_format = "{title} {first} {middle} {last} {suffix} ({nickname})"
+    >>> str(name)
+    'Robert Johnson (Rob)'
+

 Customizing the Parser with Your Own Configuration
 ==================================================
@@ -54,17 +81,20 @@ matching the lower case characters of a name piece with pre-defined sets
 of strings located in :py:mod:`nameparser.config`. You can adjust
 these predefined sets to help fine tune the parser for your dataset.

-Parser Constants:
+Editable CONSTANTS sets:

-* `CONSTANTS.titles` - Pieces that come before the name. Cannot include things that may be first names
-* `CONSTANTS.first_name_titles` - Titles that, when followed by a single name, that name is a first name, e.g. "King David"
-* `CONSTANTS.suffix_acronyms` - Pieces that come at the end of the name that may or may not have periods separating the letters, e.g. "m.d."
-* `CONSTANTS.suffix_not_acronyms` - Pieces that come at the end of the name that never have periods separating the letters, e.g. "Jr."
-* `CONSTANTS.conjunctions` - Connectors like "and" that join the preceeding piece to the following piece.
-* `CONSTANTS.prefixes` - Connectors like "del" and "bin" that join to the following piece but not the preceeding
-* `CONSTANTS.capitalization_exceptions` - Dictionary of pieces that do not capitalize the first letter, e.g. "Ph.D"
-* `CONSTANTS.regexes` - Regular expressions used to find words, initials, nicknames, etc.
+* `titles` - Pieces that come before the name. Cannot include things that may be first names
+* `first_name_titles` - Titles that, when followed by a single name, mark that name as a first name, e.g. "King David"
+* `suffix_acronyms` - Pieces that come at the end of the name that may or may not have periods separating the letters, e.g. "m.d."
+* `suffix_not_acronyms` - Pieces that come at the end of the name that never have periods separating the letters, e.g. "Jr."
+* `conjunctions` - Connectors like "and" that join the preceding piece to the following piece.
+* `prefixes` - Connectors like "del" and "bin" that join to the following piece but not the preceding
+* `capitalization_exceptions` - Dictionary of pieces that do not capitalize the first letter, e.g. "Ph.D"
+* `regexes` - Regular expressions used to find words, initials, nicknames, etc.

+Each set of constants comes with `add()` and `remove()` methods for tuning
+the constants for your project. These methods automatically lower case and
+remove punctuation to normalize them for comparison.

 Changing the Parser Constants
 +++++++++++++++++++++++++++++++++
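The new paragraph says each constants set normalizes entries in `add()` and `remove()` by lowercasing them and removing punctuation. The class below is a hypothetical sketch of that behaviour, not the `nameparser.config` implementation; in particular, deleting every character in `string.punctuation` is an assumption about what "remove punctuation" covers.

```python
import string

# Translation table deleting ASCII punctuation (assumption: the library's own
# notion of "punctuation" may be narrower, e.g. only periods).
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(piece: str) -> str:
    """Lowercase and drop punctuation so 'Ph.D.' and 'phd' compare equal."""
    return piece.lower().translate(_STRIP_PUNCT)


class NormalizedSet:
    """Hypothetical constants set whose add/remove/membership are normalized."""

    def __init__(self, *pieces: str) -> None:
        self._elements = {normalize(p) for p in pieces}

    def add(self, *pieces: str) -> "NormalizedSet":
        for p in pieces:
            self._elements.add(normalize(p))
        return self  # returning self allows chained calls

    def remove(self, *pieces: str) -> "NormalizedSet":
        for p in pieces:
            # discard() so removing an absent piece is not an error
            self._elements.discard(normalize(p))
        return self

    def __contains__(self, piece: str) -> bool:
        return normalize(piece) in self._elements

    def __len__(self) -> int:
        return len(self._elements)


titles = NormalizedSet("Dr", "Rev")
titles.add("Hon.").remove("REV")
print("hon" in titles, "Rev." in titles)  # True False
```

Normalizing on both insertion and lookup means callers can pass pieces in whatever case or punctuation they appear in the raw data.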

nameparser/parser.py

Lines changed: 4 additions & 1 deletion
@@ -117,7 +117,10 @@ def __next__(self):
     def __unicode__(self):
         if self.string_format:
             # string_format = "{title} {first} {middle} {last} {suffix} ({nickname})"
-            return self.collapse_whitespace(self.string_format.format(**self.as_dict())).strip(', ')
+            _s = self.string_format.format(**self.as_dict())
+            # remove trailing punctuation from missing nicknames
+            _s = _s.replace(" ()","").replace(" ''","").replace(' ""',"")
+            return self.collapse_whitespace(_s).strip(', ')
         return " ".join(self)

     def __str__(self):
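The new `__unicode__` steps can be exercised outside the class. The function below is a standalone sketch that mirrors the commit's sequence: `str.format` with every field present, deletion of ` ()`, ` ''` and ` ""`, whitespace collapsing, then `strip(', ')`. `format_name`, `FIELDS` and the `collapse_whitespace` helper here are assumptions standing in for `HumanName.__unicode__`, `as_dict()` and the parser's own `collapse_whitespace`, whose implementations are not shown in this diff.

```python
import re

# Assumed field names, matching the placeholders used in the docs and tests.
FIELDS = ("title", "first", "middle", "last", "suffix", "nickname")


def collapse_whitespace(s: str) -> str:
    # Assumed equivalent of the parser's helper: runs of whitespace -> one space.
    return re.sub(r"\s+", " ", s).strip()


def format_name(string_format: str, **parts: str) -> str:
    # Supply every field (empty if missing), as as_dict() would.
    values = {name: parts.get(name, "") for name in FIELDS}
    s = string_format.format(**values)
    # Drop delimiters left empty by a missing nickname, as the commit does.
    s = s.replace(" ()", "").replace(" ''", "").replace(' ""', "")
    # Only leading/trailing commas and spaces are stripped.
    return collapse_whitespace(s).strip(", ")


fmt = "{title} {first} {middle} {last} {suffix} ({nickname})"
print(format_name(fmt, first="Robert", last="Johnson"))                  # Robert Johnson
print(format_name(fmt, first="Robert", last="Johnson", nickname="Rob"))  # Robert Johnson (Rob)
```

Two consequences follow directly from the replacement strings: an empty pair is removed only when a space precedes it, so a format that starts with `({nickname})` still yields `()`; and an empty field between two commas (for example `"{last}, {suffix}, {first}"` with no suffix) leaves `, ,` in place, because `strip(', ')` only trims the ends of the string.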

tests.py

Lines changed: 63 additions & 0 deletions
@@ -1433,6 +1433,13 @@ def test_suffix_with_double_comma_format(self):
         self.m(hn.last, "Doe", hn)
         self.m(hn.suffix, "jr., MD", hn)

+    @unittest.expectedFailure
+    def test_phd_with_erroneous_space(self):
+        hn = HumanName("John Smith, Ph. D.")
+        self.m(hn.first, "John", hn)
+        self.m(hn.last, "Smith", hn)
+        self.m(hn.suffix, "Ph. D.", hn)
+
     #http://en.wikipedia.org/wiki/Ma_(surname)
     def test_potential_suffix_that_is_also_last_name(self):
         hn = HumanName("Jack Ma")
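The new `test_phd_with_erroneous_space` is marked `expectedFailure`, so "Ph. D." with an internal space is a known limitation at this commit. Because the parser does not clean its input, one option is to pre-process the string before constructing `HumanName`. The helper below is a hypothetical sketch and not part of nameparser; whether the resulting "Ph.D." is then bucketed as a suffix depends on the configured suffix constants.

```python
import re

# Hypothetical pre-processing step: join "Ph. D." (any spacing, optional final
# period, any case) into "Ph.D.". The (?!\w) guard avoids touching "Ph. Dawson".
_PHD_WITH_SPACE = re.compile(r"\bPh\.\s+D\.?(?!\w)", re.IGNORECASE)


def join_spaced_phd(raw: str) -> str:
    """Return raw with spaced 'Ph. D.' variants rewritten as 'Ph.D.'."""
    return _PHD_WITH_SPACE.sub("Ph.D.", raw)


print(join_spaced_phd("John Smith, Ph. D."))  # John Smith, Ph.D.
```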
@@ -1750,13 +1757,69 @@ def test_downcasing_mc(self):


 class HumanNameOutputFormatTests(HumanNameTestBase):
+
     def test_formating(self):
         hn = HumanName("Rev John A. Kenneth Doe III (Kenny)")
         hn.string_format = "{title} {first} {middle} {last} {suffix} ({nickname})"
         self.assertEqual(u(hn), "Rev John A. Kenneth Doe III (Kenny)")
         hn.string_format = "{last}, {title} {first} {middle}, {suffix} ({nickname})"
         self.assertEqual(u(hn), "Doe, Rev John A. Kenneth, III (Kenny)")

+    def test_quote_nickname_formating(self):
+        hn = HumanName("Rev John A. Kenneth Doe III (Kenny)")
+        hn.string_format = "{title} {first} {middle} {last} {suffix} '{nickname}'"
+        self.assertEqual(u(hn), "Rev John A. Kenneth Doe III 'Kenny'")
+        hn.string_format = "{last}, {title} {first} {middle}, {suffix} '{nickname}'"
+        self.assertEqual(u(hn), "Doe, Rev John A. Kenneth, III 'Kenny'")
+
+    def test_formating_removing_keys_from_format_string(self):
+        hn = HumanName("Rev John A. Kenneth Doe III (Kenny)")
+        hn.string_format = "{title} {first} {middle} {last} {suffix} '{nickname}'"
+        self.assertEqual(u(hn), "Rev John A. Kenneth Doe III 'Kenny'")
+        hn.string_format = "{last}, {title} {first} {middle}, {suffix}"
+        self.assertEqual(u(hn), "Doe, Rev John A. Kenneth, III")
+        hn.string_format = "{last}, {title} {first} {middle}"
+        self.assertEqual(u(hn), "Doe, Rev John A. Kenneth")
+        hn.string_format = "{last}, {first} {middle}"
+        self.assertEqual(u(hn), "Doe, John A. Kenneth")
+        hn.string_format = "{last}, {first}"
+        self.assertEqual(u(hn), "Doe, John")
+        hn.string_format = "{first} {last}"
+        self.assertEqual(u(hn), "John Doe")
+
+    def test_formating_removing_pieces_from_name_buckets(self):
+        hn = HumanName("Rev John A. Kenneth Doe III (Kenny)")
+        hn.string_format = "{title} {first} {middle} {last} {suffix} '{nickname}'"
+        self.assertEqual(u(hn), "Rev John A. Kenneth Doe III 'Kenny'")
+        hn.string_format = "{title} {first} {middle} {last} {suffix}"
+        self.assertEqual(u(hn), "Rev John A. Kenneth Doe III")
+        hn.middle=''
+        self.assertEqual(u(hn), "Rev John Doe III")
+        hn.suffix=''
+        self.assertEqual(u(hn), "Rev John Doe")
+        hn.title=''
+        self.assertEqual(u(hn), "John Doe")
+
+    def test_formating_of_nicknames_with_parenthesis(self):
+        hn = HumanName("Rev John A. Kenneth Doe III (Kenny)")
+        hn.string_format = "{title} {first} {middle} {last} {suffix} ({nickname})"
+        self.assertEqual(u(hn), "Rev John A. Kenneth Doe III (Kenny)")
+        hn.nickname=''
+        self.assertEqual(u(hn), "Rev John A. Kenneth Doe III")
+
+    def test_formating_of_nicknames_with_single_quotes(self):
+        hn = HumanName("Rev John A. Kenneth Doe III (Kenny)")
+        hn.string_format = "{title} {first} {middle} {last} {suffix} '{nickname}'"
+        self.assertEqual(u(hn), "Rev John A. Kenneth Doe III 'Kenny'")
+        hn.nickname=''
+        self.assertEqual(u(hn), "Rev John A. Kenneth Doe III")
+
+    def test_formating_of_nicknames_with_double_quotes(self):
+        hn = HumanName("Rev John A. Kenneth Doe III (Kenny)")
+        hn.string_format = "{title} {first} {middle} {last} {suffix} \"{nickname}\""
+        self.assertEqual(u(hn), "Rev John A. Kenneth Doe III \"Kenny\"")
+        hn.nickname=''
+        self.assertEqual(u(hn), "Rev John A. Kenneth Doe III")

 TEST_NAMES = (
     "John Doe",
