@@ -25,80 +25,133 @@ Standard Annex #44, `"Unicode Character Database"
2525<https://www.unicode.org/reports/tr44/>`_. It defines the
2626following functions:
2727
28+ .. seealso::
29+
30+ The :ref:`unicode-howto` for more information about Unicode and how to use
31+ this module.
32+
2833
2934.. function:: lookup(name)
3035
3136 Look up character by name. If a character with the given name is found, return
3237 the corresponding character. If not found, :exc:`KeyError` is raised.
38+ For example::
39+
40+ >>> unicodedata.lookup('LEFT CURLY BRACKET')
41+ '{'
42+
43+ The characters returned by this function are the same as those produced by
44+ the ``\N`` escape sequence in string literals. For example::
45+
46+ >>> unicodedata.lookup('MIDDLE DOT') == '\N{MIDDLE DOT}'
47+ True
3348
3449 .. versionchanged:: 3.3
3550 Support for name aliases [#]_ and named sequences [#]_ has been added.
3651
3752
38- .. function:: name(chr[, default])
53+ .. function:: name(chr, default=None, /)
3954
4055 Returns the name assigned to the character *chr* as a string. If no
4156 name is defined, *default* is returned, or, if not given, :exc:`ValueError` is
42- raised.
57+ raised. For example::
58+
59+ >>> unicodedata.name('½')
60+ 'VULGAR FRACTION ONE HALF'
61+ >>> unicodedata.name('\uFFFF', 'fallback')
62+ 'fallback'
4363
4464
45- .. function:: decimal(chr[, default])
65+ .. function:: decimal(chr, default=None, /)
4666
4767 Returns the decimal value assigned to the character *chr* as integer.
4868 If no such value is defined, *default* is returned, or, if not given,
49- :exc:`ValueError` is raised.
69+ :exc:`ValueError` is raised. For example::
5070
71+ >>> unicodedata.decimal('\N{ARABIC-INDIC DIGIT NINE}')
72+ 9
73+ >>> unicodedata.decimal('\N{SUPERSCRIPT NINE}', -1)
74+ -1
5175
52- .. function:: digit(chr[, default])
76+
77+ .. function:: digit(chr, default=None, /)
5378
5479 Returns the digit value assigned to the character *chr* as integer.
5580 If no such value is defined, *default* is returned, or, if not given,
56- :exc:`ValueError` is raised.
81+ :exc:`ValueError` is raised::
82+
83+ >>> unicodedata.digit('\N{SUPERSCRIPT NINE}')
84+ 9
5785
5886
59- .. function:: numeric(chr[, default])
87+ .. function:: numeric(chr, default=None, /)
6088
6189 Returns the numeric value assigned to the character *chr* as float.
6290 If no such value is defined, *default* is returned, or, if not given,
63- :exc:`ValueError` is raised.
91+ :exc:`ValueError` is raised::
92+
93+ >>> unicodedata.numeric('½')
94+ 0.5
6495
6596
6697.. function:: category(chr)
6798
6899 Returns the general category assigned to the character *chr* as
69- string.
100+ string. General category names consist of two letters.
101+ See the `General Category Values section of the Unicode Character
102+ Database documentation <https://www.unicode.org/reports/tr44/#General_Category_Values>`_
103+ for a list of category codes. For example::
104+
105+ >>> unicodedata.category('A') # 'L'etter, 'u'ppercase
106+ 'Lu'
70107
71108
72109.. function:: bidirectional(chr)
73110
74111 Returns the bidirectional class assigned to the character *chr* as
75112 string. If no such value is defined, an empty string is returned.
113+ See the `Bidirectional Class Values section of the Unicode Character
114+ Database <https://www.unicode.org/reports/tr44/#Bidi_Class_Values>`_
115+ documentation for a list of bidirectional codes. For example::
116+
117+ >>> unicodedata.bidirectional('\N{ARABIC-INDIC DIGIT SEVEN}') # 'A'rabic, 'N'umber
118+ 'AN'
76119
77120
78121.. function:: combining(chr)
79122
80123 Returns the canonical combining class assigned to the character *chr*
81124 as integer. Returns ``0`` if no combining class is defined.
125+ See the `Canonical Combining Class Values section of the Unicode Character
126+ Database <https://www.unicode.org/reports/tr44/#Canonical_Combining_Class_Values>`_
127+ for more information.
82128
83129
84130.. function:: east_asian_width(chr)
85131
86132 Returns the east asian width assigned to the character *chr* as
87- string.
133+ string. For a list of widths and for more information, see the
134+ `Unicode Standard Annex #11 <https://www.unicode.org/reports/tr11/>`_.
88135
89136
90137.. function:: mirrored(chr)
91138
92139 Returns the mirrored property assigned to the character *chr* as
93140 integer. Returns ``1`` if the character has been identified as a "mirrored"
94- character in bidirectional text, ``0`` otherwise.
141+ character in bidirectional text, ``0`` otherwise. For example::
142+
143+ >>> unicodedata.mirrored('>')
144+ 1
95145
96146
97147.. function:: decomposition(chr)
98148
99149 Returns the character decomposition mapping assigned to the character
100150 *chr* as string. An empty string is returned in case no such mapping is
101- defined.
151+ defined. For example::
152+
153+ >>> unicodedata.decomposition('Ã')
154+ '0041 0303'
102155
103156
104157.. function:: normalize(form, unistr)
@@ -122,9 +175,9 @@ following functions:
122175 normally would be unified with other characters. For example, U+2160 (ROMAN
123176 NUMERAL ONE) is really the same thing as U+0049 (LATIN CAPITAL LETTER I).
124177 However, it is supported in Unicode for compatibility with existing character
125- sets (e.g. gb2312).
178+ sets (for example, gb2312).
126179
127- The normal form KD (NFKD) will apply the compatibility decomposition, i.e.
180+ The normal form KD (NFKD) will apply the compatibility decomposition, that is,
128181 replace all compatibility characters with their equivalents. The normal form KC
129182 (NFKC) first applies the compatibility decomposition, followed by the canonical
130183 composition.
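
As a rough illustration of the compatibility forms described in this hunk, the sketch below normalizes U+2160 (ROMAN NUMERAL ONE) with `unicodedata.normalize`. The variable names are illustrative only and are not part of the patch.

```python
import unicodedata

roman_one = "\u2160"  # ROMAN NUMERAL ONE, a compatibility character

# Canonical decomposition (NFD) leaves the compatibility character untouched.
nfd = unicodedata.normalize("NFD", roman_one)

# Compatibility decomposition (NFKD) replaces it with its equivalent, U+0049.
nfkd = unicodedata.normalize("NFKD", roman_one)

# NFKC applies the compatibility decomposition, then canonical composition:
# ROMAN NUMERAL ONE + COMBINING ACUTE ACCENT -> 'I' + accent -> U+00CD.
nfkc = unicodedata.normalize("NFKC", roman_one + "\u0301")

print(nfd == roman_one)  # True
print(nfkd)              # I
print(ascii(nfkc))       # '\xcd'
```

Note that the K forms are lossy: once U+2160 has been replaced by U+0049, the original character cannot be recovered from the normalized string.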
@@ -133,6 +186,7 @@ following functions:
133186 a human reader, if one has combining characters and the other
134187 doesn't, they may not compare equal.
135188
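
A minimal sketch of the comparison pitfall mentioned above, assuming Python 3.8 or later for `unicodedata.is_normalized`. The two spellings of "é" are chosen here for illustration and do not come from the patch.

```python
import unicodedata

composed = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE (one code point)
decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT (two code points)

# The strings look identical to a reader but differ code point by code point.
same_raw = composed == decomposed

# Normalizing both sides to the same form makes the comparison meaningful.
same_nfc = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

# is_normalized() reports whether a string is already in the given form.
already_nfc = unicodedata.is_normalized("NFC", decomposed)

print(same_raw, same_nfc, already_nfc)  # False True False
```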
189+
136190.. function:: is_normalized(form, unistr)
137191
138192 Return whether the Unicode string *unistr* is in the normal form *form*. Valid
@@ -154,24 +208,6 @@ In addition, the module exposes the following constant:
154208 Unicode database version 3.2 instead, for applications that require this
155209 specific version of the Unicode database (such as IDNA).
156210
157- Examples:
158-
159- >>> import unicodedata
160- >>> unicodedata.lookup('LEFT CURLY BRACKET')
161- '{'
162- >>> unicodedata.name('/')
163- 'SOLIDUS'
164- >>> unicodedata.decimal('9')
165- 9
166- >>> unicodedata.decimal('a')
167- Traceback (most recent call last):
168- File "<stdin>", line 1, in <module>
169- ValueError: not a decimal
170- >>> unicodedata.category('A') # 'L'etter, 'u'ppercase
171- 'Lu'
172- >>> unicodedata.bidirectional('\u0660') # 'A'rabic, 'N'umber
173- 'AN'
174-
175211
176212.. rubric:: Footnotes
177213
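
The diff adds reference links, but no inline examples, for `combining` and `east_asian_width`. The sketch below shows what such calls return for a few characters. The characters and variable names are chosen here for illustration and are not part of the patch.

```python
import unicodedata

acute = "\N{COMBINING ACUTE ACCENT}"

# Canonical combining class: 230 ("Above") for the accent, 0 for a base letter.
accent_class = unicodedata.combining(acute)
base_class = unicodedata.combining("a")

# East Asian width codes as listed in Unicode Standard Annex #11.
widths = {
    "A": unicodedata.east_asian_width("A"),            # narrow
    "\uff21": unicodedata.east_asian_width("\uff21"),  # FULLWIDTH LATIN CAPITAL LETTER A
    "\u4e2d": unicodedata.east_asian_width("\u4e2d"),  # CJK ideograph
}

print(accent_class, base_class)  # 230 0
print(list(widths.values()))     # ['Na', 'F', 'W']
```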