@@ -25,80 +25,131 @@ Standard Annex #44, `"Unicode Character Database"
2525<https://www.unicode.org/reports/tr44/> `_. It defines the
2626following functions:
2727
28+ .. seealso::
29+
30+    The :ref:`unicode-howto` for more information about Unicode and how to use
31+    this module.
32+
2833
2934.. function :: lookup(name)
3035
3136 Look up character by name. If a character with the given name is found, return
3237 the corresponding character. If not found, :exc: `KeyError ` is raised.
38+ For example::
39+
40+ >>> unicodedata.lookup('LEFT CURLY BRACKET')
41+ '{'
42+
43+ The characters returned by this function are the same as those produced by
44+ the ``\N`` escape sequence in string literals. For example::
45+
46+ >>> unicodedata.lookup('MIDDLE DOT') == '\N{MIDDLE DOT}'
47+ True
3348
3449 .. versionchanged :: 3.3
3550 Support for name aliases [#]_ and named sequences [#]_ has been added.
3651
3752
38- .. function :: name(chr[ , default] )
53+ .. function :: name(chr, default=None, / )
3954
4055 Returns the name assigned to the character *chr * as a string. If no
4156 name is defined, *default * is returned, or, if not given, :exc: `ValueError ` is
42- raised.
57+ raised. For example::
58+
59+ >>> unicodedata.name('½')
60+ 'VULGAR FRACTION ONE HALF'
61+ >>> unicodedata.name('\uFFFF', 'fallback')
62+ 'fallback'
4363
4464
45- .. function :: decimal(chr[ , default] )
65+ .. function :: decimal(chr, default=None, / )
4666
4767 Returns the decimal value assigned to the character *chr * as integer.
4868 If no such value is defined, *default * is returned, or, if not given,
49- :exc: `ValueError ` is raised.
69+ :exc: `ValueError ` is raised. For example::
5070
71+ >>> unicodedata.decimal('\N{ARABIC-INDIC DIGIT NINE}')
72+ 9
73+ >>> unicodedata.decimal('\N{SUPERSCRIPT NINE}', -1)
74+ -1
5175
52- .. function :: digit(chr[, default])
76+
77+ .. function :: digit(chr, default=None, /)
5378
5479 Returns the digit value assigned to the character *chr * as integer.
5580 If no such value is defined, *default * is returned, or, if not given,
5681 :exc: `ValueError ` is raised.
5782
83+ >>> unicodedata.digit('\N{SUPERSCRIPT NINE}')
84+ 9
5885
59- .. function :: numeric(chr[ , default] )
86+ .. function :: numeric(chr, default=None, / )
6087
6188 Returns the numeric value assigned to the character *chr * as float.
6289 If no such value is defined, *default * is returned, or, if not given,
6390 :exc: `ValueError ` is raised.
6491
92+ >>> unicodedata.numeric('½')
93+ 0.5
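The three numeric lookups above differ in how strict they are. The following sketch compares them on the characters used in the examples; the helper name `numeric_views` is hypothetical, and `None` is passed as *default* so that undefined values do not raise :exc:`ValueError`.

```python
import unicodedata


def numeric_views(ch):
    """Return (decimal, digit, numeric) for ch, with None where undefined."""
    return (
        unicodedata.decimal(ch, None),
        unicodedata.digit(ch, None),
        unicodedata.numeric(ch, None),
    )


# An ordinary digit has all three values.
nine = numeric_views('9')
# A superscript digit has digit and numeric values, but no decimal value.
superscript_nine = numeric_views('\N{SUPERSCRIPT NINE}')
# A fraction only has a numeric value.
half = numeric_views('\N{VULGAR FRACTION ONE HALF}')

print(nine)              # (9, 9, 9.0)
print(superscript_nine)  # (None, 9, 9.0)
print(half)              # (None, None, 0.5)
```

Passing an explicit default is a design choice for this sketch; omitting it makes each function raise :exc:`ValueError` instead.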
6594
6695.. function :: category(chr)
6796
6897 Returns the general category assigned to the character *chr * as
69- string.
98+ string. General category names consist of two letters.
99+ See the `General Category Values section of the Unicode Character
100+ Database documentation <https://www.unicode.org/reports/tr44/#General_Category_Values> `_
101+ for a list of category codes. For example::
102+
103+ >>> unicodedata.category('A') # 'L'etter, 'u'ppercase
104+ 'Lu'
70105
71106
72107.. function :: bidirectional(chr)
73108
74109 Returns the bidirectional class assigned to the character *chr * as
75110 string. If no such value is defined, an empty string is returned.
111+ See the `Bidirectional Class Values section of the Unicode Character
112+ Database documentation <https://www.unicode.org/reports/tr44/#Bidi_Class_Values>`_
113+ for a list of bidirectional codes. For example::
114+
115+ >>> unicodedata.bidirectional('\N{ARABIC-INDIC DIGIT SEVEN}') # 'A'rabic, 'N'umber
116+ 'AN'
76117
77118
78119.. function :: combining(chr)
79120
80121 Returns the canonical combining class assigned to the character *chr *
81122 as integer. Returns ``0 `` if no combining class is defined.
123+ See the `Canonical Combining Class Values section of the Unicode Character
124+ Database <https://www.unicode.org/reports/tr44/#Canonical_Combining_Class_Values>`_
125+ for more information.
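As an illustration of the return values described for `combining()`, the sketch below checks one combining mark and one base letter, then uses the result to drop combining marks from a decomposed string. The helper name `strip_marks` is hypothetical and is not part of the module.

```python
import unicodedata

acute_class = unicodedata.combining('\N{COMBINING ACUTE ACCENT}')
base_class = unicodedata.combining('a')
print(acute_class)  # 230
print(base_class)   # 0


def strip_marks(text):
    """Remove characters with a non-zero canonical combining class."""
    # NFD splits precomposed characters into base letter + combining marks.
    decomposed = unicodedata.normalize('NFD', text)
    return ''.join(c for c in decomposed if unicodedata.combining(c) == 0)


stripped = strip_marks('caf\u00e9')  # 'café' with a precomposed e-acute
print(stripped)  # cafe
```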
82126
83127
84128.. function :: east_asian_width(chr)
85129
86130 Returns the east asian width assigned to the character *chr * as
87- string.
131+ string. For a list of widths and for more information, see
132+ `Unicode Standard Annex #11 <https://www.unicode.org/reports/tr11/>`_.
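A short sketch of `east_asian_width()` on three characters, chosen here only for illustration: an ASCII letter, its fullwidth counterpart, and a CJK ideograph.

```python
import unicodedata

samples = {
    'ascii': 'A',            # LATIN CAPITAL LETTER A
    'fullwidth': '\uFF21',   # FULLWIDTH LATIN CAPITAL LETTER A
    'ideograph': '\u4E2D',   # CJK UNIFIED IDEOGRAPH-4E2D
}

widths = {
    label: unicodedata.east_asian_width(ch) for label, ch in samples.items()
}
print(widths)  # {'ascii': 'Na', 'fullwidth': 'F', 'ideograph': 'W'}
```

The codes stand for narrow (`Na`), fullwidth (`F`), and wide (`W`), as defined in the annex.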
88133
89134
90135.. function :: mirrored(chr)
91136
92137 Returns the mirrored property assigned to the character *chr * as
93138 integer. Returns ``1 `` if the character has been identified as a "mirrored"
94- character in bidirectional text, ``0 `` otherwise.
139+ character in bidirectional text, ``0 `` otherwise. For example::
140+
141+ >>> unicodedata.mirrored('>')
142+ 1
95143
96144
97145.. function :: decomposition(chr)
98146
99147 Returns the character decomposition mapping assigned to the character
100148 *chr * as string. An empty string is returned in case no such mapping is
101- defined.
149+ defined. For example::
150+
151+ >>> unicodedata.decomposition('Ã')
152+ '0041 0303'
102153
103154
104155.. function :: normalize(form, unistr)
@@ -122,9 +173,9 @@ following functions:
122173 normally would be unified with other characters. For example, U+2160 (ROMAN
123174 NUMERAL ONE) is really the same thing as U+0049 (LATIN CAPITAL LETTER I).
124175 However, it is supported in Unicode for compatibility with existing character
125- sets (e.g. gb2312).
176+ sets (for example, gb2312).
126177
127- The normal form KD (NFKD) will apply the compatibility decomposition, i.e.
178+ The normal form KD (NFKD) will apply the compatibility decomposition, that is,
128179 replace all compatibility characters with their equivalents. The normal form KC
129180 (NFKC) first applies the compatibility decomposition, followed by the canonical
130181 composition.
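The compatibility forms described above can be sketched with the ROMAN NUMERAL ONE example from the text. The second string, a ligature followed by a decomposed accented letter, is an illustrative input chosen here and does not come from the documentation.

```python
import unicodedata

roman_one = '\u2160'  # ROMAN NUMERAL ONE

# Canonical normalization leaves the compatibility character untouched.
nfc = unicodedata.normalize('NFC', roman_one)
# Compatibility decomposition replaces it with LATIN CAPITAL LETTER I.
nfkd = unicodedata.normalize('NFKD', roman_one)
print(nfc == roman_one, nfkd)  # True I

# LATIN SMALL LIGATURE FI followed by 'e' + COMBINING ACUTE ACCENT.
text = '\uFB01' + 'e\u0301'
decomposed = unicodedata.normalize('NFKD', text)  # ligature split, accent separate
composed = unicodedata.normalize('NFKC', text)    # ligature split, accent recomposed
print(len(decomposed), len(composed))  # 4 3
```

NFKC yields the shorter string because the canonical composition step merges `e` and the combining accent back into a single character.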
@@ -154,24 +205,6 @@ In addition, the module exposes the following constant:
154205 Unicode database version 3.2 instead, for applications that require this
155206 specific version of the Unicode database (such as IDNA).
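A minimal sketch of how the version 3.2 database differs from the current one, using the module's `ucd_3_2_0` object. The test character, GRINNING FACE, is chosen here for illustration because it was added to Unicode after version 3.2.

```python
import unicodedata

legacy = unicodedata.ucd_3_2_0
print(legacy.unidata_version)  # 3.2.0

emoji = '\U0001F600'
# The current database knows the character by name.
current_name = unicodedata.name(emoji, None)
# The 3.2 database treats it as unassigned, so the default is returned.
legacy_name = legacy.name(emoji, None)
print(current_name, legacy_name)  # GRINNING FACE None
```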
156207
157- Examples:
158-
159- >>> import unicodedata
160- >>> unicodedata.lookup('LEFT CURLY BRACKET')
161- '{'
162- >>> unicodedata.name('/')
163- 'SOLIDUS'
164- >>> unicodedata.decimal('9')
165- 9
166- >>> unicodedata.decimal('a')
167- Traceback (most recent call last):
168- File "<stdin>", line 1, in <module>
169- ValueError: not a decimal
170- >>> unicodedata.category('A') # 'L'etter, 'u'ppercase
171- 'Lu'
172- >>> unicodedata.bidirectional('\u0660') # 'A'rabic, 'N'umber
173- 'AN'
174-
175208
176209.. rubric :: Footnotes
177210