
Commit 3491224

Unicode::UCD: Clarify pod for num()
Add example of how it handles numbers that aren't decimal positional. I thought it would clarify things to expand and correct the flawed example pointed out in GH #23003.
1 parent f96e212 commit 3491224


5 files changed: +33 -7 lines

charclass_invlists.h

Lines changed: 1 addition & 1 deletion

lib/Unicode/UCD.pm

Lines changed: 26 additions & 0 deletions
@@ -2496,6 +2496,32 @@ match them. A single-character string containing one of these digits will
 have its decimal value returned by C<num>, but any longer string containing
 only these digits will return C<undef>.
 
+To illustrate further, the Rumi numeric symbols were used in centuries past in
+and around North Africa and the Iberian Peninsula. To allow the many
+historical documents that use them to be digitized, Unicode has encoded the
+set. There is no character representing zero; there are characters for one
+through nine, ten, twenty, and so forth. C<num> correctly returns the values
+of these in isolation.
+
+    my $rumi_one = num("\N{RUMI DIGIT ONE}");
+    my $rumi_two = num("\N{RUMI DIGIT TWO}");
+    my $rumi_twenty = num("\N{RUMI NUMBER TWENTY}");
+    say "$rumi_one $rumi_two $rumi_twenty"; # 1 2 20
+
+Because these do not follow modern decimal positional notation, stringing more
+than one of them together doesn't mean what you would likely expect, so C<num>
+correctly returns C<undef> if you try. If you pass a second parameter, a
+reference to a scalar that receives the length of the initial legal numeric
+portion of the string, that scalar is set to 1, indicating that to get a
+numeric value, you'd have to stop after just the first character.
+
+    my $len;
+    my $value = num("\N{RUMI DIGIT ONE}\N{RUMI DIGIT TWO}", \$len);
+    say $len, " ", (defined $value) ? $value : "undef"; # 1 undef
+
+Representing a number such as twelve in this system is complicated, and
+Unicode gives no guidance, so C<num> can't either.
+
 Strings of multiple sub- and superscripts are not recognized as numbers. You
 can use either of the compatibility decompositions in Unicode::Normalize to
 change these into digits, and then call C<num> on the result.
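The closing context paragraph of this hunk describes a workaround for strings of sub- and superscripts. Below is a minimal Perl sketch of it, assuming Perl 5.16 or later (where `\N{...}` character names load automatically). It uses NFKD; NFKC, also exported by Unicode::Normalize, yields the same ASCII digits for this input. The variable names are illustrative only, not taken from the module.

```perl
use v5.16;    # enables strict and 'say'; \N{...} names load charnames automatically
use warnings;
use Unicode::Normalize qw(NFKD);
use Unicode::UCD qw(num);

# A two-character string of superscript digits
my $superscripts = "\N{SUPERSCRIPT ONE}\N{SUPERSCRIPT TWO}";

# A multi-character string of superscripts is not recognized as a number
my $raw = num($superscripts);
say defined $raw ? $raw : "undef";    # undef

# The compatibility decomposition maps each superscript to its ASCII digit
my $digits = NFKD($superscripts);     # "12"
say num($digits);                     # 12
```

Normalizing first keeps `num` itself strict about mixed or non-decimal input, while still letting callers opt in to treating superscripts as ordinary digits.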

lib/unicore/uni_keywords.pl

Lines changed: 1 addition & 1 deletion

regcharclass.h

Lines changed: 3 additions & 3 deletions

uni_keywords.h

Lines changed: 2 additions & 2 deletions
