2 changes: 1 addition & 1 deletion charclass_invlists.inc
@@ -456662,7 +456662,7 @@ static const U8 WB_dfa_table[] = {
#endif /* defined(PERL_IN_REGEXEC_C) */

/* Generated from:
- * 7229a97216f54f7d47d5cff56fc8dbc185dcfe40db20533f8034a1215af787fe lib/Unicode/UCD.pm
+ * b7f46fc1010fd83f5a678b268a23fef0142a18d0ab2a142edd0bb03328e667c3 lib/Unicode/UCD.pm
* 764f420cedfc8b43d9fec251c957a5d55fc45d40f6573f162990ed1dce7e36e0 lib/unicore/ArabicShaping.txt
* b8f32554c6f658821fb0ee742d21c5b1f2086b9bf13071fed04894b022f93d67 lib/unicore/BidiBrackets.txt
* d7afdadd1bbd66f5a663ac0e8f7958f18fd9491fc0bc59ec5877cb82db71db7d lib/unicore/BidiMirroring.txt
26 changes: 25 additions & 1 deletion lib/Unicode/UCD.pm
@@ -5,7 +5,7 @@ use warnings;
no warnings 'surrogate'; # surrogates can be inputs to this
use charnames ();

-our $VERSION = '0.80';
+our $VERSION = '0.81';

sub DEBUG () { 0 }
$|=1 if DEBUG;
@@ -2496,6 +2496,30 @@ match them. A single-character string containing one of these digits will
have its decimal value returned by C<num>, but any longer string containing
only these digits will return C<undef>.

+To illustrate further, the Rumi numeric symbols were used in centuries past in
+and around North Africa and the Iberian Peninsula. In order to be able to
+digitize the many historical documents that use them, Unicode has encoded the
+set. There is no character representing zero. There are characters for one
+through nine, ten, twenty, and so forth. C<num> correctly returns the values
+of these in isolation.
+
+    my $rumi_one = num("\N{RUMI DIGIT ONE}");
+    my $rumi_two = num("\N{RUMI DIGIT TWO}");
+    my $rumi_twenty = num("\N{RUMI NUMBER TWENTY}");
+    say "$rumi_one $rumi_two $rumi_twenty"; # 1 2 20
+
+Because these do not follow modern decimal positional notation, stringing more
+than one of these together doesn't mean what you would likely think it means.
+So C<num> correctly returns C<undef> if you try. If you request the length
+of the valid initial substring in this case, that length is one.
+
+    my $len;
+    my $value = num("\N{RUMI DIGIT ONE}\N{RUMI DIGIT TWO}", \$len);
+    say $len, " ", (defined $value) ? $value : "undef"; # 1 undef
+
+How to represent numbers like twelve gets complicated, and Unicode doesn't
+give any guidance, so C<num> can't either.
+
Strings of multiple sub- and superscripts are not recognized as numbers. You
can use either of the compatibility decompositions in Unicode::Normalize to
change these into digits, and then call C<num> on the result.
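The rule the new POD describes can be illustrated outside Perl. The sketch below is a hypothetical Python re-creation, not the Unicode::UCD implementation: `num_sketch` is an invented name, and it reads Python's standard `unicodedata` tables rather than Perl's. It assumes that a multi-character string counts as a number only when every character is a decimal digit from the same run of ten consecutive code points, while a lone character may fall back to its general numeric value (which is how Rumi symbols yield 1, 2, or 20 in isolation). Like the `\$len` argument to `num`, it also reports the length of the valid initial substring.

```python
import unicodedata


def num_sketch(s):
    """Hypothetical sketch of the num() rule; returns (value, valid_len).

    value is None when the whole string is not a number. This is NOT the
    Unicode::UCD code; it only mirrors the behaviour described in its POD.
    """
    if not s:
        return None, 0

    # Count the longest prefix of decimal digits that share one "zero"
    # code point, i.e. that come from the same run of ten digits.
    zero = None
    n = 0
    for ch in s:
        d = unicodedata.decimal(ch, None)
        if d is None:
            break
        if zero is None:
            zero = ord(ch) - d
        elif ord(ch) - d != zero:
            break
        n += 1

    if n == len(s):
        # Every character is a same-run decimal digit: positional notation.
        value = 0
        for ch in s:
            value = value * 10 + unicodedata.decimal(ch)
        return value, n
    if n > 0:
        return None, n

    # First character is not a decimal digit (e.g. a Rumi or superscript
    # symbol). Only a lone character may use its general numeric value.
    v = unicodedata.numeric(s[0], None)
    if v is None:
        return None, 0
    if v.is_integer():
        v = int(v)
    return (v, 1) if len(s) == 1 else (None, 1)


if __name__ == "__main__":
    print(num_sketch("\N{RUMI DIGIT ONE}"),
          num_sketch("\N{RUMI NUMBER TWENTY}"),
          num_sketch("\N{RUMI DIGIT ONE}\N{RUMI DIGIT TWO}"))
    # prints: (1, 1) (20, 1) (None, 1)
```

Running a string through `unicodedata.normalize("NFKD", s)` first plays the role of the compatibility decompositions in Unicode::Normalize: superscript two and three become ASCII `23`, so the multi-character case then succeeds.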
2 changes: 1 addition & 1 deletion lib/unicore/uni_keywords.pl

2 changes: 1 addition & 1 deletion regcharclass.h

2 changes: 1 addition & 1 deletion regexp_constants.h
@@ -29,7 +29,7 @@
#define MAX_FOLD_FROMS 3

/* Generated from:
- * 7229a97216f54f7d47d5cff56fc8dbc185dcfe40db20533f8034a1215af787fe lib/Unicode/UCD.pm
+ * b7f46fc1010fd83f5a678b268a23fef0142a18d0ab2a142edd0bb03328e667c3 lib/Unicode/UCD.pm
* 764f420cedfc8b43d9fec251c957a5d55fc45d40f6573f162990ed1dce7e36e0 lib/unicore/ArabicShaping.txt
* b8f32554c6f658821fb0ee742d21c5b1f2086b9bf13071fed04894b022f93d67 lib/unicore/BidiBrackets.txt
* d7afdadd1bbd66f5a663ac0e8f7958f18fd9491fc0bc59ec5877cb82db71db7d lib/unicore/BidiMirroring.txt
2 changes: 1 addition & 1 deletion uni_keywords.h
