@@ -2496,6 +2496,32 @@ match them. A single-character string containing one of these digits will
 have its decimal value returned by C<num>, but any longer string containing
 only these digits will return C<undef>.
 
+To illustrate further, the Rumi numeric symbols were used in centuries past in
+and around North Africa and the Iberian peninsula. In order to be able to
+digitize the many historical documents that use them, Unicode has encoded the
+set. There is no character representing zero. There are characters for one
+through nine, ten, twenty, and so forth. C<num> correctly returns the values
+of these in isolation.
+
+    my $rumi_one    = num("\N{RUMI DIGIT ONE}");
+    my $rumi_two    = num("\N{RUMI DIGIT TWO}");
+    my $rumi_twenty = num("\N{RUMI NUMBER TWENTY}");
+    say "$rumi_one $rumi_two $rumi_twenty";  # 1 2 20
+
+Because these do not follow modern decimal positional notation, stringing more
+than one of them together doesn't mean what you would likely expect. So
+C<num> correctly returns C<undef> if you try. If you pass a second parameter
+to receive the length of the initial valid numeric portion, it is set to 1,
+indicating that to get a numeric value, you'd have to stop after just the
+first character in the string.
+
+    my $len;
+    my $value = num("\N{RUMI DIGIT ONE}\N{RUMI DIGIT TWO}", \$len);
+    say $len, " ", (defined $value) ? $value : "undef";  # 1 undef
+
+How to represent numbers like twelve gets complicated, and Unicode doesn't
+give any guidance, so C<num> can't either.
+
 Strings of multiple sub- and superscripts are not recognized as numbers. You
 can use either of the compatibility decompositions in Unicode::Normalize to
 change these into digits, and then call C<num> on the result.
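
The workaround in the closing context paragraph has no example in this hunk. A minimal sketch of how it might look follows, assuming a Perl of at least 5.16 and using C<NFKD> from Unicode::Normalize. The sample string and the variable names are illustrative assumptions, not taken from the patch.

```perl
use v5.16;                        # enables say and strict
use warnings;
use Unicode::UCD qw(num);
use Unicode::Normalize qw(NFKD);

# Hypothetical sample: superscript two followed by superscript three.
my $superscripts = "\N{SUPERSCRIPT TWO}\N{SUPERSCRIPT THREE}";

# Per the documentation, a string of several superscripts is not
# recognized as a number, so num() is expected to return undef here.
my $raw = num($superscripts);

# Compatibility decomposition maps each superscript to its ASCII digit,
# giving a string that num() can treat as an ordinary decimal number.
my $digits = NFKD($superscripts);
my $value  = num($digits);

say defined $raw ? $raw : "undef";
say "$digits => ", defined $value ? $value : "undef";
```

C<NFKC> should serve equally well for this purpose, since both forms apply the compatibility mappings that turn sub- and superscripts into plain digits.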