Commit 1b5d247

minor #414 Support escaped unicode character sequences in idna test files (TRowbotham)
This PR was merged into the 1.26-dev branch.

Discussion
----------

Support escaped unicode character sequences in idna test files

The IDNA [test file](https://www.unicode.org/Public/idna/15.0.0/IdnaTestV2.txt) for Unicode 15.0.0 has started using escaped unicode character sequences. The [spec](https://www.unicode.org/reports/tr46/#Format) says that they can be in the form of either \uXXXX or \x{XXXX}.

Additionally, remove a now irrelevant conditional that worked around a bug in previous test files.

Commits
-------

ca44f85 Support escaped unicode character sequences
2 parents 4475c7a + ca44f85 commit 1b5d247

tests/Intl/Idn/IdnTest.php

Lines changed: 7 additions & 11 deletions
@@ -81,7 +81,13 @@ public function getData()
             }
 
             [$line] = explode('#', $line);
-            [$source, $toUnicode, $toUnicodeStatus, $toAsciiN, $toAsciiNStatus, $toAsciiT, $toAsciiTStatus] = array_map('trim', explode(';', $line));
+            [$source, $toUnicode, $toUnicodeStatus, $toAsciiN, $toAsciiNStatus, $toAsciiT, $toAsciiTStatus] = preg_replace_callback(
+                '/\\\\(?:u([[:xdigit:]]{4})|x{([[:xdigit:]]{4})})/u',
+                static function (array $matches): string {
+                    return mb_chr(hexdec($matches[1]), 'utf-8');
+                },
+                array_map('trim', explode(';', $line))
+            );
 
             if ('' === $toUnicode) {
                 $toUnicode = $source;
@@ -182,16 +188,6 @@ public function testToAsciiTransitional($source, $toUnicode, $toUnicodeStatus, $
             $this->markTestSkipped('PHP Bug #72506.');
         }
 
-        // There is currently a bug in the test data, where it is expected that the following 2
-        // source strings result in an empty string. However, due to the way the test files are setup
-        // it currently isn't possible to represent an empty string as an expected value. So, we
-        // skip these 2 problem tests. I have notified the Unicode Consortium about this and they
-        // have passed the information along to the spec editors.
-        // U+200C or U+200D
-        if ("\xE2\x80\x8C" === $source || "\xE2\x80\x8D" === $source) {
-            $toAsciiT = '';
-        }
-
         if ($toAsciiTStatus === []) {
             $this->assertSame($toAsciiT, $info['result']);
             $this->assertSame(0, $info['errors'], sprintf('Expected no errors, but found %d.', $info['errors']));
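
A note on the first hunk: the pattern has two capture groups, one for `\uXXXX` and one for `\x{XXXX}`, but the callback reads only `$matches[1]`. For a `\x{XXXX}` match the hex digits land in `$matches[2]` and `$matches[1]` is an empty string, so that form would appear to decode to U+0000. The following is a minimal sketch of a decoder that handles both forms. It is not part of this commit: `decodeIdnaEscapes` is a hypothetical helper name, it assumes the `mbstring` extension is loaded, and it keeps the four-hex-digit limit used in the diff.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the commit): replaces \uXXXX and \x{XXXX}
 * escapes in a test-file field with the UTF-8 character they denote.
 */
function decodeIdnaEscapes(string $field): string
{
    $decoded = preg_replace_callback(
        // Group 1 captures the digits of \uXXXX, group 2 those of \x{XXXX}.
        '/\\\\(?:u([[:xdigit:]]{4})|x\{([[:xdigit:]]{4})\})/',
        static function (array $matches): string {
            // Only one of the two groups is non-empty for any given match.
            $hex = '' !== $matches[1] ? $matches[1] : $matches[2];
            $char = mb_chr(hexdec($hex), 'UTF-8');

            // Keep the escape as-is if the code point cannot be encoded
            // (for example, a lone surrogate).
            return false === $char ? $matches[0] : $char;
        },
        $field
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $decoded ?? $field;
}
```

In `getData()` this could be applied with `array_map('decodeIdnaEscapes', array_map('trim', explode(';', $line)))`, which keeps the destructuring assignment from the diff unchanged.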