
Proper handling of UTF-8 character in bitwise xor when using $1 #23552


Open
wants to merge 1 commit into
base: blead

jkeenan (Contributor) commented Aug 8, 2025

Adapt Tux's rt70652.pl in #9972 (comment).

Fixes GH #9972.


  • This set of changes does not require a perldelta entry.

@@ -19,7 +19,7 @@ use warnings;
 # If you find tests are failing, please try adding names to tests to track
 # down where the failure is, and supply your new names as a patch.
 # (Just-in-time test naming)
-plan tests => 510 + 6 * 2;
+plan tests => 512 + 6 * 2;


The commit message is misleading.

The title is "Proper handling of UTF-8 character in bitwise xor when using $1", but this commit doesn't fix anything.
It adds tests for an issue that was already fixed.

Ideally it would also link to the commit that fixed the issue. Looking at #9972, though, that might not be easy to find, because the issue was fixed in separate steps (#9972 (comment)), so it might not be worth the effort.

At the very least, adding something to the summary based on #9972 (comment) would be good, so that there is a reference to when it was fixed (from a quick glance at the ticket: partially between 5.8 and 5.12, and fully between 5.12 and 5.14).

Comment on lines +775 to +777
my $got = [@t];
my $exp = [1, 1, "", "", "", "", "", "", ""];
ok( eq_array($got, $exp), "GH 9972: no malformed UTF-8 character in bitwise xor");


Personally, I'm not a fan of this construct.
I think it would be much nicer to replace (for example):

   # Now we take 8 Bytes of a normal string with m/(.{8})/
    push @t, utf8::is_utf8 ($normalstring);

with:

    # Now we take 8 Bytes of a normal string with m/(.{8})/
    is(utf8::is_utf8 ($normalstring), 1, "\$normalstring has the UTF-8 flag set");

As written, everything ends up in one big eq_array at the end, which makes it difficult to trace a failure back to the step that caused it.
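
A minimal, self-contained sketch of the difference between the two styles. It uses Test::More rather than the core test suite's own test library, and the strings and test names are hypothetical, not taken from the PR:

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical inputs, not the strings used in the PR.
my $bytes = "abcdefgh";       # plain ASCII literal, UTF-8 flag off
my $chars = "\x{20ac}uro";    # contains U+20AC, so the UTF-8 flag is on

# Collect-then-compare: if this fails, the output only says that the
# combined check failed, not which step produced the wrong value.
my @t;
push @t, utf8::is_utf8($bytes) ? 1 : 0;
push @t, utf8::is_utf8($chars) ? 1 : 0;
ok(eq_array(\@t, [0, 1]), "all flags as expected");

# Per-step: each check carries its own name, so a failure points at one line.
ok(!utf8::is_utf8($bytes), "\$bytes does not have the UTF-8 flag set");
ok( utf8::is_utf8($chars), "\$chars has the UTF-8 flag set");

done_testing();
```

The per-step form costs one test per check, so the plan count would need to grow accordingly.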

Comment on lines +746 to +751
# $1 is assigned but not yet unicode: UTF8-Flag ($1)
push @t, utf8::is_utf8 ($1);

# After we copy $1 the Flag is on: UTF8-Flag ($1)
my $copy = $1;
push @t, utf8::is_utf8 ($1);


This is confusing / contradictory.

These are the first two items in @t and should match the first two items in the $exp arrayref.
The $exp arrayref starts with: [1, 1, ...]
So both of the utf8::is_utf8 calls return 1?
Based on the comments in the block I would have expected these to be [0, 1, ...].
The first comment says "but not yet unicode" while the second says "the Flag is on", which to me implies a difference.
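
One way to settle which reading is right is to print the two flags directly. The sketch below is an assumption about the shape of the test, not a copy of it: the input string is hypothetical, and only the `m/(.{8})/` match and the copy of `$1` mirror the quoted lines.

```perl
use strict;
use warnings;

# Hypothetical input: U+20AC makes Perl store the string with the UTF-8 flag on.
my $string = "\x{20ac}uro-" x 2;

$string =~ m/(.{8})/ or die "no match";

# Flag on $1 straight after the match, before anything copies it.
my $before = utf8::is_utf8($1) ? 1 : 0;

# Reading $1 into a copy runs its get-magic; check the flag on $1 again.
my $copy  = $1;
my $after = utf8::is_utf8($1) ? 1 : 0;

print "before copy: $before, after copy: $after\n";
```

If the two values agree on a current perl, as the [1, 1, ...] in $exp suggests, the comments in the test could be reworded to describe the historical behaviour rather than the expected one.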
