sv_vcatpvfn_flags: Use utf8_to_uv #23083
I'm confused: you're calling utf8_to_uv_or_die(), which croaks on a decoding failure, but the commit message says it produces a replacement character instead.
f34ffac to 80ec811
It looks like I was the one who was confused. I hope it was just a copy-paste error. Anyway, it doesn't return, but croaks in the face of malformed input. I'm now a bit reticent about doing this, given that a couple of such commits had to be backed out. But I don't know that there is a decent recovery path forward when this condition occurs, and there are already places in this function where it panics when it doesn't know how to go on.
If this and other similar changes remain, I do think we need some sort of perldelta entry.
I agree.
5.43.0 has been released; development on this p.r. may resume.
@khwilliamson, it appears from your earlier comments that you're somewhat dubious about proceeding with this p.r. If that is so, would we be better off closing this p.r. and opening a new one at an appropriate time? Thanks.
This is a change in behavior for malformed input. Previously it warned and substituted 0; now it warns and substitutes the REPLACEMENT CHARACTER, which is the better choice.
80f3a41 to e45ec1c
I changed it to not croak, but to insert the REPLACEMENT CHARACTER instead of a NUL. The new way is the Unicode-sanctioned behavior. I didn't add a test, because this shouldn't happen unless someone fiddles with the internals.
If you can add a link to that governance in Unicode, I think that would be good for future reference.
This is a change in behavior for malformed input. Previously it warned and substituted 0; the warnings remain, but now it substitutes the REPLACEMENT CHARACTER.
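For readers unfamiliar with the policy being discussed, here is a minimal standalone sketch of "warn and substitute U+FFFD on malformed input". It is not the code in sv_vcatpvfn_flags and does not use the Perl API; `decode_one` and `decode_or_replace` are hypothetical names invented for this illustration, and the warning text is a placeholder.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define REPLACEMENT_CHARACTER 0xFFFDu  /* U+FFFD */

/* Hypothetical helper (not the Perl API): decode one code point from [s, e).
 * Returns 1 on success and 0 on malformed input. *advance receives the
 * number of bytes examined, so the caller can move past the bad sequence. */
static int decode_one(const unsigned char *s, const unsigned char *e,
                      uint32_t *cp, size_t *advance)
{
    /* Smallest code point that legitimately needs each encoded length. */
    static const uint32_t min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t len, i;
    uint32_t v;

    if (s >= e) { *advance = 0; return 0; }
    *advance = 1;

    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; v = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; v = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; v = s[0] & 0x07; }
    else return 0;  /* stray continuation byte or invalid start byte */

    for (i = 1; i < len; i++) {
        /* Truncated sequence or a byte that is not a continuation byte. */
        if ((size_t)(e - s) <= i || (s[i] & 0xC0) != 0x80) {
            *advance = i;
            return 0;
        }
        v = (v << 6) | (s[i] & 0x3F);
    }
    *advance = len;

    /* Reject overlongs, surrogates, and values above U+10FFFF. */
    if (v < min_for_len[len] || v > 0x10FFFF || (v >= 0xD800 && v <= 0xDFFF))
        return 0;

    *cp = v;
    return 1;
}

/* The policy under discussion: on failure, warn and return U+FFFD
 * rather than 0 (the old behavior) or aborting (the croak variant). */
static uint32_t decode_or_replace(const unsigned char *s,
                                  const unsigned char *e, size_t *advance)
{
    uint32_t cp;
    if (!decode_one(s, e, &cp, advance)) {
        fprintf(stderr, "warning: malformed UTF-8\n");
        return REPLACEMENT_CHARACTER;
    }
    return cp;
}
```

With this sketch, a truncated sequence such as the two bytes 0xE2 0x82 yields U+FFFD plus a warning, where a substitute-0 policy would have silently introduced a NUL into the output.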