
@serhiy-storchaka serhiy-storchaka commented Jul 27, 2025

On modern systems, the result of wcsxfrm() is much larger than the input string (from 4+2n on Windows to 4+5n on Linux for simple ASCII strings), so optimistically allocating a buffer of the same size as the input never works.

goto exit;
}

/* assume no change in size, first */
Contributor

The comment should be updated to match changed code.

@picnixz picnixz changed the title Remove optimistic allocation in locale.strxfrm() gh-130567: Remove optimistic allocation in locale.strxfrm() Jul 27, 2025
@serhiy-storchaka
Member Author

If this is a bug fix, it needs a NEWS entry. If the bug will be fixed in another way, this is just a cleanup and minor optimization that is not worth a NEWS entry.

Member

@encukou encukou left a comment

This does not fix the bug; macOS raises EINVAL in wcsxfrm(NULL, s, 0) on the Czech and Chinese strings.

So, it's just cleanup and minor optimization.
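
For illustration, the size-query-then-allocate pattern mentioned above can be sketched from Python with ctypes. This is not the code in the PR; it is a minimal sketch that assumes a POSIX system where `ctypes.CDLL(None)` exposes the C library, and the helper name `xfrm_two_pass` is hypothetical. The errno check is where a failure such as the EINVAL reported on macOS would surface.

```python
import ctypes
import os

# Assumption: POSIX platform; CDLL(None) gives access to the C library symbols.
libc = ctypes.CDLL(None, use_errno=True)
libc.wcsxfrm.restype = ctypes.c_size_t
libc.wcsxfrm.argtypes = [
    ctypes.POINTER(ctypes.c_wchar),  # dest (may be NULL when n == 0)
    ctypes.c_wchar_p,                # src
    ctypes.c_size_t,                 # n
]


def xfrm_two_pass(s):
    """Hypothetical helper: query the required size first, then transform."""
    # Pass 1: wcsxfrm(NULL, s, 0) only reports the required length.
    ctypes.set_errno(0)
    needed = libc.wcsxfrm(None, s, 0)
    err = ctypes.get_errno()
    if err:
        raise OSError(err, os.strerror(err))

    # Pass 2: allocate exactly what was requested (plus the terminator).
    buf = ctypes.create_unicode_buffer(needed + 1)
    ctypes.set_errno(0)
    written = libc.wcsxfrm(buf, s, needed + 1)
    err = ctypes.get_errno()
    if err:
        raise OSError(err, os.strerror(err))
    if written > needed:
        raise RuntimeError("wcsxfrm() reported inconsistent sizes")
    return buf[:written]
```

With no optimistic first guess, every call costs two wcsxfrm() invocations, but the buffer is never too small.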

@serhiy-storchaka
Member Author

Actually, optimistic allocation works if the locale was not set or set to "C".

>>> import locale
>>> locale.strxfrm('abc')
'abc'
>>> locale.setlocale(locale.LC_ALL, 'C')
'C'
>>> locale.strxfrm('abc')
'abc'
>>> locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
'en_US.UTF-8'
>>> locale.strxfrm('abc')
'šŮŹ\x01\x1d\x1d\x1d\x01\x02\x02\x02\x01\x01悝\x01惹\x01愝'

But why would you use locale.strxfrm() in the C locale? And since the call to wcsxfrm() is cheap in the C locale, I believe that the loss in the worst case is less than the gain on average.
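
To check how the transformed length compares with the input length under a given locale, the session above can be turned into a small script. This is only a sketch: the helper name `strxfrm_lengths` is hypothetical, and `en_US.UTF-8` is assumed to be installed (the helper returns None when a locale is unavailable). The exact lengths depend on the platform and C library, so no output is shown.

```python
import locale


def strxfrm_lengths(samples, locale_name):
    """Return (input_len, transformed_len) pairs under locale_name.

    Returns None if the locale is not available on this system.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # remember the current setting
    try:
        try:
            locale.setlocale(locale.LC_COLLATE, locale_name)
        except locale.Error:
            return None
        return [(len(s), len(locale.strxfrm(s))) for s in samples]
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # always restore


if __name__ == "__main__":
    for name in ("C", "en_US.UTF-8"):
        print(name, strxfrm_lengths(["a", "abc", "abcdefgh"], name))
```

In the C locale the two lengths match, which is the only case where a same-size buffer is enough.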

@serhiy-storchaka
Member Author

This PR should fix a crash discussed in #130567 (comment). So this is a bug fix. If we are not going to backport it, we need another PR to fix it.

@encukou
Member

encukou commented Sep 10, 2025

Let's backport it [edit: to 3.14.1], even if I can't reproduce the corruption on my system.
“Fix possible crash in strxfrm” should be a good blurb?

@serhiy-storchaka
Member Author

Created a simpler PR #138940 for the fix.

@encukou
Member

encukou commented Oct 15, 2025

Do you want to update this one?

@encukou encukou merged commit 2a2bc82 into python:main Oct 16, 2025
43 checks passed