Skip to content

Conversation

@serhiy-storchaka
Copy link
Member

@serhiy-storchaka serhiy-storchaka commented May 12, 2025

If the error handler is used, a new bytes object is created to set as the object attribute of UnicodeDecodeError, and that bytes object then replaces the original data. A pointer to the decoded data will became invalid after destroying that temporary bytes object. So we need other way to return the first invalid escape from _PyUnicode_DecodeUnicodeEscapeInternal().

_PyBytes_DecodeEscape() does not have such issue, because it does not use the error handlers registry, but it should be changed for compatibility with _PyUnicode_DecodeUnicodeEscapeInternal().
(cherry picked from commit 9f69a58)

…der with an error handler (pythonGH-129648)

If the error handler is used, a new bytes object is created to set as
the object attribute of UnicodeDecodeError, and that bytes object then
replaces the original data. A pointer to the decoded data will became invalid
after destroying that temporary bytes object. So we need other way to return
the first invalid escape from _PyUnicode_DecodeUnicodeEscapeInternal().

_PyBytes_DecodeEscape() does not have such issue, because it does not
use the error handlers registry, but it should be changed for compatibility
with _PyUnicode_DecodeUnicodeEscapeInternal().
(cherry picked from commit 9f69a58)

Co-authored-by: Serhiy Storchaka <[email protected]>
@serhiy-storchaka
Copy link
Member Author

serhiy-storchaka commented May 13, 2025

Removing _PyUnicode_DecodeUnicodeEscapeInternal and _PyBytes_DecodeEscape breaks the C ABI. We have the following options:

  • Remove them anyway. They are not for use in the user code, and are guarded against this. You have to try hard to use it in your code.
  • Add stub functions which:
    • Always raise an exception.
    • Work as before, but may ignore invalid escape sequences (set first_escape_char to NULL).
    • Work as before, but may raise an exception if found an invalid escape sequence.
    • Work as before, but may set first_escape_char to some other point in input if found an invalid escape sequences.

May means that it depends on the error handler and if an encoding error happens before an invalid escape sequence.

@serhiy-storchaka serhiy-storchaka requested a review from Yhg1s May 13, 2025 13:37
@serhiy-storchaka
Copy link
Member Author

@Yhg1s, what do you prefer?

@encukou
Copy link
Member

encukou commented May 19, 2025

I'm not @Yhg1s, but, I'd vote for making them raise.

@serhiy-storchaka
Copy link
Member Author

The next beta is close, so I implemented a simple, but with minimal impact, solution. The users of _PyUnicode_DecodeUnicodeEscapeInternal() may lose some warnings if they use non-strict error handler. Nothing changed for users of _PyBytes_DecodeEscape().

Anything simpler can break the user code if they actually use these functions. Anything more complex increases probability of adding bugs in non-tested code.

Copy link
Member

@encukou encukou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, thank you!

@encukou encukou merged commit 6279eb8 into python:3.13 May 20, 2025
39 checks passed
@miss-islington-app
Copy link

Thanks @serhiy-storchaka for the PR, and @encukou for merging it 🌮🎉.. I'm working now to backport this PR to: 3.9, 3.10, 3.11, 3.12.
🐍🍒⛏🤖

@miss-islington-app
Copy link

Sorry, @serhiy-storchaka and @encukou, I could not cleanly backport this to 3.12 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 6279eb8c076d89d3739a6edb393e43c7929b429d 3.12

@miss-islington-app
Copy link

Sorry, @serhiy-storchaka and @encukou, I could not cleanly backport this to 3.11 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 6279eb8c076d89d3739a6edb393e43c7929b429d 3.11

@miss-islington-app
Copy link

Sorry, @serhiy-storchaka and @encukou, I could not cleanly backport this to 3.10 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 6279eb8c076d89d3739a6edb393e43c7929b429d 3.10

@miss-islington-app
Copy link

Sorry, @serhiy-storchaka and @encukou, I could not cleanly backport this to 3.9 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 6279eb8c076d89d3739a6edb393e43c7929b429d 3.9

serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this pull request May 20, 2025
…der with an error handler (pythonGH-129648) (pythonGH-133944)

If the error handler is used, a new bytes object is created to set as
the object attribute of UnicodeDecodeError, and that bytes object then
replaces the original data. A pointer to the decoded data will became invalid
after destroying that temporary bytes object. So we need other way to return
the first invalid escape from _PyUnicode_DecodeUnicodeEscapeInternal().

_PyBytes_DecodeEscape() does not have such issue, because it does not
use the error handlers registry, but it should be changed for compatibility
with _PyUnicode_DecodeUnicodeEscapeInternal().
(cherry picked from commit 9f69a58)
(cherry picked from commit 6279eb8)

Co-authored-by: Serhiy Storchaka <[email protected]>
@bedevere-app
Copy link

bedevere-app bot commented May 20, 2025

GH-134337 is a backport of this pull request to the 3.12 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.12 only security fixes label May 20, 2025
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this pull request May 20, 2025
…der with an error handler (pythonGH-129648) (pythonGH-133944)

If the error handler is used, a new bytes object is created to set as
the object attribute of UnicodeDecodeError, and that bytes object then
replaces the original data. A pointer to the decoded data will became invalid
after destroying that temporary bytes object. So we need other way to return
the first invalid escape from _PyUnicode_DecodeUnicodeEscapeInternal().

_PyBytes_DecodeEscape() does not have such issue, because it does not
use the error handlers registry, but it should be changed for compatibility
with _PyUnicode_DecodeUnicodeEscapeInternal().
(cherry picked from commit 9f69a58)
(cherry picked from commit 6279eb8)
(cherry picked from commit a75953b)

Co-authored-by: Serhiy Storchaka <[email protected]>
@bedevere-app
Copy link

bedevere-app bot commented May 20, 2025

GH-134341 is a backport of this pull request to the 3.11 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.11 only security fixes label May 20, 2025
@bedevere-app
Copy link

bedevere-app bot commented May 20, 2025

GH-134345 is a backport of this pull request to the 3.10 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.10 only security fixes label May 20, 2025
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this pull request May 20, 2025
…der with an error handler (pythonGH-129648) (pythonGH-133944)

If the error handler is used, a new bytes object is created to set as
the object attribute of UnicodeDecodeError, and that bytes object then
replaces the original data. A pointer to the decoded data will became invalid
after destroying that temporary bytes object. So we need other way to return
the first invalid escape from _PyUnicode_DecodeUnicodeEscapeInternal().

_PyBytes_DecodeEscape() does not have such issue, because it does not
use the error handlers registry, but it should be changed for compatibility
with _PyUnicode_DecodeUnicodeEscapeInternal().
(cherry picked from commit 9f69a58)
(cherry picked from commit 6279eb8)
(cherry picked from commit a75953b)
(cherry picked from commit 0c33e5b)

Co-authored-by: Serhiy Storchaka <[email protected]>
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this pull request May 20, 2025
…er with an error handler (pythonGH-129648) (pythonGH-133944)

If the error handler is used, a new bytes object is created to set as
the object attribute of UnicodeDecodeError, and that bytes object then
replaces the original data. A pointer to the decoded data will became invalid
after destroying that temporary bytes object. So we need other way to return
the first invalid escape from _PyUnicode_DecodeUnicodeEscapeInternal().

_PyBytes_DecodeEscape() does not have such issue, because it does not
use the error handlers registry, but it should be changed for compatibility
with _PyUnicode_DecodeUnicodeEscapeInternal().
(cherry picked from commit 9f69a58)
(cherry picked from commit 6279eb8)
(cherry picked from commit a75953b)
(cherry picked from commit 0c33e5b)
(cherry picked from commit 8b528ca)

Co-authored-by: Serhiy Storchaka <[email protected]>
@bedevere-app
Copy link

bedevere-app bot commented May 20, 2025

GH-134346 is a backport of this pull request to the 3.9 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.9 only security fixes label May 20, 2025
Yhg1s pushed a commit that referenced this pull request May 26, 2025
…th an error handler (GH-129648) (GH-133944) (#134337)

If the error handler is used, a new bytes object is created to set as
the object attribute of UnicodeDecodeError, and that bytes object then
replaces the original data. A pointer to the decoded data will became invalid
after destroying that temporary bytes object. So we need other way to return
the first invalid escape from _PyUnicode_DecodeUnicodeEscapeInternal().

_PyBytes_DecodeEscape() does not have such issue, because it does not
use the error handlers registry, but it should be changed for compatibility
with _PyUnicode_DecodeUnicodeEscapeInternal().
(cherry picked from commit 9f69a58)
(cherry picked from commit 6279eb8)
dakaneye added a commit to wolfi-dev/os that referenced this pull request May 29, 2025
## Summary
Fix use-after-free vulnerability in the unicode-escape decoder with
non-strict error handlers.

## Details
- **CVE**: CVE-2025-4516
- **Severity**: Medium
- **Issue**: Use-after-free crash when using
`bytes.decode("unicode_escape", error="ignore|replace")`

## Changes
- Add CVE-2025-4516.patch from upstream merged PRs
- Python 3.12: [PR
#134337](python/cpython#134337)
- Python 3.13: [PR
#133944](python/cpython#133944)
- Increment epoch to 2 for both packages

## Status
- ✅ Python 3.12: Upstream patch merged and applied
- ✅ Python 3.13: Upstream patch merged and applied
- ⏳ Python 3.9, 3.10, 3.11: Waiting for upstream PRs to be merged

## Testing
CI will validate that:
- Patches apply cleanly
- Packages build successfully
- Tests pass

## References
- [CVE-2025-4516
Details](https://www.cve.org/CVERecord?id=CVE-2025-4516)
- [Security
Advisory](https://mail.python.org/archives/list/[email protected]/thread/L75IPBBTSCYEF56I2M4KIW353BB3AY74/)
- Related to: chainguard-dev/internal-dev#12589
ambv pushed a commit that referenced this pull request Jun 2, 2025
…th an error handler (GH-129648) (GH-133944) (GH-134341)

If the error handler is used, a new bytes object is created to set as
the object attribute of UnicodeDecodeError, and that bytes object then
replaces the original data. A pointer to the decoded data will became invalid
after destroying that temporary bytes object. So we need other way to return
the first invalid escape from _PyUnicode_DecodeUnicodeEscapeInternal().

_PyBytes_DecodeEscape() does not have such issue, because it does not
use the error handlers registry, but it should be changed for compatibility
with _PyUnicode_DecodeUnicodeEscapeInternal().
(cherry picked from commit 9f69a58)
(cherry picked from commit 6279eb8)
(cherry picked from commit a75953b)

Co-authored-by: Serhiy Storchaka <[email protected]>
ambv pushed a commit that referenced this pull request Jun 2, 2025
…th an error handler (GH-129648) (GH-133944) (GH-134345)

If the error handler is used, a new bytes object is created to set as
the object attribute of UnicodeDecodeError, and that bytes object then
replaces the original data. A pointer to the decoded data will became invalid
after destroying that temporary bytes object. So we need other way to return
the first invalid escape from _PyUnicode_DecodeUnicodeEscapeInternal().

_PyBytes_DecodeEscape() does not have such issue, because it does not
use the error handlers registry, but it should be changed for compatibility
with _PyUnicode_DecodeUnicodeEscapeInternal().
(cherry picked from commit 9f69a58)
(cherry picked from commit 6279eb8)
(cherry picked from commit a75953b)
(cherry picked from commit 0c33e5b)

Co-authored-by: Serhiy Storchaka <[email protected]>
ambv pushed a commit that referenced this pull request Jun 2, 2025
…h an error handler (GH-129648) (GH-133944) (#134346)

* [3.9] gh-133767: Fix use-after-free in the unicode-escape decoder with an error handler (GH-129648) (GH-133944)

If the error handler is used, a new bytes object is created to set as
the object attribute of UnicodeDecodeError, and that bytes object then
replaces the original data. A pointer to the decoded data will became invalid
after destroying that temporary bytes object. So we need other way to return
the first invalid escape from _PyUnicode_DecodeUnicodeEscapeInternal().

_PyBytes_DecodeEscape() does not have such issue, because it does not
use the error handlers registry, but it should be changed for compatibility
with _PyUnicode_DecodeUnicodeEscapeInternal().
(cherry picked from commit 9f69a58)
(cherry picked from commit 6279eb8)
(cherry picked from commit a75953b)
(cherry picked from commit 0c33e5b)
(cherry picked from commit 8b528ca)

Co-authored-by: Serhiy Storchaka <[email protected]>
gentoo-bot pushed a commit to gentoo/cpython that referenced this pull request Jul 30, 2025
…h an error handler (pythonGH-129648) (pythonGH-133944)

If the error handler is used, a new bytes object is created to set as
the object attribute of UnicodeDecodeError, and that bytes object then
replaces the original data. A pointer to the decoded data will became invalid
after destroying that temporary bytes object. So we need other way to return
the first invalid escape from _PyUnicode_DecodeUnicodeEscapeInternal().

_PyBytes_DecodeEscape() does not have such issue, because it does not
use the error handlers registry, but it should be changed for compatibility
with _PyUnicode_DecodeUnicodeEscapeInternal().
(cherry picked from commit 9f69a58)
(cherry picked from commit 6279eb8)
(cherry picked from commit a75953b)
(cherry picked from commit 0c33e5b)

Co-authored-by: Serhiy Storchaka <[email protected]>
Part-of: python#134345
Closes: python#134345
Signed-off-by: Michał Górny <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type-security A security issue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants