@Fidget-Spinner Fidget-Spinner commented Sep 21, 2025

@Fidget-Spinner
Member Author

Does anyone know how to add a test for this? I don't see how as the normal interpreter doesn't crash with this.

Member

@ZeroIntensity ZeroIntensity left a comment

The core change looks good. I would just add a test case using something like the original repro. We shouldn't need anything from gettext or fancy Unicode characters to reproduce this, as long as the iterable passed as events has a reference count of 1 and the strings inside it are not interned/immortal.

I think this only crashes with the JIT because the JIT avoids reference count increases, right? On the normal build, the interpreter is seemingly holding an extra reference to the event name on its stack, which keeps it alive just long enough to be used in the exception. That step is probably optimized out when the JIT is enabled, so the Py_DECREF(events_seq) will actually deallocate the string.
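To make the two conditions above concrete, here is a minimal Python sketch. It assumes CPython's usual behaviour of interning identifier-like string literals at compile time. The helper `fresh_event_names` is hypothetical and not part of this PR. It only shows that strings decoded at run time are new objects, distinct from the interned literal, so when such a generator is handed straight to the parser, the temporary sequence built from it would hold the only reference to each name.

```python
import sys

# An identifier-like literal; CPython interns these at compile time
# (assumption: CPython, not guaranteed by the language).
LITERAL = "bogus"


def fresh_event_names():
    """Hypothetical helper: yield event names as newly created str objects."""
    for raw in (b"start", b"end", b"bogus"):
        # bytes.decode() builds a new str object at run time.
        yield raw.decode()


names = list(fresh_event_names())
fresh = names[-1]

# Same value as the literal, but a different object.
equal_but_distinct = (fresh == LITERAL) and (fresh is not LITERAL)

# Interning the fresh copy returns the already-interned literal instead.
interned_is_literal = sys.intern(fresh) is LITERAL

print(equal_but_distinct, interned_is_literal)
```

A plain tuple such as `('start', 'end', 'bogus')` does not meet these conditions, because its strings are constants that stay alive regardless of what the parser does with the sequence.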

@@ -0,0 +1 @@
Fix use after free when handling unicode characters in ``xml.etree.ElementTree.iterparse``. Patch by Ken Jin.
Member

Some nitpicks:

Suggested change
-Fix use after free when handling unicode characters in ``xml.etree.ElementTree.iterparse``. Patch by Ken Jin.
+Fix use-after-free when handling unicode characters in :func:`xml.etree.ElementTree.iterparse`. Patch by Ken Jin.

This also isn't really specific to Unicode characters, but I don't mind keeping the mention.

Member

Maybe "reporting unknown event"?

@Fidget-Spinner
Member Author

> I think this only crashes with the JIT because the JIT avoids reference count increases, right? On the normal build, the interpreter is seemingly holding an extra reference to the event name on its stack, which keeps it alive just long enough to be used in the exception. That step is probably optimized out when the JIT is enabled, so the Py_DECREF(events_seq) will actually deallocate the string.

ASAN doesn't trigger on the normal build either, it seems.

@ZeroIntensity
Member

Yeah, that makes sense. The memory isn't freed on the default build (at least in the repro), because something else has an extra reference (likely the interpreter stack). It'd probably be possible to get a repro for the default build by constructing some iterator in C that the interpreter can't touch.

@vstinner
Member

> Does anyone know how to add a test for this? I don't see how as the normal interpreter doesn't crash with this.

The modified code path is already tested by these test_xml_etree_c tests:

  • test_iterparse()
  • test_unknown_event()

I don't know how to trigger a crash on these tests with the old (current) code. I don't think that it's worth it to write more tests.

Member

@vstinner vstinner left a comment

LGTM

Member

@serhiy-storchaka serhiy-storchaka left a comment

Could you please add a test?

@vstinner
Member

> Could you please add a test?

As I wrote, there are already two tests on the modified code. It's just that it doesn't crash for subtle reasons (see previous comments).

@serhiy-storchaka
Member

Something like this:

diff --git a/Lib/test/test_xml_etree.py b/Lib/test/test_xml_etree.py
index bf6d5074fde..f65baa0cfae 100644
--- a/Lib/test/test_xml_etree.py
+++ b/Lib/test/test_xml_etree.py
@@ -1749,6 +1749,8 @@ def __next__(self):
     def test_unknown_event(self):
         with self.assertRaises(ValueError):
             ET.XMLPullParser(events=('start', 'end', 'bogus'))
+        with self.assertRaisesRegex(ValueError, "unknown event 'bogus'"):
+            ET.XMLPullParser(events=(x.decode() for x in (b'start', b'end', b'bogus')))
 
     @unittest.skipIf(pyexpat.version_info < (2, 6, 0),
                      f'Expat {pyexpat.version_info} does not '
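
For anyone who wants to try the added check outside the CPython test suite, the following is a standalone sketch of the same assertion. The class name `UnknownEventTest` and the `run` helper are made up for illustration and do not appear in the PR. On a build without the fix, this path reads freed memory, so the outcome there is not guaranteed.

```python
import unittest
import xml.etree.ElementTree as ET


class UnknownEventTest(unittest.TestCase):
    """Standalone variant of the check added to test_unknown_event."""

    def test_unknown_event_from_generator(self):
        # The generator yields freshly decoded strings, so nothing outside
        # the parser's temporary sequence keeps the event names alive.
        events = (name.decode() for name in (b"start", b"end", b"bogus"))
        with self.assertRaisesRegex(ValueError, "unknown event 'bogus'"):
            ET.XMLPullParser(events=events)


def run():
    """Hypothetical helper: run the test case and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(UnknownEventTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run()
```

Matching on the message text, rather than only on `ValueError`, is what makes the check sensitive to the event name having been freed before the exception is formatted.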

Member

@ZeroIntensity ZeroIntensity left a comment

Thanks, LGTM as well.

Member

@serhiy-storchaka serhiy-storchaka left a comment

LGTM. 👍

@serhiy-storchaka serhiy-storchaka merged commit c86eb4d into python:main Sep 30, 2025
45 checks passed
@serhiy-storchaka serhiy-storchaka added the needs backport to 3.14 bugs and security fixes label Sep 30, 2025
@miss-islington-app

Thanks @Fidget-Spinner for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.14.
🐍🍒⛏🤖

@serhiy-storchaka serhiy-storchaka added the needs backport to 3.13 bugs and security fixes label Sep 30, 2025
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Sep 30, 2025
@miss-islington-app

Thanks @Fidget-Spinner for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Sep 30, 2025
@bedevere-app

bedevere-app bot commented Sep 30, 2025

GH-139455 is a backport of this pull request to the 3.14 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.14 bugs and security fixes label Sep 30, 2025
@bedevere-app

bedevere-app bot commented Sep 30, 2025

GH-139456 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Sep 30, 2025
serhiy-storchaka pushed a commit that referenced this pull request Sep 30, 2025
@serhiy-storchaka
Member

I just noticed that the NEWS entry was added in the wrong section. This is not a builtin and not part of the interpreter core. @Fidget-Spinner, could you please create a PR to move it to the "Library" section?

@Fidget-Spinner
Member Author

@serhiy-storchaka Sorry, I didn't realize either. Will fix it tomorrow.

@Fidget-Spinner Fidget-Spinner deleted the fix-elm-uaf branch October 1, 2025 08:48
serhiy-storchaka pushed a commit to miss-islington/cpython that referenced this pull request Oct 1, 2025