Skip to content

Conversation

@picnixz
Copy link
Member

@picnixz picnixz commented Aug 26, 2024

By adding a test, I found a bug in PyUnicodeEncodeError_GetStart where constructing an UnicodeEncodeError with start = 0 and an empty string object raises an assertion error. While this should not happen at runtime, this is something that should still be avoided.

Test coverage is currently insufficient I think but I want an initial feedback first.

EDIT 1: Tests that are known to crash due to UnicodeError's start handling are disabled. I'd like to merge this one even though UnicodeError crashes (I removed the temporary fix from this PR).

@skirpichev
Copy link
Contributor

I found a bug in PyUnicodeEncodeError_GetStart where constructing an UnicodeEncodeError with start = 0 and an empty string object raises an assertion error.

I think you should open an issue and create a separate pr with bugfix.

@picnixz
Copy link
Member Author

picnixz commented Aug 27, 2024

I think you should open an issue and create a separate pr with bugfix.

Good idea. I separated the commit so that it can easily be cherry-picked (hehe).

@picnixz picnixz requested a review from vstinner September 25, 2024 14:39
Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A test does crash if run more than once.

$ ./python -m test -m test_codec_error_handlers test_capi.test_codecs  
0:00:00 load avg: 2.46 [1/2] test_capi.test_codecs
0:00:00 load avg: 2.46 [2/2] test_capi.test_codecs

Fatal Python error: Segmentation fault

Current thread 0x00007f8ceb3a3740 (most recent call first):
  File "/home/vstinner/python/main/Lib/test/test_capi/test_codecs.py", line 800 in test_codec_error_handlers
  File "/home/vstinner/python/main/Lib/unittest/case.py", line 606 in _callTestMethod
  (...)

I was trying to check if there are refleaks.

@picnixz
Copy link
Member Author

picnixz commented Sep 26, 2024

So we need a way to unregister the codecs. Otherwise I don't know how to do it... (would reloading the module work? or do you want an API for explicitly unregistering an error handler?)

@vstinner
Copy link
Member

an API for explicitly unregistering an error handler?

That would be nice, yes. It should be a separated PR if you want to work on that.

I'm not sure if it's strictly needed right now if your test creates/loads a separated extension just for C tests. But the test should not leak :-)

@picnixz
Copy link
Member Author

picnixz commented Sep 26, 2024

That would be nice, yes. It should be a separated PR if you want to work on that.

I'll work on that tomorrow (I wanted to work today but I have no energy at all).

I'm not sure if it's strictly needed right now if your test creates/loads a separated extension just for C tests

Before working on the above feature, I'll try to see how I can make it work without it. (at least we should be able to check if it leaks...).

@picnixz
Copy link
Member Author

picnixz commented Sep 27, 2024

So, the issue is with _testcapi.codec_namereplace_errors and the setUp(). I'll try to see what happens out there.

So something is wrong when I add those lines and uses -R ::

self.enterContext(import_helper.isolated_modules())
self.enterContext(import_helper.CleanImport('codecs'))

(Still searching where it leaks...)

So I managed to fix the leaks. The crash seems to be because of this line in PyCodec_NameReplaceErrors:

if (ucnhash_capi->getname(c, buffer, sizeof(buffer), 1))

I have no idea why no using fresh imports seem fine. However, the getname pointer is different just before it crashes.

@vstinner
Copy link
Member

Why do you need a fresh import?

@picnixz
Copy link
Member Author

picnixz commented Sep 27, 2024

I don't need them anymore so I can remove them normally. I'll now work on the PR for removing a registered error handler so that we can run the test suite. For now, the PR is not ready since there will be a leak due to the error handler being held.

@picnixz picnixz force-pushed the test/c-api-codec-111495 branch from 850a441 to 7be1f55 Compare September 29, 2024 08:00
@picnixz picnixz requested a review from a team as a code owner September 29, 2024 08:52
@picnixz picnixz requested review from vstinner and removed request for a team and erlend-aasland September 29, 2024 09:01
Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vstinner vstinner enabled auto-merge (squash) September 29, 2024 15:00
@vstinner vstinner merged commit 6d0d26e into python:main Sep 29, 2024
37 checks passed
@picnixz picnixz deleted the test/c-api-codec-111495 branch September 29, 2024 15:22
@picnixz
Copy link
Member Author

picnixz commented Sep 29, 2024

Mmh, apparently tests were backported for the other modules, but this one relies on the internal private feature of being able to unregister error handlers. I don't really want to add it 3.12 and 3.13 so I think we can keep these tests for 3.14+.

@vstinner
Copy link
Member

Yeah, it makes sense to not backport these tests.

@vstinner
Copy link
Member

Merged, thanks for this nice enhancement!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants