Feature or enhancement
Proposal:
TL;DR: I am not suggesting adding the assert at runtime. I am only suggesting that we avoid setting exceptions when another exception is already set, unless we explicitly clear that one before calling `PyErr_Format`.

TL;DR 2: This is NOT easy work, and I plan to work on it myself.
I've seen many cases where `PyErr_*` functions are called to set an exception while another exception is already set. It doesn't cause a crash, because we always clear the error indicator in `PyErr_Format` (since we need to use `PyUnicode_FromFormatV`). However, this means that we may be hiding some exceptions that actually shouldn't be hidden.
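To make the failure mode concrete, here is a minimal sketch (hypothetical code, not an actual CPython call site) of the pattern described above, where a pending exception from a callee is silently replaced:

```c
#include <Python.h>

/* Hypothetical callee: fails and sets its own, more specific exception. */
static int
do_work(PyObject *obj)
{
    (void)obj;
    PyErr_NoMemory();
    return -1;
}

static PyObject *
wrapper(PyObject *obj)
{
    if (do_work(obj) < 0) {
        /* The pending MemoryError is silently replaced here, because
         * PyErr_Format() clears the error indicator before formatting. */
        PyErr_Format(PyExc_ValueError, "operation failed for %R", obj);
        return NULL;
    }
    Py_RETURN_NONE;
}
```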
One example is `PyLong_FromUnicodeObject`, which calls `PyLong_FromString` and, on failure, sets an exception that is identical to the one that could already have been set by `PyLong_FromString`. In particular, we do not propagate a MemoryError from `PyLong_FromString` (which happens if the `PyUnicode_FromStringAndSize` call inside `PyLong_FromString` fails).
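A minimal sketch of the kind of fix this implies for the caller (simplified and hypothetical, not the actual `PyLong_FromUnicodeObject` source; `unicode`, `buffer` and `base` stand in for the real arguments): only replace the callee's exception when it is the expected `ValueError`, and propagate anything else, such as a MemoryError, unchanged.

```c
#include <Python.h>

static PyObject *
long_from_unicode_sketch(PyObject *unicode, const char *buffer, int base)
{
    char *end = NULL;
    PyObject *result = PyLong_FromString(buffer, &end, base);
    if (result == NULL) {
        if (!PyErr_ExceptionMatches(PyExc_ValueError)) {
            /* e.g. a MemoryError raised inside PyLong_FromString():
             * propagate it instead of overwriting it. */
            return NULL;
        }
        /* Here the overwrite is intentional, so clear the pending
         * ValueError explicitly before setting the caller's message. */
        PyErr_Clear();
        PyErr_Format(PyExc_ValueError,
                     "invalid literal for int() with base %d: %R",
                     base, unicode);
        return NULL;
    }
    return result;
}
```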
I am currently investigating the other cases, but many tests would fail if I added an `assert(!PyErr_Occurred());` inside `PyErr_Format()` (mostly because of double-setting the same exception). I would suggest that we first fix the functions that actually set double exceptions, and then explicitly document that `PyErr_Format` shouldn't be called with an exception set, as it will replace the existing one.
Analysis
I was more or less able to fix the cases caught by `assert(!PyErr_Occurred());`, and the patterns I've seen so far are the following:
Exact matching exceptions overwritten
Sometimes, we explicitly match some exception types and then overwrite the exception. This happens in the numerical API with the `_range_error` function.

I've created something locally for that one but won't open a PR for now. The idea is to just look at places that match some specific exceptions, clear the exception, and then set the new one; the intent there was to clear any existing exception anyway.
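For illustration, a minimal sketch of that "match, clear, then overwrite" pattern (hypothetical code, not the actual `_range_error` implementation):

```c
#include <Python.h>

static void
range_error_sketch(int underflow)
{
    if (PyErr_Occurred() && !PyErr_ExceptionMatches(PyExc_OverflowError)) {
        /* Some other exception (e.g. MemoryError) is pending:
         * propagate it instead of replacing it. */
        return;
    }
    /* Either nothing is pending, or the pending OverflowError is the one
     * we intend to replace, so clearing it is explicit and safe. */
    PyErr_Clear();
    PyErr_SetString(PyExc_OverflowError,
                    underflow
                    ? "value too small to convert to C integer"
                    : "value too large to convert to C integer");
}
```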
Exceptions redundantly overwritten
Sometimes, we overwrite `TypeError` or `ValueError` exceptions with the ... same exception message (because one C API function calls another C API function, and both of them need to set exceptions since they can be called independently). The solution is to read the code and only set a "fresh" exception in the caller if no error is already set (it's cheaper than checking whether the callee's exception matches a specific type).
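A minimal sketch of that caller-side check (hypothetical names; `helper_convert()` stands in for a callee that sets its own exception on failure):

```c
#include <Python.h>

extern int helper_convert(PyObject *obj, long *out);  /* hypothetical callee */

static int
caller_sketch(PyObject *obj, long *out)
{
    if (helper_convert(obj, out) < 0) {
        if (!PyErr_Occurred()) {
            /* Only set the "fresh" exception if the callee did not
             * already report one; a single flag check is cheaper than
             * matching the callee's exception against a type. */
            PyErr_Format(PyExc_TypeError,
                         "expected an integer-like object, got %.200s",
                         Py_TYPE(obj)->tp_name);
        }
        return -1;
    }
    return 0;
}
```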
Exceptions in Python/getargs.c
The parsing process in `getargs.c` is messy on the exception-handling side. Whenever there is an error, we use the `converterr` function, which only exists to create error messages. However, we don't "chain" possible exceptions here: if the C API is called during the conversion and fails, we overwrite that error with a more general error saying "cannot convert" or something like this. I think I'll need to rewrite the entire logic of setting exceptions so that we know whether we should set a new exception message or not, depending on what we are currently parsing. This will be a separate PR.
Exceptions from PyObject_GetBuffer
The most common "bad" occurrences I found are related to the usage of `PyObject_GetBuffer`. As per PEP 3118, any object implementing the buffer API in C must use `PyExc_BufferError` if the request cannot be fulfilled and return -1 (this is highlighted in the docs, https://docs.python.org/3/c-api/buffer.html#c.PyObject_GetBuffer, with a huge MUST, so it's not a "maybe", it's explicit). In other words, if a custom implementation uses the C API, all its exceptions must be converted into `PyExc_BufferError`. I'm not sure that all implementations respect this, as it's easy to simply propagate the exception.
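For reference, a minimal sketch of what a conforming `bf_getbuffer` implementation looks like (the exporter type and its fields here are hypothetical):

```c
#include <Python.h>

typedef struct {
    PyObject_HEAD
    char *data;        /* hypothetical payload */
    Py_ssize_t size;
} MyObject;

static int
myobject_getbuffer(PyObject *self, Py_buffer *view, int flags)
{
    MyObject *obj = (MyObject *)self;
    if (obj->data == NULL) {
        /* PEP 3118: on failure, bf_getbuffer must raise BufferError,
         * set view->obj to NULL and return -1. */
        PyErr_SetString(PyExc_BufferError, "buffer is not available");
        view->obj = NULL;
        return -1;
    }
    return PyBuffer_FillInfo(view, self, obj->data, obj->size,
                             /*readonly=*/1, flags);
}
```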
Unfortunately, almost all places that use `PyObject_GetBuffer` and set an exception afterwards disregard a possible `PyExc_BufferError` that could have happened, and overwrite any exception, usually by raising a plain TypeError. The others just return -1 and propagate any existing exception (which is fine). For those overwriting the exception, this is not good. Instead, I decided on the following approach:
- If `PyObject_GetBuffer` raised a `TypeError`, we can safely overwrite it. We may lose information, though, if it was raised not by the base implementation of `PyObject_GetBuffer` but by a custom implementation. However, there is currently no way to distinguish between an exception set outside CPython and an exception set by CPython. We could try matching the error messages, but if we were to change the error message, the entire machinery would collapse.
- If `PyObject_GetBuffer` raised a non-`TypeError` exception and we actually overwrite it with a `TypeError`, I suggest adding that exception as the `__context__` of that `TypeError` (see the sketch after this list). There shouldn't be cycles here because the `TypeError` would be newly created. I doubt people would be happy if I were to add some eager assert about the fact that only `BufferError` can be raised from `PyBufferProcs.bf_getbuffer`, so I'll abstain for now. Maybe adding a warning would be a good idea in the first place (but it may become annoying, as most of the time this warning would come up in extension modules that extensively use the buffer API, usually outside the user's control).
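A sketch of that chaining step, assuming the Python 3.12+ exception API (`PyErr_GetRaisedException`/`PyErr_SetRaisedException`); on older versions the same can be done with `PyErr_Fetch`/`PyErr_Restore`. The call site and messages are hypothetical:

```c
#include <Python.h>

static int
get_buffer_with_context(PyObject *obj, Py_buffer *view, int flags)
{
    if (PyObject_GetBuffer(obj, view, flags) < 0) {
        if (PyErr_ExceptionMatches(PyExc_TypeError)) {
            PyErr_Clear();  /* safe to overwrite, per the first bullet */
            PyErr_Format(PyExc_TypeError,
                         "expected a bytes-like object, got %.200s",
                         Py_TYPE(obj)->tp_name);
            return -1;
        }
        /* Non-TypeError (e.g. BufferError): chain it instead of dropping it. */
        PyObject *original = PyErr_GetRaisedException();
        PyErr_Format(PyExc_TypeError,
                     "expected a bytes-like object, got %.200s",
                     Py_TYPE(obj)->tp_name);
        PyObject *new_exc = PyErr_GetRaisedException();
        PyException_SetContext(new_exc, original);  /* steals 'original' */
        PyErr_SetRaisedException(new_exc);
        return -1;
    }
    return 0;
}
```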
Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere
Links to previous discussion of this feature:
There was a previous suggestion here: #67759, where it was mentioned that "`PyUnicode_FromFormatV()` must not be called with an exception set".