gh-133465: Allow PyErr_CheckSignals to be called without holding the GIL. #133466
| Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the  | 
| :c:func:`PyErr_CheckSignals` has been changed to acquire the global | ||
| interpreter lock (GIL) itself, only when necessary (i.e. when it has work to | ||
| do). This means that modules that perform lengthy computations with the GIL | ||
| released may now call :c:func:`PyErr_CheckSignals` during those computations | ||
| without re-acquiring the GIL first. (However, it must be *safe to* acquire | ||
| the GIL at each point where :c:func:`PyErr_CheckSignals` is called. Also, | ||
| keep in mind that it can run arbitrary Python code before returning to you.) | 
A NEWS entry should be more concise, users can refer to docs for in depth explanations.
Is this better?
:c:func:`PyErr_CheckSignals` has been made safe to call without holding the GIL.
It will acquire the GIL itself when it needs it.
> A NEWS entry should be more concise, users can refer to docs for in depth explanations.
@StanFromIreland, I disagree. AFAIK you aren't a core dev or triager, and I wish you wouldn't give other contributors questionable advice without a reference in such an authoritative-sounding way.
I would update this and focus on how the new API differs from the old one: IIUC, "can be called without the GIL [or whatever PC phrasing] and doesn't acquire it unless a signal's handler needs to be called."
| I am unable to reproduce the failure of  | 
Compiled-code modules that implement time-consuming operations that don’t require manipulating Python objects are supposed to call PyErr_CheckSignals frequently throughout each such operation, so that if the user interrupts the operation with control-C, it is cancelled promptly.
In the normal case where no signals are pending, PyErr_CheckSignals is cheap; however, callers must hold the GIL, and compiled-code modules that implement time-consuming operations are also supposed to release the GIL during each such operation. The overhead of reclaiming the GIL in order to call PyErr_CheckSignals, and then releasing it again, sufficiently often for reasonable user responsiveness, can be substantial.
If my understanding of the thread-state rules is correct, PyErr_CheckSignals only *needs* the GIL if it has work to do. *Checking* whether there is a pending signal, or a pending request to run the cycle collector, requires only a couple of atomic loads.
Therefore: Reorganize the logic of PyErr_CheckSignals and its close relatives (_PyErr_CheckSignals and _PyErr_CheckSignalsTstate) so that all the “do we have anything to do” checks are done in a batch before anything that needs the GIL. If any of them are true, acquire the GIL, repeat the check (because another thread could have stolen the event while we were waiting for the GIL), and then actually do the work, enabling callers to *not* hold the GIL. (There are some fine details here that I’d really appreciate a second pair of eyes on; see the comments in the new functions _PyErr_CheckSignalsHoldingGIL and _PyErr_CheckSignalsNoGIL.)
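For readers who want the shape of the "check cheaply, lock, re-check, then work" sequence described above, here is a minimal standalone sketch. It is not the code in this PR: `big_lock` is a plain pthread mutex standing in for the GIL, `pending` stands in for the pending-signal flag, and `flag_event`, `check_events_unlocked` and `handled_count` are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins: big_lock plays the role of the GIL and
   pending plays the role of the "there is work to do" flag. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_bool pending = false;
static int handled_count = 0;   /* only modified while big_lock is held */

/* Flag that an event needs handling; callable from any thread. */
static void flag_event(void)
{
    atomic_store(&pending, true);
}

/* Safe to call WITHOUT holding big_lock.
   Returns true if this call actually handled an event. */
static bool check_events_unlocked(void)
{
    /* Fast path: a single atomic load and no locking at all. */
    if (!atomic_load(&pending)) {
        return false;
    }

    pthread_mutex_lock(&big_lock);
    /* Re-check under the lock: the flag may have changed while we
       were waiting.  atomic_exchange reads the flag and clears it. */
    bool do_work = atomic_exchange(&pending, false);
    if (do_work) {
        handled_count++;        /* the "real work" of this sketch */
    }
    pthread_mutex_unlock(&big_lock);
    return do_work;
}
```

The point of the fast path is that the common case (nothing pending) costs one atomic load; the lock is only taken when the flag was seen set, and the second read under the lock decides whether any work is still outstanding.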
The source tree contains dozens of loops of this form:
    int res;
    do {
        Py_BEGIN_ALLOW_THREADS
        res = some_system_call(arguments...);
        Py_END_ALLOW_THREADS
    } while (res < 0 && errno == EINTR && !PyErr_CheckSignals());
Now that it is possible to call PyErr_CheckSignals without holding the
GIL, the locking operations can be moved out of the loop:
    Py_BEGIN_ALLOW_THREADS
    do {
        res = some_system_call(arguments...);
    } while (res < 0 && errno == EINTR && !PyErr_CheckSignals());
    Py_END_ALLOW_THREADS
This demonstrates the motivation for making it possible to call
PyErr_CheckSignals without holding the GIL.  It shouldn’t make any
measurable difference performance-wise for _these_ loops, which almost
never actually cycle; but for loops that do cycle many times it’s very
much desirable to not take and release the GIL every time through.
In some cases I also moved uses of _Py_(BEGIN|END)_SUPPRESS_IPH, which
is often paired with Py_(BEGIN|END)_ALLOW_THREADS, to keep the pairing
intact.  It was already considered safe to call PyErr_CheckSignals
from both inside and outside an IPH suppression region.
More could be done in this vein: I didn’t change any loops where the
inside of the loop was more complicated than a single system call,
_except_ that I did refactor py_getentropy and py_getrandom (in
bootstrap_hash.c) to make it possible to move the unlock and lock
outside the loop, demonstrating a more complicated case.
Force-pushed from 40097b8 to 3e25a93.
I haven't fully decided how I feel about this yet. I agree with the motivation, but PyGILState_Ensure is evil. We might be able to sidestep most of the issues here, though (for one, signal handling isn't done in subinterpreters, so we don't have to worry about interpreter-guessing issues).
My main concern is that we're changing something that's in the stable ABI. That's generally a big no-no, because those are supposed to have a "frozen" interface. We might want this in a new API (e.g., something like PyErr_CheckSignalsFast).
| /* FIXME: Given that we already have 'tstate', is there a more efficient | ||
| way to do this? */ | 
Yes, you want PyThreadState_Swap(tstate).
The documentation for that function says "Swap the current thread state with the thread state given by the argument tstate, which may be NULL. The global interpreter lock must be held and is not released." So that really doesn't sound like what I should be using here.
What I think I need is more like PyEval_RestoreThread(tstate), except that (in GIL builds) if tstate already holds the GIL it should not deadlock attempting to acquire it again, and (in free-threaded builds) whatever the equivalent of that statement is.  (I am only just now beginning to learn how free-threaded mode works.)
The 3.13 docs for PyThreadState_Swap are wrong!
Ignore the phrase "holds the GIL". Read it as "hold an attached thread state". You always need a thread state to call the C API, in both FT and GIL-icious builds.
@zackw Please fix this.
Modules/signalmodule.c (outdated)
      | Determine whether there is actually any work needing to be done. | ||
| If so, acquire the GIL if necessary, and do that work. */ | ||
| static int | ||
| _PyErr_CheckSignalsNoGIL(PyThreadState *tstate, bool cycle_collect) | 
Bikeshedding: let's call this "NoTstate", because you'll still need a thread state on free-threaded builds.
It's a little weird to name a function BlahNoTstate when its first argument is a tstate.  How about _PyErr_CheckSignals_MaybeDetached for this one and _PyErr_CheckSignals_Attached for the one that does require an attached thread state to call?
(Also, given that these are static functions, possibly they should be named more like check_signals_maybe_detached and check_signals_attached?  I do not fully grok the coding style in this file.)
Sure, whatever. Just don't call it "GIL" :)
| /* If this thread does not have a thread state at all, then it has | ||
| never been associated with the Python runtime, so it should not | ||
| attempt to handle signals or run the cycle collector. */ | ||
| if (!tstate) { | 
I'm not sure we should ignore this. This seems like blatant misuse--either return a failure or emit a fatal error.
All callers of this function get the tstate argument from PyGILState_GetThisThreadState().  I could be wrong about this, but I got the impression that that function could return NULL under circumstances where it would not be misuse to call PyErr_CheckSignals, such as the thread state merely having been set to NULL via PyThreadState_Swap.  Perhaps it is the comment that is wrong?  Anyway, the contract of PyErr_CheckSignals has always been that it fails if and only if a Python signal handler raised an exception, so it seemed safest to me to treat "we don't have a thread state" as "nothing to do" rather than some form of failure.  I'm happy to be persuaded otherwise.
PyGILState_GetThisThreadState only returns NULL when the thread hasn't ever had a thread state. That hasn't ever been supported for PyErr_CheckSignals, which is why I'm worried about implicitly failing.
Looks all right to me in the new version of the PR -- the new function might be called under those circumstances, and no tstate means that it's definitely not the main thread of the main interpreter, so ignoring the call feels like the right thing to do. Right?
| TSan failure looks unrelated. I've restarted the job for you. | 
| @ZeroIntensity wrote: 
 Yikes. I knew it could be trouble but I didn't know it could be that bad. I'm still reading the PEP and will have more to say once I have finished. 
 I would argue that relaxing a requirement is not a breaking change, but I'm willing to give the function a new name if you all think it is necessary. | 
| So I think PEP 788 is trying to solve a related problem with thread states to the one I have in this PR, but it doesn't directly address my problem. I set out to make
     int res;
+    Py_BEGIN_ALLOW_THREADS
     do {
-        Py_BEGIN_ALLOW_THREADS
         res = some_system_call(arguments...);
-        Py_END_ALLOW_THREADS
     } while (res < 0 && errno == EINTR && !PyErr_CheckSignals());
+    Py_END_ALLOW_THREADS
to some but not all of the loops with that structure in the CPython codebase.
 But  Tangentially, part of the problem is that
    int res;
    Py_WITH_DETACHED_TSTATE (ts) {
        do {
            res = some_system_call(arguments...);
        } while (res < 0 && errno == EINTR
                 && !PyErr_CheckSignalsDetached(ts));
    }
(A macro like this can be implemented in plain C by off-label use of a  I'm open to helping with work toward this end, but I would very much like to find a way to do it independently of this PR, which I would like to keep focused on the goal described in #133465. | 
| I think we don't have to worry about the cross-interpreter problems because subinterpreters can't handle signals anyway--that's a job for the main thread and main interpreter. My approach would be to check  | 
| The main issue with this kind of extension is that code tested with new versions of Python will fail in older ones. | 
| So I think we have consensus that this PR should introduce a new function rather than change the semantics of an existing one.  I propose to call the new function  I see some related design issues that need to be resolved, though: 
 | 
| .. note:: | ||
| Any code that executes for a long time without returning to the | ||
| Python interpreter should call :c:func:`PyErr_CheckSignals()` | ||
| at reasonable intervals (at least once a millisecond) so that | 
That is too frequent for my taste. User interactions feel "delay-less" under 10 msec, and for ^C even 100 msec feels very quick, and 1 sec would still be acceptable. After a few seconds I would hit ^C again.
This is briefly discussed in the talk.  If you add enough PyErr_CheckSignals calls to your extension module that every long-running loop can be interrupted, in practice this makes you do the check way too often, once every few tens of microseconds. This is fine with the hypothetical new PyErr_CheckSignals_Detached, but with the old version, where you have to reclaim the GIL, it's way too costly.  So in the talk I recommended looking at the actual system clock (with clock_gettime) and only calling PyErr_CheckSignals if a millisecond or more had gone by.  That's faster than required for human responsiveness, yes, but the remaining overhead is the overhead of looking at the clock, and you can't reduce that by doing the actual calls even less often.  So that's where "at least once a millisecond" came from.
Maybe explain that better in the docs then.
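For anyone following along, the clock-throttling idea from the comment above could look roughly like the sketch below. It is an illustration only, not code from the talk or from this PR: `now_ns`, `check_is_due`, `long_computation`, `never_interrupted` and `CHECK_INTERVAL_NS` are hypothetical names, and the `check_signals` callback merely stands in for a real call such as PyErr_CheckSignals (made with whatever GIL handling that call requires).

```c
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define CHECK_INTERVAL_NS 1000000LL   /* 1 millisecond, as suggested above */

/* Current monotonic clock reading in nanoseconds. */
static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Returns true when at least CHECK_INTERVAL_NS has elapsed since
   *last_check, and in that case records `now` as the new last check. */
static bool check_is_due(int64_t *last_check, int64_t now)
{
    if (now - *last_check < CHECK_INTERVAL_NS) {
        return false;
    }
    *last_check = now;
    return true;
}

/* Stub callback for demonstration: reports that nothing interrupted us. */
static int never_interrupted(void)
{
    return 0;
}

/* Long-running loop that looks at the clock on every iteration but only
   invokes the (possibly expensive) signal check about once a millisecond.
   Returns -1 if the callback reported an interruption, else 0. */
static int long_computation(long iterations, int (*check_signals)(void))
{
    int64_t last_check = now_ns();
    for (long i = 0; i < iterations; i++) {
        /* ... one unit of real work would go here ... */
        if (check_is_due(&last_check, now_ns()) && check_signals() != 0) {
            return -1;
        }
    }
    return 0;
}
```

With this structure the per-iteration cost is the clock read, which is the floor the comment above refers to; making the interval longer than a millisecond would not reduce it further.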
| Let's open an issue with the C API working group so they can bicker about bikeshedding and the API. | 
| In person at the sprints, Guido suggested that this PR should be split up into three: one each for the core functional changes, the docs changes, and the "look what we can do now" changes to various stdlib modules. I'll make that happen, but not until Wednesday. Meantime, can we please focus discussion on the list of questions I posted? @ZeroIntensity I'd appreciate it if you could open that issue; I'm tired enough that I'm making foolish mistakes. | 
I have some nits, but generally I am happy with the new API.
@zackw, please split the PR up in three:
- This PR, keep only the changes to signalmodule.c (and a few needed changes including NEWS). Also add brief docs for the new API, emphasizing that it should be called without holding the GIL. (You may or may not care about the whatsnew/3.15.rst file, others will eventually update it.)
- A separate PR, attached to the same issue, showing the various ways that using the new API can make things better.
- A third PR that revamps the docs, linked to the new issue you created about that.
- A doc-only PR
| #endif | ||
|  | ||
| /* It is necessary to repeat all of the checks of global flags | ||
| that were done in _PyErr_CheckSignalsNoGIL. At the time of | 
That function does not exist -- not sure what you meant.
Editing mistake, will fix.
| checks and when we acquired the GIL, some other thread may have | ||
| processed the events that were flagged. Since we now hold the | 
No other thread could have processed the events (only the main thread of the main interpreter can) but new events might have been added.
| checks and when we acquired the GIL, some other thread may have | |
| processed the events that were flagged. Since we now hold the | |
| checks and when we acquired the GIL, events may have been | |
| flagged or unflagged. Since we now hold the | 
Will correct.
| be the thread state for the current thread, and it must be attached. */ | ||
| static int | ||
| check_signals_attached(PyThreadState *tstate, bool cycle_collect) | ||
| { | 
Possibly start with assert(tstate != NULL); ???
| VERIFYME: I *think* every piece of this expression is safe to | ||
| execute without holding the GIL and is already sufficiently | ||
| atomic. */ | 
I double checked and it looks safe to me, so you can drop this comment.
| @zackw: Sorry we didn't get a chance to say goodbye at the sprint. Let me try answering your questions, briefly. 
 I agree with your current choice to not ask for a thread state -- we want the function to be as convenient as possible, and to add minimum friction. 
 I think only without. I can imagine many patterns that make it easier for extensions to choose between  
 I think not. Calling the cycle collector is important when holding the GIL because we might be creating/updating/destroying Python objects. But when not holding the GIL we shouldn't be doing that, so the set of objects shouldn't change at all! Other threads that are creating lots of objects from C should be calling  
 I don't know, but it sounds like a "here be dragons" area to me. Let's not try to fix this and then find out we shouldn't have. Someone else (or you :-) could endeavor to remove  | 
| It does look like you need to do a manual merge of a newer main branch before we merge this, since the branch is flagged as having conflicts. I recommend doing that at the latest possible moment (e.g. when you've received all the Approvals you need) so you won't have to do it twice. Maybe moving the docs and other files to new PRs will remove the conflict. | 
| VERIFYME: I *think* every piece of this expression is safe to | ||
| execute without holding the GIL and is already sufficiently | ||
| atomic. */ | ||
| if ((!_Py_ThreadCanHandleSignals(tstate->interp) | 
This is missing a check for the remote debugger interface. Otherwise this will not run the remote debugger code when users call this function without the GIL
The checks are the same ones that _PyRunRemoteDebugger do before entering the actual code
| 
 I'll dissent here. It should, just like  A new macro like  (For the record, it's documented, the question being is whether that's a guarantee of how things work or just an illustration. In my opinion, it is at least an implementation guide for non-C/C++ users.) | 
| 
 I don't understand why those save/restore operations need to be done using a macro. They could just as easily be done by a function that does that save/restore behavior itself. I don't want to turn the old PyErr_CheckSignals (unchanged in functionality, just refactored) into a macro, and I don't believe things are so perf-critical that the first checks in PyErr_CheckSignals_Detached need to be a macro. So I still don't think there's anything wrong with calling a function that retrieves and restores the tstate. | 
| Well, explicit is better than implicit. The code between  For example:
#include <Python.h>
#include <stdio.h>
#define N_ITERATIONS 2
static void call_into_some_library(void);
// For simplicity, store our main interpreter globally
PyInterpreterState *main_interp = NULL;
static PyObject *
demo_func(PyObject *mod, PyObject *arg) {
    main_interp = PyInterpreterState_Get();
    printf("Start! This thread state = %p\n",
           PyGILState_GetThisThreadState());
    Py_BEGIN_ALLOW_THREADS;
    for (int i=0; i<N_ITERATIONS; i++) {
        // Do some C work using some library...
        call_into_some_library();
        // if PyGILState_GetThisThreadState() is NULL, the
        // PyErr_CheckSignals from this PR will crash
        printf("%d/%d: this thread state = %p\n", i, N_ITERATIONS,
               PyGILState_GetThisThreadState());
    }
    Py_END_ALLOW_THREADS;
    Py_RETURN_NONE;
}
// Meanwhile, the library wants to do some Python...
static void
call_into_some_library(void) {
    /* ... */
    if (PyThreadState_GetUnchecked() == NULL) {
        // no thread state attached!
        // so make one & attach it
        PyThreadState *ts = PyThreadState_New(main_interp);
        PyEval_AcquireThread(ts);
        // call some fun Python API
        PyObject *s = PyUnicode_FromString("all OK down here\n");
        PyObject_Print(s, stdout, Py_PRINT_RAW);
        Py_DECREF(s);
        // and get rid of the thread state
        PyThreadState_Clear(ts);
        PyThreadState_DeleteCurrent();
        // by this we have reset "this" thread state to NULL
    }
    else { /* ... */ }
}
static PyModuleDef module = {
    .m_name = "extension",
    .m_methods = (PyMethodDef[]) {
        {"demo_func", demo_func, METH_NOARGS},
        {0},
    },
};
PyMODINIT_FUNC
PyInit_extension(void) {
    return PyModuleDef_Init(&module);
} | 
| 
 I agree with that part, the macros could just have been functions. But they should be explicit about the state they want to get back to. | 
| The new function is implemented with
    PyGILState_STATE st = PyGILState_Ensure();
    int err = check_signals_attached(tstate, cycle_collect);
    PyGILState_Release(st); | 
| It might be fine, because subinterpreters don't currently support handling signals anyway. But, I don't know if that will always be the case. I was talking to Eric at the sprints about handling signals in any interpreter active in the main thread. | 
| 
 Let's say that the function is called in a subinterpreter running in the main thread, and  | 
| Hmm, yeah, that doesn't sound great. | 
| Yeah, I think we need a somewhat different design to handle subinterpreters and free threading smoothly. And PyErr_CheckSignals may need to be changed as well? Maybe the forward-thinking approach would be a single function then??? (Sorry @zackw). | 
| 
 | 
| I'm not convinced that we have to provide a way to "Allow PyErr_CheckSignals to be called without holding the GIL". PEP 788 shows that it's quite complicated to acquire/release the GIL, there are corner cases and we need a new (more elaborated) API to prevent bugs/crashes. 
 Is there a risk that the interpreter pointer can become a dangling pointer? | 
| 
 Yes, but that's true of the existing APIs too. If PEP 788 is accepted, we can change this to use a strong interpreter reference. | 
| The function already checks that it is the main thread. Is there a way to check that we’re in the main interpreter? Assuming we can check those without having the GIL or a tstate, would that simplify the requirements? Or is there still ambiguity? | 
| I think the goal is to eventually get signal handling working in subinterpreters that are running in the main thread too. Obligatory ping @ericsnowcurrently | 
| Yeah, that's a possibility. We'll have to see what makes the most sense. | 
| 
 Let's let the folks working on that feature worry about it. But is there way to check that you're in the main interpreter? | 
| 
 | 
| An issue regarding calls to  
 @zackw Sorry, missed this comment. I think you'd really benefit from creating the issue yourself. This proposal might go as far as to make a PEP, and that requires being comfortable with community discussion. Besides, the C API working group doesn't bite :) Alternatively, let's open a post on DPO if we want some broader community feedback. | 
| Sorry for disappearing for two weeks, everyone. I was really worn out after PyCon and then my day job needed me to do unrelated stuff. @vstinner I can't tell what exactly you're objecting to. My fundamental motivation for this patch is to make it possible to efficiently check for signals, somehow, at appropriate points in the middle of 'Efficiently' here means 'without re-attaching the thread state unless there is a pending signal to be processed', because checking for signals is cheap, but reattaching the thread state around a call to  Now, those numbers were generated using Python 3.12. Is it your contention that we shouldn't need a new API for checking for signals at all, because in a free-threaded world it won't be necessary for NumPy and the like to release the thread state around its many "large volumes of arbitrarily complicated C", or because in a free-threaded world will be so cheap (when there is no pending signal) that it won't be worth worrying about? | 
| @ZeroIntensity I have now filed an issue with the C-API working group (linked above). For the record, when I said "I'm too tired to do this", I meant right then, in the throes of post-PyCon brain crash. | 
Addresses #133465. See there, or the commit message for the first commit in this PR, for rationale.
There are two commits: the first actually implements the change, and the second demonstrates the motivation for it by pulling a lot of uses of Py_(BEGIN|END)_ALLOW_THREADS within Python's stdlib out of loops.
This has been tested (lightly - just the built-in test suite) both with and without --disable-gil; however, I did not test --enable-optimizations nor --enable-experimental-jit.
📚 Documentation preview 📚: https://cpython-previews--133466.org.readthedocs.build/