Member

@lysnikolaou lysnikolaou commented Oct 20, 2025

Contributor

@ngoldbaum ngoldbaum left a comment

I'm sure others will have opinions but here are my takes...

Doc/glossary.rst Outdated
:term:`data races <data race>` and corrupted data. In the
:term:`free-threaded <free threading>` build, built-in types like
:class:`dict`, :class:`list`, and :class:`set` use internal locks
to protect against concurrent modifications.
Contributor

Maybe it makes sense to talk about third-party extensions that might not necessarily prevent data races?

Contributor

I think that may be better in documentation than in the glossary.

Contributor

I think the data races glossary entry covers this.


critical section
A section of code that accesses shared resources and must not be
executed by multiple threads simultaneously. Critical sections are
Contributor

This should emphasize that critical sections are purely a concept in the C API and aren't exposed in the Python language.

Doc/glossary.rst Outdated
lead to :term:`non-deterministic` behavior and can cause data corruption.
Proper use of :term:`locks <lock>` and other :term:`synchronization primitives
<synchronization primitive>` prevents data races. See also
:term:`race condition` and :term:`thread-safe`.
Contributor

This should explain that data races are only possible via extensions and not via pure-Python code.

Contributor

We may want to keep the glossary entry generic, since adding this may give the false impression that user code cannot create a data race.

Member Author

I agree with @willingc that we should keep the glossary entries generic. I've removed references to APIs from other entries as well.

Contributor

I still worry this is a little vague about what precisely causes a data race, and we can maybe make that clearer. Maybe we can include the fact that the read and write need to be in low-level code, although the low-level issue might be triggered by a high-level Python API.

It's certainly hard to be precise without being confusing to someone who isn't familiar with C extensions.

I worry that the current phrasing implies that two threads racing to update an attribute of a Python class is a data race. But it's not, it's a race condition, and no data race happens because there is low-level synchronization in the CPython implementation. I realize the current definition includes this, but I worry that a reader might gather that "synchronization" is always via a threading.Lock or another Python API.
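
To make the distinction in the comment above concrete, here is a minimal sketch of two or more threads updating an attribute of a plain Python class. The Counter class, its method names, and the thread and iteration counts are hypothetical and chosen only for illustration. Without a Python-level lock the final total may come up short (a race condition), but the interpreter still keeps the object itself consistent; with a threading.Lock the total is deterministic.

```python
import threading


class Counter:
    """Hypothetical shared object, used only for illustration."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self):
        # Read-modify-write without a Python-level lock: an update made by
        # another thread between the read and the write can be lost.
        self.value += 1

    def safe_increment(self):
        # The lock makes the read-modify-write sequence mutually exclusive.
        with self._lock:
            self.value += 1


def run(method_name, n_threads=4, n_iter=10_000):
    """Call the named Counter method from several threads; return the total."""
    counter = Counter()
    method = getattr(counter, method_name)

    def worker():
        for _ in range(n_iter):
            method()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


locked_total = run("safe_increment")      # always n_threads * n_iter
unlocked_total = run("unsafe_increment")  # may be lower if updates are lost
```

Whether the unlocked variant actually loses updates on a given run depends on the interpreter build and on scheduling, so the sketch does not promise a particular shortfall.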

Member Author

How about "Note that data races can only happen in native code, but that native code might be exposed in a Python API"?

Contributor

I like that, although maybe "native code" deserves an entry as well...

Member Author

Done.

Contributor

@willingc willingc left a comment

@lysnikolaou @ngoldbaum I left some comments. This is challenging to write, since there is a tension between brevity and the desire to cover everything. Let's focus first on the common definition and second on any free-threading specifics. If it is difficult to add free-threading specifics for any term, let's cover that in a separate doc.

@-mention me when you want another review. Thanks.

critical section
A section of code that accesses shared resources and must not be
executed by multiple threads simultaneously. Critical sections are
Contributor

Suggested change
- executed by multiple threads simultaneously. Critical sections are
+ executed by multiple threads simultaneously. Critical sections are
+ purely a concept in the C API and are not exposed in Python.
+ Critical sections are

Doc/glossary.rst Outdated
Comment on lines 1098 to 1104
non-deterministic
Behavior where the outcome of a program can vary between executions with
the same inputs. In multi-threaded programs, non-deterministic behavior
often results from :term:`race conditions <race condition>` where the
relative timing or interleaving of threads affects the result.
:term:`Data races <data race>` are a common cause of non-deterministic
bugs. Proper synchronization using :term:`locks <lock>` and other
Contributor

Suggested change
  non-deterministic
  Behavior where the outcome of a program can vary between executions with
- the same inputs. In multi-threaded programs, non-deterministic behavior
- often results from :term:`race conditions <race condition>` where the
- relative timing or interleaving of threads affects the result.
- :term:`Data races <data race>` are a common cause of non-deterministic
- bugs. Proper synchronization using :term:`locks <lock>` and other
+ the same inputs. An example of non-deterministic behavior
+ are :term:`race conditions <race condition>`.
+ Proper synchronization using :term:`locks <lock>` and other

Member Author

I just removed the sentence @ngoldbaum mentioned. Does it read okay like this? I'd like to keep the mention of relative timing and interleaving in.

Contributor

@ngoldbaum ngoldbaum left a comment

Hope you don't mind a few more suggestions that occurred to me as I was giving this another once-over. This is much closer to being ready than when I last looked and adding all these entries will be a huge improvement over the status quo. Definitely worth backporting to the 3.14 docs ASAP!

Doc/glossary.rst Outdated
:class:`~threading.Semaphore`, :class:`~threading.Condition`,
:class:`~threading.Event`, and :class:`~threading.Barrier`. Additionally,
the :mod:`queue` module provides multi-producer, multi-consumer queues
that are especially usedul in multithreaded programs. These
Contributor

useful
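
The queue sentence in the hunk above can be illustrated with a minimal multi-producer, single-consumer sketch. The function names, the None sentinel, and the item ranges are illustrative assumptions and are not part of the proposed glossary text.

```python
import queue
import threading


def producer(q, items):
    # Each producer puts its items on the shared queue; Queue.put() handles
    # the locking internally.
    for item in items:
        q.put(item)


def consumer(q, results):
    # Drain the queue until the None sentinel arrives.
    while True:
        item = q.get()
        if item is None:
            break
        results.append(item)


q = queue.Queue()
results = []

consumer_thread = threading.Thread(target=consumer, args=(q, results))
consumer_thread.start()

producers = [
    threading.Thread(target=producer, args=(q, range(i * 100, (i + 1) * 100)))
    for i in range(3)
]
for p in producers:
    p.start()
for p in producers:
    p.join()

q.put(None)  # all producers are done; tell the consumer to stop
consumer_thread.join()
```

The order in which items arrive depends on thread scheduling, but every item is delivered exactly once.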

Doc/glossary.rst Outdated
Comment on lines 310 to 311
proper synchronization. Concurrent modification can lead to
:term:`data races <data race>` and corrupted data.
Contributor

Suggested change
- proper synchronization. Concurrent modification can lead to
- :term:`data races <data race>` and corrupted data.
+ proper synchronization. Concurrent modification can cause
+ :term:`race conditions <race condition>`, and might also trigger a
+ :term:`data race <data race>`, data corruption, or both.

I think concurrent modification without synchronization is a synonym for a race condition? So I want to link to that term in this term. I also think a data race doesn't necessarily imply data corruption, so I wanted to be a little more measured about that.

Doc/glossary.rst Outdated
variables, class variables, or C static variables in :term:`extension modules
<extension module>`. In multi-threaded programs, global state shared
between threads typically requires synchronization to avoid
:term:`race conditions <race condition>`. In the
Contributor

Let's include both terms.

Suggested change
- :term:`race conditions <race condition>`. In the
+ :term:`race conditions <race condition>` and :term:`data races <data race>`. In the

Doc/glossary.rst Outdated
See also :term:`regular package` and :term:`namespace package`.

parallelism
The simultaneous execution of multiple operations on different CPU cores.
Contributor

Suggested change
- The simultaneous execution of multiple operations on different CPU cores.
+ The simultaneous execution of operations on multiple processors.

To allow for GPUs

Doc/glossary.rst Outdated
to avoid shared mutable state entirely. In the
:term:`free-threaded <free threading>` build, built-in types like
:class:`dict`, :class:`list`, and :class:`set` use internal locking
to provide thread-safe operations, though this doesn't guarantee safety
Contributor

@ngoldbaum ngoldbaum Oct 29, 2025

Suggested change
- to provide thread-safe operations, though this doesn't guarantee safety
+ to make many operations thread-safe, although thread safety is not necessarily guaranteed

The way you have this line now is a little confusing IMO, since it says that everything is thread-safe, but no guarantees. I think the way I rephrased it here is correct and leaves it a little clearer that lots of stuff is thread-safe but there are exceptions.

When eventually we have a full listing of what exactly is thread-safe and thread-unsafe in the APIs of the builtins, we can link to that here, maybe?
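
As a hedged illustration of "many operations are thread-safe, but not everything": in the sketch below, individual list.append calls from several threads do not lose items on current CPython builds as far as I'm aware, while a compound "check, then insert" sequence on a dict spans several operations and is guarded with an explicit lock. The names shared, seen, and record_once are hypothetical.

```python
import threading

shared = []


def append_many(n):
    for i in range(n):
        # A single append call; no explicit lock is taken here.
        shared.append(i)


threads = [threading.Thread(target=append_many, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
total_items = len(shared)

# A compound operation is different: the membership test and the assignment
# are separate steps, so another thread could run in between them. An
# explicit lock turns the pair into one unit.
seen = {}
seen_lock = threading.Lock()


def record_once(key):
    with seen_lock:
        if key not in seen:
            seen[key] = threading.get_ident()


recorders = [threading.Thread(target=record_once, args=("job",)) for _ in range(4)]
for t in recorders:
    t.start()
for t in recorders:
    t.join()
```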

@colesbury colesbury self-requested a review October 29, 2025 18:12
Contributor

@colesbury colesbury left a comment

Thanks! I think adding multithreading-related entries will be helpful.

not necessarily simultaneously. In Python, concurrency can be achieved
through :mod:`threading` (using OS threads), :mod:`asyncio` (cooperative
multitasking), or :mod:`multiprocessing` (separate processes).
See also :term:`parallelism`.
Contributor

I don't like the "out-of-order" definition. Concurrency is about things happening or being performed at the same time (in both computing and non-computing contexts).

Something like:

The ability of a computer program to perform multiple tasks at the same time. Python provides libraries for writing programs that make use of different forms of concurrency. :mod:`asyncio` is a library for dealing with asynchronous tasks and coroutines. :mod:`threading` provides access to operating system threads and :mod:`multiprocessing` to operating system processes. Multi-core processors can execute threads and processes on different CPU cores at the same time (see :term:`parallelism`).
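
Two of the forms of concurrency named in the proposed definition can be sketched side by side. This is only an illustration: the helper names run_in_threads and run_as_tasks and the squaring workload are assumptions, and multiprocessing is left out to keep the example short.

```python
import asyncio
import threading


def run_in_threads(n):
    """Compute i*i for i in range(n), one operating system thread per item."""
    results = [None] * n

    def work(i):
        results[i] = i * i  # each thread writes to its own slot

    threads = [threading.Thread(target=work, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


async def run_as_tasks(n):
    """Compute i*i for i in range(n) as cooperative asyncio coroutines."""

    async def work(i):
        await asyncio.sleep(0)  # yield control to the event loop once
        return i * i

    return await asyncio.gather(*(work(i) for i in range(n)))


thread_results = run_in_threads(4)
task_results = asyncio.run(run_as_tasks(4))
```

Both variants overlap the same logical tasks; whether the threads also run in parallel on different cores depends on the interpreter build.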


atomic operation
An operation that completes as a single indivisible unit without
interruption from other threads. Atomic operations are critical for
Contributor

I've seen "interrupt" in various definitions online, but I think "interrupt" is confusing in this context. I'm not really sure what it refers to here, and the granularity of "atomic" vs. "interrupt" can be different. For example, a thread performing an atomic operation with locks can be rescheduled (interrupted) by the OS without breaking atomicity.

I like this definition (from ChatGPT):

An operation that appears to execute as a single, indivisible step: no other thread can observe it half-done, and its effects become visible all at once. Python does not guarantee that ordinary high-level statements are atomic (for example, x += 1 performs multiple bytecode operations and is not atomic). Atomicity is only guaranteed where explicitly documented (e.g., operations performed while holding a lock, or methods of synchronization primitives such as those in threading and queue).
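
The claim that x += 1 performs multiple bytecode operations can be checked with the dis module. This is a small sketch; the names counter and increment are made up for the example, and the exact opcode names vary between CPython versions.

```python
import dis

counter = 0


def increment():
    global counter
    counter += 1


# Collect the opcode names the compiler emitted for increment().
opnames = [ins.opname for ins in dis.get_instructions(increment)]

# The single statement is compiled into separate load, add, and store
# instructions, so another thread can run between the load and the store.
for name in opnames:
    print(name)
```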

:keyword:`async with` keywords. These were introduced
by :pep:`492`.

critical section
Contributor

I don't think we should define critical section, at least not for now. The way we use Py_BEGIN_CRITICAL_SECTION in the C API strays from the classical definition of a "critical section" because it can be suspended/interrupted.

I'd rather talk about those details in the C API docs instead of the Python glossary.

the :term:`cyclic garbage collector <garbage collection>` is to identify these groups and break the reference
cycles so that the memory can be reclaimed.

data race
Contributor

If we are only talking about native code (C or C++ or Rust or whatever), maybe we should leave this out of the glossary for now.

program that makes blocking calls using more than one lock is possibly
susceptible to deadlocks. Deadlocks can be avoided by always acquiring
multiple :term:`locks <lock>` in a consistent order or by using
timeout-based locking. See also :term:`lock` and :term:`reentrant`.
Contributor

First sentence:

A situation in which two or more tasks (threads, processes, or coroutines) wait indefinitely for each other to release resources or complete actions, preventing any from making progress.

And maybe:

In Python this often arises from acquiring multiple locks in conflicting orders or from circular join()/await dependencies.

"Any program that makes blocking calls using more than one lock is possibly susceptible to deadlocks"

This is too strong.

"or by using timeout-based locking"

I'd get rid of this. It may be true in some sense, but I don't know of a situation where that would actually be useful advice.
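
The advice about acquiring multiple locks in a consistent order can be sketched as follows. The Account class, the transfer function, and the choice of id() as the ordering key are illustrative assumptions; any fixed total order over the locks would serve the same purpose.

```python
import threading


class Account:
    """Hypothetical resource guarded by its own lock."""

    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    # Acquire both locks in one global order (here: by id()), regardless of
    # which account is the source. Two opposite transfers then never hold
    # one lock each while waiting for the other.
    first, second = sorted((src, dst), key=id)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount


a = Account(100)
b = Account(100)


def worker(src, dst):
    for _ in range(1000):
        transfer(src, dst, 1)


t1 = threading.Thread(target=worker, args=(a, b))
t2 = threading.Thread(target=worker, args=(b, a))  # opposite direction
t1.start()
t2.start()
t1.join(timeout=30)
t2.join(timeout=30)
```

If transfer instead locked src first and dst second, the two workers could each take one lock and wait forever for the other, which is the conflicting-order case described in the comment.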

The simultaneous execution of operations on multiple processors.
True parallelism requires multiple processors or processor cores where
operations run at exactly the same time and are not just interleaved.
In Python, the :term:`free-threaded <free threading>` build enables
Contributor

Maybe something like:

Executing multiple operations at the same time (e.g., on multiple CPU cores). In Python builds with the :term:`global interpreter lock` (GIL), only one thread runs Python bytecode at a time, so taking advantage of multiple CPU cores typically involves multiple processes (e.g., :mod:`multiprocessing`) or native extensions that release the GIL. In :term:`free-threaded <free threading>` Python, multiple Python threads can run Python code simultaneously on different cores.
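
Since the proposed wording distinguishes GIL builds from free-threaded builds, a reader may want to check which one they are running. A minimal sketch, assuming CPython 3.13 or later for sys._is_gil_enabled() and falling back to "GIL enabled" when that function is absent; the helper name describe_build is hypothetical.

```python
import sys
import sysconfig


def describe_build():
    """Return (free_threaded_build, gil_enabled) for the running interpreter."""
    # Py_GIL_DISABLED is set in the build configuration of free-threaded builds.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

    # sys._is_gil_enabled() was added in CPython 3.13; on older versions the
    # GIL is always enabled, so fall back to True.
    is_gil_enabled = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = is_gil_enabled() if is_gil_enabled is not None else True

    return free_threaded_build, gil_enabled


free_threaded_build, gil_enabled = describe_build()
print(f"free-threaded build: {free_threaded_build}, GIL enabled: {gil_enabled}")
```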

:class:`str` or :class:`bytes` result instead, respectively. Introduced
by :pep:`519`.

per-module state
Contributor

Don't we elsewhere define global state as including per-module state?

Comment on lines +1548 to +1549
Code that functions correctly when accessed by multiple threads
concurrently. Thread-safe code uses appropriate
Contributor

I'd like to rework the first sentence a bit. We talk about thread-safe modules and data structures (classes), not just "code"

Maybe something like:

A module, function, or class that behaves correctly when used by multiple threads concurrently.

Labels: awaiting core review, docs, skip news

5 participants