gh-140374: Add glossary entries related to multithreading #140375
@@ -134,6 +134,16 @@ Glossary

      iterator's :meth:`~object.__anext__` method until it raises a
      :exc:`StopAsyncIteration` exception. Introduced by :pep:`492`.

   atomic operation
      An operation that completes as a single indivisible unit without
      interruption from other threads. Atomic operations are critical for
      :term:`thread-safe` programming because they cannot be observed in a
      partially completed state by other threads. In the
      :term:`free-threaded <free threading>` build, elementary operations
      should generally be assumed to be atomic unless the documentation
      explicitly states otherwise. See also :term:`race condition` and
      :term:`data race`.

   attached thread state
      A :term:`thread state` that is active for the current OS thread.
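To make the "elementary operations" distinction concrete, a minimal sketch (illustrative only, not part of the patch; the names are made up): a compound read-modify-write such as `counter += 1` is not atomic and can lose updates when run from several threads, whereas guarding it with a `threading.Lock` makes the update indivisible.

```python
import threading

counter = 0
lock = threading.Lock()

def unsafe_increment(n):
    global counter
    for _ in range(n):
        counter += 1          # read-modify-write: not atomic, updates can be lost

def safe_increment(n):
    global counter
    for _ in range(n):
        with lock:            # the lock makes the read-modify-write indivisible
            counter += 1

threads = [threading.Thread(target=safe_increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # always 400000; with unsafe_increment it may be lower
```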
@@ -289,6 +299,20 @@ Glossary

      advanced mathematical feature. If you're not aware of a need for them,
      it's almost certain you can safely ignore them.

   concurrency
      The ability of different parts of a program to be executed out-of-order
      or in partial order without affecting the outcome. This allows for
      multiple tasks to make progress during overlapping time periods, though
      not necessarily simultaneously. In Python, concurrency can be achieved
      through :mod:`threading` (using OS threads), :mod:`asyncio` (cooperative
      multitasking), or :mod:`multiprocessing` (separate processes).
      See also :term:`parallelism`.
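As a hedged illustration of "multiple tasks making progress during overlapping time periods, though not necessarily simultaneously", a small asyncio sketch (the task names are made up for the example):

```python
import asyncio

async def task(name, delay):
    # Each await point lets the event loop switch to another task, so both
    # tasks make progress during overlapping time periods on a single thread.
    await asyncio.sleep(delay)
    return f"{name} done"

async def main():
    results = await asyncio.gather(task("a", 0.1), task("b", 0.1))
    print(results)  # both finish after roughly 0.1s total, not 0.2s

asyncio.run(main())
```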
I don't like the "out-of-order" definition. Concurrency is about things happening or being performed at the same time (in both computing and non-computing contexts). Something like:

   The ability of a computer program to perform multiple tasks at the same time. Python provides libraries for writing programs that make use of different forms of concurrency. :mod:
   concurrent modification
      When multiple threads modify shared data at the same time without
      proper synchronization. Concurrent modification can lead to
      :term:`data races <data race>` and corrupted data.
Suggested change:
- proper synchronization. Concurrent modification can lead to
- :term:`data races <data race>` and corrupted data.
+ proper synchronization. Concurrent modification causes
+ :term:`race conditions <race condition>`, and might also trigger a
+ :term:`data race <data race>`, data corruption, or both.
I think concurrent modification without synchronization is a synonym for a race condition? So I want to link to that term in this term. I also think a data race doesn't necessarily imply data corruption, so I wanted to be a little more measured about that.
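To illustrate that distinction, a hypothetical sketch of a pure-Python race condition that is not a data race: a check-then-act on shared data, where two threads can both see the key as missing, plus a lock-based fix. All names here are invented for the example.

```python
import threading

cache = {}
lock = threading.Lock()

def compute(key):
    # Placeholder for some expensive work.
    return object()

def get_or_create_unsafe(key):
    # Race condition: two threads can both see the key missing and both
    # compute a value; one result silently overwrites the other.
    if key not in cache:
        cache[key] = compute(key)
    return cache[key]

def get_or_create_safe(key):
    # The check and the update now happen as one synchronized step.
    with lock:
        if key not in cache:
            cache[key] = compute(key)
        return cache[key]
```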
I don't think we should define critical section, at least not for now. The way we use Py_BEGIN_CRITICAL_SECTION in the C API strays from the classical definition of a "critical section" because it can be suspended/interrupted.
I'd rather talk about those details in the C API docs instead of the Python glossary.
This should emphasize that critical sections are purely a concept in the C API and aren't exposed in the Python language.
Suggested change:
- executed by multiple threads simultaneously. Critical sections are
+ executed by multiple threads simultaneously. Critical sections are
+ purely a concept in the C API and are not exposed in Python.
+ Critical sections are
If we are only talking about native code (C or C++ or Rust or whatever), maybe we should leave this out of the glossary for now.
This should explain that data races are only possible via extensions and not via pure-Python code.
We may want to keep the glossary entry generic since adding this may give the false impression that user code cannot create a data race.
I agree with @willingc that we should keep the glossary entries generic. I've removed references to APIs from other entries as well.
I still worry this is a little vague in terms of what precisely causes a data race and we can maybe make that clearer. Maybe we can include the fact that the read and write needs to be in low-level code, although the low-level issue might be triggered by a high-level Python API.
It's certainly hard to be precise without being confusing to someone who isn't familiar with C extensions.
I worry that the current phrasing implies that two threads racing to update an attribute of a python class is a data race. But it's not, it's a race condition, and no data race happens because there is low-level synchronization in the CPython implementation. I realize the current definition includes this, but I worry that a reader might gather that "synchronization" is always via a threading.Lock or another Python API.
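A hypothetical sketch of exactly that situation: two threads updating an attribute of a Python object can lose increments (a race condition), yet there is no low-level data race, because the interpreter's own internal synchronization protects the attribute store.

```python
import threading

class Stats:
    def __init__(self):
        self.hits = 0

stats = Stats()

def count(n):
    for _ in range(n):
        # Interleaved read-modify-write on the attribute: increments can be
        # lost (a race condition), but no data race occurs at the C level.
        stats.hits += 1

threads = [threading.Thread(target=count, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(stats.hits)  # often less than 200000
```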
How about "Note that data races can only happen in native code, but that native code might be exposed in a Python API"?
I like that, although maybe "native code" deserves an entry as well...
Done.
First sentence:
A situation in which two or more tasks (threads, processes, or coroutines) wait indefinitely for each other to release resources or complete actions, preventing any from making progress.
And maybe:
In Python this often arises from acquiring multiple locks in conflicting orders or from circular join()/await dependencies.
"Any program that makes blocking calls using more than one lock is possibly susceptible to deadlocks"

This is too strong.

"or by using timeout-based locking"

I'd get rid of this. It may be true in some sense, but I don't know of a situation where that would actually be useful advice.
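For the lock-ordering case mentioned above, a minimal hypothetical sketch: two threads acquire the same pair of locks in opposite orders, so each may end up waiting forever for the lock the other one holds. (Running it can genuinely hang; acquiring the locks in the same order in both workers avoids the deadlock.)

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

def worker_1():
    with lock_a:      # holds A, then waits for B
        with lock_b:
            pass

def worker_2():
    with lock_b:      # holds B, then waits for A
        with lock_a:
            pass

t1 = threading.Thread(target=worker_1)
t2 = threading.Thread(target=worker_2)
t1.start(); t2.start()
t1.join(); t2.join()   # may never return if each thread holds one lock
```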
Let's include both terms.
Suggested change:
- :term:`race conditions <race condition>`. In the
+ :term:`race conditions <race condition>` and :term:`data races <data race>`. In the
Suggested change:
- The simultaneous execution of multiple operations on different CPU cores.
+ The simultaneous execution of operations on multiple processors.
To allow for GPUs
Maybe something like:

   Executing multiple operations at the same time (e.g., on multiple CPU cores). In Python builds with the :term:`global interpreter lock` (GIL), only one thread runs Python bytecode at a time, so taking advantage of multiple CPU cores typically involves multiple processes (e.g., :term:`multiprocessing`) or native extensions that release the GIL. In :term:`free-threaded` Python, multiple Python threads can run Python code simultaneously on different cores.
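In the spirit of that suggested wording, a hypothetical sketch of getting CPU-bound pure-Python work onto multiple cores by using processes, which works even in a GIL build where threads would not run this code in parallel:

```python
from concurrent.futures import ProcessPoolExecutor

def cpu_bound(n):
    # Pure-Python CPU-bound work; separate processes each run it on their own core.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(cpu_bound, [10_000_000] * 4))
    print(results)
```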
Don't we elsewhere define global state as including per-module state?
useful
I'd like to rework the first sentence a bit. We talk about thread-safe modules and data structures (classes), not just "code"
Maybe something like:
A module, function, or class that behaves correctly when used by multiple threads concurrently.
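Along those lines, a hypothetical sketch of a class that behaves correctly when used by multiple threads concurrently, by guarding all access to its state with a lock (names invented for the example):

```python
import threading

class Counter:
    """A thread-safe counter: every access to the shared state
    happens while holding the instance's lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self):
        with self._lock:
            self._value += 1

    def value(self):
        with self._lock:
            return self._value
```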
Suggested change:
- to provide thread-safe operations, though this doesn't guarantee safety
+ to make many operations thread-safe, although thread safety is not necessarily guaranteed
The way you have this line now is a little confusing IMO, since it says that everything is thread-safe, but no guarantees. I think the way I rephrased it here is correct and leaves it a little clearer that lots of stuff is thread-safe but there are exceptions.
When eventually we have a full listing of what exactly is thread-safe and thread-unsafe in the APIs of the builtins, we can link to that here, maybe?
I've seen "interrupt" in various definitions online, but I think "interrupt" is confusing in this context. I'm not really sure what it refers to here and the granularity of "atomic" vs. "interrupt" can be different. For example, a thread performing an atomic operation with locks can be rescheduled (interrupted) by the OS without breaking atomicity.
I like this definition (from ChatGPT):
An operation that appears to execute as a single, indivisible step: no other thread can observe it half-done, and its effects become visible all at once. Python does not guarantee that ordinary high-level statements are atomic (for example, x += 1 performs multiple bytecode operations and is not atomic). Atomicity is only guaranteed where explicitly documented (e.g., operations performed while holding a lock, or methods of synchronization primitives such as those in threading and queue).
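Following that definition, a hypothetical sketch that relies only on primitives whose thread-safety is documented (queue.Queue) rather than assuming that ordinary statements are atomic:

```python
import queue
import threading

tasks = queue.Queue()     # queue.Queue is documented as thread-safe
results = queue.Queue()

def worker():
    while True:
        item = tasks.get()    # each get()/put() behaves as a single indivisible step
        if item is None:      # sentinel: tells the worker to stop
            break
        results.put(item * 2)

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for i in range(10):
    tasks.put(i)
for _ in threads:
    tasks.put(None)
for t in threads:
    t.join()
print(sorted(results.get() for _ in range(10)))
```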