Commit 5478c7f

ericsnowcurrently authored and miss-islington committed
pythongh-135944: Add a "Runtime Components" Section to the Execution Model Docs (pythongh-135945)
The section provides a brief overview of the Python runtime's execution environment. It is meant to be implementation agnostic.

(cherry picked from commit 46a1f0a)

Co-authored-by: Eric Snow
1 parent a312dd0 commit 5478c7f

1 file changed: +186 -0 lines changed

Doc/reference/executionmodel.rst

Lines changed: 186 additions & 0 deletions
@@ -398,6 +398,192 @@ See also the description of the :keyword:`try` statement in section :ref:`try`
and :keyword:`raise` statement in section :ref:`raise`.


.. _execcomponents:

Runtime Components
==================

General Computing Model
-----------------------

Python's execution model does not operate in a vacuum. It runs on
a host machine and through that host's runtime environment, including
its operating system (OS), if there is one. When a program runs,
the conceptual layers of how it runs on the host look something
like this:

    | **host machine**
    |   **process** (global resources)
    |     **thread** (runs machine code)

Each process represents a program running on the host. Think of each
process itself as the data part of its program. Think of the process'
threads as the execution part of the program. This distinction is
important for understanding the conceptual Python runtime.

The process, as the data part, is the execution context in which the
program runs. It mostly consists of the set of resources assigned to
the program by the host, including memory, signals, file handles,
sockets, and environment variables.

Processes are isolated and independent from one another. (The same
is true for hosts.) The host manages the process' access to its
assigned resources, in addition to coordinating between processes.

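As a minimal sketch of that isolation, the following starts a child
process that changes an environment variable and then checks that the
parent's copy is untouched. The variable name ``DEMO_VALUE`` is
hypothetical and exists only for this illustration; the sketch assumes
a host where :mod:`subprocess` can launch another Python process.

```python
import os
import subprocess
import sys

# DEMO_VALUE is a hypothetical variable name used only for this sketch.
os.environ["DEMO_VALUE"] = "parent"

# The child process starts with a copy of the parent's environment
# and then changes its own copy.
child_code = (
    "import os\n"
    "os.environ['DEMO_VALUE'] = 'child'\n"
    "print(os.environ['DEMO_VALUE'])\n"
)
done = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
    check=True,
)

child_value = done.stdout.strip()         # what the child saw after its change
parent_value = os.environ["DEMO_VALUE"]   # the parent's copy is unaffected
print(child_value, parent_value)
```

The child's change stays inside the child: environment variables are
among the resources the host assigns to each process separately.
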
Each thread represents the actual execution of the program's machine
code, running relative to the resources assigned to the program's
process. It's strictly up to the host how and when that execution
takes place.

From the point of view of Python, a program always starts with exactly
one thread. However, the program may grow to run in multiple
simultaneous threads. Not all hosts support multiple threads per
process, but most do. Unlike processes, threads in a process are not
isolated and independent from one another. Specifically, all threads
in a process share all of the process' resources.

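A small sketch of that sharing, assuming a platform and implementation
that support the :mod:`threading` module: every thread below appends to
the same list object, because the list lives in the process' memory
rather than in any one thread. The names ``shared_log`` and ``worker``
are hypothetical.

```python
import threading

# One list object in the process' memory, visible to every thread.
shared_log = []


def worker(name):
    # Each thread writes into the same list; nothing is copied per thread.
    shared_log.append(name)


threads = [
    threading.Thread(target=worker, args=(f"worker-{i}",)) for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The order of arrival depends on scheduling, so sort before printing.
print(sorted(shared_log))
```
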
The fundamental point of threads is that each one does *run*
independently, at the same time as the others. That may be only
conceptually at the same time ("concurrently") or physically
("in parallel"). Either way, the threads effectively run
at a non-synchronized rate.

.. note::

   That non-synchronized rate means none of the process' memory is
   guaranteed to stay consistent for the code running in any given
   thread. Thus multi-threaded programs must take care to coordinate
   access to intentionally shared resources. Likewise, they must take
   care to be absolutely diligent about not accessing any *other*
   resources in multiple threads; otherwise two threads running at the
   same time might accidentally interfere with each other's use of some
   shared data. All this is true for both Python programs and the
   Python runtime.

   The cost of this broad, unstructured requirement is the tradeoff for
   the kind of raw concurrency that threads provide. The alternative
   to the required discipline generally means dealing with
   non-deterministic bugs and data corruption.

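One common way to coordinate access to an intentionally shared resource
is a lock. The following is only a sketch of that discipline, not the
only approach: the names ``counter``, ``counter_lock``, and ``add_many``
are hypothetical, and the lock ensures that each read-modify-write of
the counter is completed by one thread before another begins.

```python
import threading

counter = 0                        # intentionally shared between threads
counter_lock = threading.Lock()    # guards every update to ``counter``


def add_many(n):
    global counter
    for _ in range(n):
        # Only one thread at a time may run the read-modify-write below.
        with counter_lock:
            counter += 1


threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000, because no update can be lost while the lock is held
```
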
Python Runtime Model
--------------------

The same conceptual layers apply to each Python program, with some
extra data layers specific to Python:

    | **host machine**
    |   **process** (global resources)
    |     Python global runtime (*state*)
    |       Python interpreter (*state*)
    |         **thread** (runs Python bytecode and "C-API")
    |           Python thread *state*

At the conceptual level: when a Python program starts, it looks exactly
like that diagram, with one of each. The runtime may grow to include
multiple interpreters, and each interpreter may grow to include
multiple thread states.

.. note::

   A Python implementation won't necessarily implement the runtime
   layers distinctly or even concretely. The only exceptions are places
   where distinct layers are directly specified or exposed to users,
   like through the :mod:`threading` module.

.. note::

   The initial interpreter is typically called the "main" interpreter.
   Some Python implementations, like CPython, assign special roles
   to the main interpreter.

   Likewise, the host thread where the runtime was initialized is known
   as the "main" thread. It may be different from the process' initial
   thread, though they are often the same. In some cases "main thread"
   may be even more specific and refer to the initial thread state.
   A Python runtime might assign specific responsibilities
   to the main thread, such as handling signals.

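From Python code, the :mod:`threading` module offers one view of which
thread is the "main" one. The sketch below compares the current thread
with :func:`threading.main_thread`, first where the script itself runs
and then inside an extra thread. The thread name ``helper`` and the
function ``describe_current_thread`` are hypothetical.

```python
import threading


def describe_current_thread():
    # Report the thread's name and whether threading considers it "main".
    current = threading.current_thread()
    return current.name, current is threading.main_thread()


# Evaluated in whichever thread runs this script (normally the main thread).
main_info = describe_current_thread()

# Evaluated inside a newly created thread, which is never the main thread.
results = []
helper = threading.Thread(
    target=lambda: results.append(describe_current_thread()),
    name="helper",
)
helper.start()
helper.join()

print(main_info, results[0])
```
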
As a whole, the Python runtime consists of the global runtime state,
interpreters, and thread states. The runtime ensures all that state
stays consistent over its lifetime, particularly when used with
multiple host threads.

The global runtime, at the conceptual level, is just a set of
interpreters. While those interpreters are otherwise isolated and
independent from one another, they may share some data or other
resources. The runtime is responsible for managing these global
resources safely. The actual nature and management of these resources
is implementation-specific. Ultimately, the external utility of the
global runtime is limited to managing interpreters.

In contrast, an "interpreter" is conceptually what we would normally
think of as the (full-featured) "Python runtime". When machine code
executing in a host thread interacts with the Python runtime, it calls
into Python in the context of a specific interpreter.

.. note::

   The term "interpreter" here is not the same as the "bytecode
   interpreter", which is what regularly runs in threads, executing
   compiled Python code.

   In an ideal world, "Python runtime" would refer to what we currently
   call "interpreter". However, it's been called "interpreter" at least
   since it was introduced in 1997 (`CPython:a027efa5b`_).

   .. _CPython:a027efa5b: https://github.com/python/cpython/commit/a027efa5b

Each interpreter completely encapsulates all of the non-process-global,
non-thread-specific state needed for the Python runtime to work.
Notably, the interpreter's state persists between uses. It includes
fundamental data like :data:`sys.modules`. The runtime ensures
multiple threads using the same interpreter will safely
share it between them.

A Python implementation may support using multiple interpreters at the
same time in the same process. They are independent and isolated from
one another. For example, each interpreter has its own
:data:`sys.modules`.

For thread-specific runtime state, each interpreter has a set of thread
states, which it manages, in the same way the global runtime contains
a set of interpreters. It can have thread states for as many host
threads as it needs. It may even have multiple thread states for
the same host thread, though that isn't as common.

Each thread state, conceptually, has all the thread-specific runtime
data an interpreter needs to operate in one host thread. The thread
state includes the current raised exception and the thread's Python
call stack. It may include other thread-specific resources.

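The per-thread nature of the current exception can be observed from
Python code. In this sketch, one thread pauses inside an ``except``
block while a second thread inspects :func:`sys.exc_info`; the second
thread sees no exception because that information belongs to the first
thread's state. The names ``worker``, ``observer``, and ``seen`` are
hypothetical, and the events exist only to line the two threads up.

```python
import sys
import threading

in_handler = threading.Event()   # set once the worker is inside its handler
release = threading.Event()      # set once the observer has taken its reading
seen = {}


def worker():
    try:
        raise ValueError("only visible in this thread")
    except ValueError:
        # The exception being handled is part of this thread's state.
        seen["worker"] = sys.exc_info()[0]
        in_handler.set()
        release.wait(timeout=5)  # stay inside the handler for a moment


def observer():
    in_handler.wait(timeout=5)
    # This thread is not handling anything, so it sees no exception.
    seen["observer"] = sys.exc_info()[0]
    release.set()


threads = [threading.Thread(target=worker), threading.Thread(target=observer)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(seen["worker"], seen["observer"])
```
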
.. note::

   The term "Python thread" can sometimes refer to a thread state, but
   normally it means a thread created using the :mod:`threading` module.

Each thread state, over its lifetime, is always tied to exactly one
interpreter and exactly one host thread. It will only ever be used in
that thread and with that interpreter.

Multiple thread states may be tied to the same host thread, whether for
different interpreters or even the same interpreter. However, for any
given host thread, only one of the thread states tied to it can be used
by the thread at a time.

Thread states are isolated and independent from one another and don't
share any data, except for possibly sharing an interpreter and objects
or other resources belonging to that interpreter.

Once a program is running, new Python threads can be created using the
:mod:`threading` module (on platforms and Python implementations that
support threads). Additional processes can be created using the
:mod:`os`, :mod:`subprocess`, and :mod:`multiprocessing` modules.
Interpreters can be created and used with the
:mod:`~concurrent.interpreters` module. Coroutines (async) can
be run using :mod:`asyncio` in each interpreter, typically only
in a single thread (often the main thread).

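The following sketch exercises three of those mechanisms side by side:
an extra thread, a child process, and a coroutine. It assumes a platform
where threads and subprocesses are available. All function names and
messages are hypothetical, and :mod:`~concurrent.interpreters` is left
out because its availability depends on the Python version in use.

```python
import asyncio
import subprocess
import sys
import threading


def run_in_thread():
    # Create a new Python thread and report the name it ran under.
    result = []
    t = threading.Thread(
        target=lambda: result.append(threading.current_thread().name),
        name="extra-thread",
    )
    t.start()
    t.join()
    return result[0]


def run_in_process():
    # Start a separate process running its own Python runtime.
    done = subprocess.run(
        [sys.executable, "-c", "print('hello from a child process')"],
        capture_output=True,
        text=True,
        check=True,
    )
    return done.stdout.strip()


async def greet():
    # A coroutine; asyncio runs it on an event loop in the calling thread.
    await asyncio.sleep(0)
    return "hello from a coroutine"


thread_name = run_in_thread()
child_output = run_in_process()
coroutine_result = asyncio.run(greet())
print(thread_name, child_output, coroutine_result, sep="\n")
```
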
.. rubric:: Footnotes

.. [#] This limitation occurs because the code that is executed by these operations
