@@ -4,38 +4,46 @@ ref: porting-guide

# Porting Python Packages to Support Free-Threading

- This document discusses porting an existing Python package to support free-threading Python.
-
- ## Current status (as of early 2025)
-
- Many Python packages, particularly packages relying on C
- extension modules, do not consider multithreaded use or make strong
- assumptions about the GIL providing sequential consistency in multithreaded
- contexts. These packages will:
-
- - fail to produce deterministic results on the free-threaded build
- - may, if there are C extensions involved, crash the interpreter in multithreaded use in ways that are impossible on the
-   GIL-enabled build
+ Many packages already support free-threaded Python. Check the [tracking
+ table](tracking.md) in this guide, the [free-threaded wheels
+ tracker](https://hugovk.github.io/free-threaded-wheels/), and the documentation
+ and PyPI release pages for packages your project depends on to evaluate whether
+ your project can run on the free-threaded build. In addition, you may need to
+ update your code to support the free-threaded build.
+
+ ## Why do projects need updates?
+
+ Free-threaded Python can exploit the many cores present in modern CPUs in pure
+ Python code. In all previous Python releases before the free-threaded build and
+ in the current default build, only one thread at a time could execute Python
+ code because of the [global interpreter
+ lock](https://docs.python.org/3/glossary.html#term-global-interpreter-lock) (the
+ GIL).

Attempting to parallelize many workflows using the Python
[threading](https://docs.python.org/3/library/threading.html) module will not
- produce any speedups on the GIL-enabled build, so thread safety issues that are possible even with the
- GIL are not hit often since users do not make use of threading as much as other
- parallelization strategies. This means many codebases have threading bugs that
- up-until-now have only been theoretical or present in niche use cases. With
- free-threading, many more users will want to use Python threads.
-
- This means we must analyze Python codebases to identify supported and
- unsupported multithreaded workflows and make changes to fix thread safety
- issues. Extra care must be taken to address this need, particularly when using low-level C, C++, Cython, and Rust
- code exposed to Python. Even pure Python codebases can exhibit
- non-determinism and races in the free-threaded build that are either very
- unlikely or impossible in the default configuration of the GIL-enabled build.
+ produce any speedups on the GIL-enabled build. This means many codebases have
+ threading bugs that up-until-now have only been theoretical or present in niche
+ use cases. With free-threading, many more users will want to use Python threads,
+ making fixing existing thread safety issues more important. Additionally,
+ free-threading makes new kinds of concurrent use possible, so situations where
+ the GIL *was* providing safety will need new analysis to ensure they are safe
+ under free-threaded Python.
+
+ Packages that have not yet been updated may exhibit behaviors such as:
+
+ - Failing to produce deterministic results on the free-threaded build; they may
+   not be deterministic *with* the GIL either.
+ - Crashing the interpreter, if there are C extensions involved, in multithreaded
+   use in ways that are impossible on the GIL-enabled build. Some extensions may
+   crash the interpreter under multithreaded use even with the GIL.

For a more in-depth look at the differences between the GIL-enabled and
free-threaded build, we suggest reading [the `ft_utils`
documentation](https://github.com/facebookincubator/ft_utils/blob/main/docs/ft_worked_examples.md)
- on this topic.
+ on this topic. Also see the [section of this porting
+ guide](porting-extensions.md) on extensions to understand why compiled code
+ needs special updates to support the free-threaded build.

<!-- ref:plan-of-attack -->

@@ -101,17 +109,36 @@ Many projects assume the GIL serializes access to state shared between threads,
introducing the possibility of data races in native extensions and race
conditions that are impossible when the GIL is enabled.

- We suggest focusing on safety and multithreaded scaling before single-threaded
- performance.
+ Ideally it should be possible to add safety without adding any performance
+ cost. That may not be achievable in practice, but it is the goal to aim
+ for. You should benchmark to check that single-threaded performance is not
+ seriously impacted by work to improve thread safety. It may be possible to set
+ things up so that single-threaded users of your library can avoid paying the
+ cost of synchronization.
+
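One lightweight way to do the benchmarking suggested above is a `timeit` micro-benchmark run before and after the thread-safety work; `workload` here is a hypothetical stand-in for a hot path in your library, not a real API:

```python
import timeit


def workload():
    # Hypothetical stand-in for a hot code path in your library;
    # time it before and after adding synchronization and compare.
    return sum(range(1_000))


elapsed = timeit.timeit(workload, number=10_000)
print(f"10,000 calls took {elapsed:.3f}s single-threaded")
```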
+ If there is no way to add zero-cost thread-safety but the GIL is sufficient to
+ prevent races on the GIL-enabled build, consider adding logic that only triggers
+ if the GIL is disabled at runtime or only triggers on the free-threaded build:
+
+ ```python
+ import sys
+ import sysconfig
+
+ if not getattr(sys, '_is_gil_enabled', lambda: True)():
+     # logic that only happens if the GIL is disabled
+     ...
+
+ if sysconfig.get_config_var("Py_GIL_DISABLED"):
+     # logic that only happens on the free-threaded build
+     ...
+ ```

- Here's an example of this approach. If adding a lock to a global cache would harm
- multithreaded scaling, and turning off the cache implies a small performance
- hit, consider doing the simpler thing and disabling the cache in the
+ Here's an example of this approach. If adding a lock to a global cache would
+ harm multithreaded scaling, and turning off the cache implies a small
+ performance hit, consider doing the simpler thing and disabling the cache in the
free-threaded build.
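As a sketch of that approach (all names here are illustrative, not from any real package), a module-level flag derived from the build configuration can switch the cache off only on the free-threaded build:

```python
import sysconfig

# On the free-threaded build, skip the global cache entirely instead of
# adding a lock that could hurt multithreaded scaling.
_CACHING_ENABLED = not sysconfig.get_config_var("Py_GIL_DISABLED")
_cache = {}


def expensive(arg):
    if _CACHING_ENABLED and arg in _cache:
        return _cache[arg]
    result = arg * arg  # stand-in for a costly computation
    if _CACHING_ENABLED:
        _cache[arg] = result
    return result
```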

- Single-threaded performance can always be improved later,
- once you've established free-threaded support and hopefully improved test
- coverage for multithreaded workflows.
+ Single-threaded performance can always be improved later, once you've
+ established free-threaded support and hopefully improved test coverage for
+ multithreaded workflows.

NumPy, for example, decided *not* to add explicit locking to the ndarray object
and [does not support mutating shared
@@ -221,7 +248,86 @@ This wouldn't help a case where each thread having a copy of the cache would be
prohibitive, but it does fix possible issues with resource leaks due to
races filling a cache.

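As a sketch of the per-thread cache approach discussed above, `threading.local` gives each thread its own independent cache object, so no synchronization is needed (the function names are illustrative):

```python
import threading

_tls = threading.local()


def _get_cache():
    # Lazily create an independent cache the first time each thread asks.
    if not hasattr(_tls, "cache"):
        _tls.cache = {}
    return _tls.cache


def expensive(arg):
    cache = _get_cache()
    if arg not in cache:
        cache[arg] = arg * arg  # stand-in for a costly computation
    return cache[arg]
```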
- ### Making mutable global caches thread-safe with locking
+ <!-- ref:copy-on-write -->
+
+ ### Copy-on-Write
+
+ [Copy-on-Write (CoW)](https://en.wikipedia.org/wiki/Copy-on-write) is a
+ thread-safe pattern to implement lock-free sharing of data structures. It is
+ useful when reads are much more frequent than writes, which makes it a natural
+ fit for caching.
+
+ Consider a library which generates the nth Fibonacci number. The library caches
+ previously computed Fibonacci numbers.
+
+ ```python
+ cache = [0, 1]
+
+
+ def fib(nth: int) -> int:
+     global cache
+     if nth < 1:
+         raise ValueError("nth must be a positive integer")
+
+     # Atomically read shared reference to global cache
+     local_cache = cache
+
+     if nth >= len(local_cache):
+         # Make a new un-shared list
+         local_cache = local_cache.copy()
+
+         # Mutating here is safe because the list local_cache refers
+         # to is private to this thread
+         while nth >= len(local_cache):
+             local_cache.append(local_cache[-1] + local_cache[-2])
+
+         # Atomically update global shared reference to point to the new list
+         cache = local_cache
+
+     # Must use a reference to the local_cache because another thread
+     # may have updated the global reference
+     return local_cache[nth]
+ ```
+
+ This code is thread-safe because the shared global cache is never modified
+ in-place. Instead, a new copy of the cache is created and updated, and then the
+ reference to the cache is updated atomically. This ensures that readers always
+ see a consistent view of the cache, even if a writer is updating it
+ concurrently.
+
+ This does not rely on the thread-safety of the underlying list. Instead, it
+ relies on the fact that shared references can be read from and modified
+ atomically. This means you can use this technique to allow lock-free access to a
+ shared global cache implemented using a thread-unsafe data structure.
+
+ Note that for this to work correctly, readers must *not* assume that the shared
+ reference (the global `cache` variable) will be unchanged from one access to the
+ next. For example, this is not thread-safe:
+
+ ```python
+ if nth < len(cache):
+     # Another thread may replace cache with a shorter list
+     # after len(cache) but before cache[nth] so that this fails:
+     return cache[nth]
+ ```
+
+ Instead, readers should atomically copy the shared reference to a local
+ variable and then only access the local variable:
+
+ ```python
+ local_cache = cache
+ if nth < len(local_cache):
+     # No other thread will reassign the local_cache variable
+     # or mutate the object that it points to.
+     return local_cache[nth]
+ ```
+
+ Also keep in mind that readers may not necessarily see the most up-to-date
+ version of the cache. The CPU cost to calculate some entries will be wasted if
+ there are races to create a new cache. For memoization and other caching this is
+ often fine but may be problematic for some use-cases.
+
+ ### Locking

If a thread-local cache doesn't make sense, then you can serialize access to the
cache with a lock. A
@@ -274,10 +380,11 @@ held cannot lead to recursive calls or lead to a situation where a thread owning
the lock is blocked on acquiring a different mutex. You do not need to worry
about deadlocking with the GIL in pure Python code; the interpreter will handle
that for you.
+
There is also
[threading.RLock](https://docs.python.org/3/library/threading.html#rlock-objects),
which provides a reentrant lock allowing threads to recursively acquire the same
- lock.
+ lock, but it is not quite as performant as a `threading.Lock` in single-threaded use.

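A minimal illustration of the reentrancy difference (this sketch is not from the guide's own code): a thread already holding an `RLock` may acquire it again, where a plain `threading.Lock` would deadlock:

```python
import threading

rlock = threading.RLock()


def outer():
    with rlock:
        # Calling inner() re-acquires rlock from the same thread;
        # with a plain threading.Lock this would deadlock.
        return inner()


def inner():
    with rlock:
        return "ok"
```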
Finally, note how the above code will ensure that only a single call to
`_do_expensive_calculation` will run at any given time, regardless of `arg`.
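The cache code the text refers to lies outside this excerpt; a minimal sketch of the pattern it describes might look like the following, where `lookup` and the body of `_do_expensive_calculation` are assumptions for illustration:

```python
import threading

_cache = {}
_cache_lock = threading.Lock()


def _do_expensive_calculation(arg):
    return arg * arg  # stand-in for real work


def lookup(arg):
    # The lock is held across the computation, so only one thread at a
    # time can be inside _do_expensive_calculation, regardless of arg.
    with _cache_lock:
        if arg not in _cache:
            _cache[arg] = _do_expensive_calculation(arg)
        return _cache[arg]
```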