Skip to content

Commit 0abb5df

Browse files
authored
Warn when a process that has imported tiledb calls fork() (#1876)
TileDB uses multiple threads and it's easy to run into deadlocks on Linux where `fork` is the default for `multiprocessing` until Python 3.14. We want to see a warning like Python 3.12 emits when fork() is called with multiple threads detected (see the implementation in posixmodule.c). I see three different ways to do this. 1. Do nothing other than document the situation and wait for users to migrate to Python 3.12. 2. Warn about fork() immediately on import of tiledb. 3. Warn when `os.fork()` is called and tiledb has been imported. The pros for number one: only needing to reference the change coming in 3.12. The cons: this project currently supports 4 versions of Python before 3.12 and it'll be years before new users are entirely on 3.12. Also, to a first approximation, nobody reads documentation. Perhaps better said as: we often try to read as little documentation as we can get away with. The pros for number two: super easy to implement and all tiledb users see it. The cons: users would see the warning even if they had no intention of using `fork()`. It's noisy and would have to be actively silenced. The pros for number three: all tiledb users who call `os.fork()`, intentionally or not, get the warning and no one else. It's not noisy. The cons: it requires monkeypatching`os.fork()` or, suboptimally, using `os.register_at_fork()`. The only problem with `os.register_at_fork()` is that exceptions and warnings can't be raised from registered callables. We're limited to printing to stderr/stdout. This is a less good experience for developers and the project also currently checks in tests that tiledb doesn't print such output. Monkeypatching methods of the standard library isn't a perfect solution. We'd all prefer that modules we import incidentally didn't change things in the standard library, right? And there's no guarantee that a module imported after tiledb won't patch `os.fork()` again. On the other hand, users import tiledb deliberately, to use it. And it's mostly incompatible with `fork()`. The monkeypatch I'm suggesting for `os.fork()` doesn't change behavior of the stdlib function, it only wraps it in a warning. It's as innocuous as such a patch can be. --- Original commit comments --- * Add a failing test We want to get a DeprecationWarning like Python 3.12 emits when fork() is called with multiple threads detected. * Monkeypatch os.fork and raise a warning This is equivalent to the warnings added in Python 3.12. We don't try to count threads because TileDB is multithreaded. * Isolate built-in os.fork from in-test patching * Don't monkeypatch on Windows * Add a module level flag indicating whether os.fork needs a patch Drop use of monkeypatch fixture for os.fork isolation. Add some docs and comments to the two new fixtures and rename the test module. * Never access os.fork on windows * Make isolate_os_fork fixture yield on all platforms * Add missing win32 skip marks for two tests * Add another comment about patching and a history entry
1 parent a93be0c commit 0abb5df

File tree

4 files changed

+180
-1
lines changed

4 files changed

+180
-1
lines changed

HISTORY.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,9 @@
1+
# Next
2+
3+
## Improvements
4+
5+
* Warn when `os.fork()` is used in the presence of a Tiledb context [#1876](https://github.com/TileDB-Inc/TileDB-Py/pull/1876/files).
6+
17
# Release 0.24.0
28

39
* TileDB-Py 0.24.0 includes TileDB Embedded [2.18.2](https://github.com/TileDB-Inc/TileDB/releases/tag/2.18.2)

tiledb/ctx.py

Lines changed: 50 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,7 @@
1010
_ctx_var = ContextVar("ctx")
1111

1212
already_warned = False
13+
_needs_fork_wrapper = sys.platform != "win32" and sys.version_info < (3, 12)
1314

1415

1516
class Config(lt.Config):
@@ -345,6 +346,16 @@ def __init__(self, config: Config = None):
345346

346347
self._set_default_tags()
347348

349+
# The core tiledb library uses threads and it's easy
350+
# to experience deadlocks when forking a process that is using
351+
# tiledb. The project doesn't have a solution for this at the
352+
# moment other than to avoid using fork(), which is the same
353+
# recommendation that Python makes. Python 3.12 warns if you
354+
# fork() when multiple threads are detected and Python 3.14 will
355+
# make it so you never accidentally fork(): multiprocessing will
356+
# default to "spawn" on Linux.
357+
_ensure_os_fork_wrap()
358+
348359
def __repr__(self):
349360
return "tiledb.Ctx() [see Ctx.config() for configuration]"
350361

@@ -439,7 +450,6 @@ def check_ipykernel_warn_once():
439450
global already_warned
440451
if not already_warned:
441452
try:
442-
import sys
443453
import warnings
444454

445455
if "ipykernel" in sys.modules and tuple(
@@ -521,7 +531,46 @@ def default_ctx(config: Union["Config", dict] = None) -> "Ctx":
521531
ctx = _ctx_var.get()
522532
if config is not None:
523533
raise tiledb.TileDBError("Global context already initialized!")
534+
535+
# The core tiledb library uses threads and it's easy
536+
# to experience deadlocks when forking a process that is using
537+
# tiledb. The project doesn't have a solution for this at the
538+
# moment other than to avoid using fork(), which is the same
539+
# recommendation that Python makes. Python 3.12 warns if you
540+
# fork() when multiple threads are detected and Python 3.14 will
541+
# make it so you never accidentally fork(): multiprocessing will
542+
# default to "spawn" on Linux.
543+
_ensure_os_fork_wrap()
524544
except LookupError:
525545
ctx = tiledb.Ctx(config)
526546
_ctx_var.set(ctx)
527547
return ctx
548+
549+
550+
def _ensure_os_fork_wrap():
551+
global _needs_fork_wrapper
552+
if _needs_fork_wrapper:
553+
import os
554+
import warnings
555+
from functools import wraps
556+
557+
def warning_wrapper(func):
558+
@wraps(func)
559+
def wrapper():
560+
warnings.warn(
561+
"TileDB is a multithreading library and deadlocks "
562+
"are likely if fork() is called after a TileDB "
563+
"context has been created (such as for array "
564+
"access). To safely use TileDB with "
565+
"multiprocessing or concurrent.futures, choose "
566+
"'spawn' as the start method for child processes. "
567+
"For example: "
568+
"multiprocessing.set_start_method('spawn').",
569+
UserWarning,
570+
)
571+
return func()
572+
573+
return wrapper
574+
575+
os.fork = warning_wrapper(os.fork)
576+
_needs_fork_wrapper = False

tiledb/tests/conftest.py

Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,5 @@
11
import ctypes
2+
import os
23
import sys
34

45
import pytest
@@ -98,3 +99,24 @@ def __init__(self, params=None):
9899

99100
tiledb.Ctx = PatchedCtx
100101
tiledb.Config = PatchedConfig
102+
103+
104+
@pytest.fixture(scope="function", autouse=True)
105+
def isolate_os_fork(original_os_fork):
106+
"""Guarantee that tests start and finish with no os.fork patch."""
107+
# Python 3.12 warns about fork() and threads. Tiledb only patches
108+
# os.fork for Pythons 3.8-3.11.
109+
if original_os_fork:
110+
tiledb.ctx._needs_fork_wrapper = True
111+
os.fork = original_os_fork
112+
yield
113+
if original_os_fork:
114+
tiledb.ctx._needs_fork_wrapper = True
115+
os.fork = original_os_fork
116+
117+
118+
@pytest.fixture(scope="session")
119+
def original_os_fork():
120+
"""Provides the original unpatched os.fork."""
121+
if sys.platform != "win32":
122+
return os.fork

tiledb/tests/test_fork_ctx.py

Lines changed: 102 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,102 @@
1+
"""Tests combining fork with tiledb context threads.
2+
3+
Background: the core tiledb library uses threads and it's easy to
4+
experience deadlocks when forking a process that is using tiledb. The
5+
project doesn't have a solution for this at the moment other than to
6+
avoid using fork(), which is the same recommendation that Python makes.
7+
Python 3.12 warns if you fork() when multiple threads are detected and
8+
Python 3.14 will make it so you never accidentally fork():
9+
multiprocessing will default to "spawn" on Linux.
10+
"""
11+
12+
import multiprocessing
13+
import os
14+
import sys
15+
import warnings
16+
17+
import pytest
18+
19+
import tiledb
20+
21+
22+
@pytest.mark.skipif(
23+
sys.platform == "win32", reason="fork() is not available on Windows"
24+
)
25+
def test_no_warning_fork_without_ctx():
26+
"""Get no warning if no tiledb context exists."""
27+
with warnings.catch_warnings():
28+
warnings.simplefilter("error")
29+
pid = os.fork()
30+
if pid == 0:
31+
os._exit(0)
32+
else:
33+
os.wait()
34+
35+
36+
@pytest.mark.skipif(
37+
sys.platform == "win32", reason="fork() is not available on Windows"
38+
)
39+
def test_warning_fork_with_ctx():
40+
"""Get a warning if we fork after creating a tiledb context."""
41+
_ = tiledb.Ctx()
42+
with pytest.warns(UserWarning, match="TileDB is a multithreading library"):
43+
pid = os.fork()
44+
if pid == 0:
45+
os._exit(0)
46+
else:
47+
os.wait()
48+
49+
50+
@pytest.mark.skipif(
51+
sys.platform == "win32", reason="fork() is not available on Windows"
52+
)
53+
def test_warning_fork_with_default_ctx():
54+
"""Get a warning if we fork after creating a default context."""
55+
_ = tiledb.default_ctx()
56+
with pytest.warns(UserWarning, match="TileDB is a multithreading library"):
57+
pid = os.fork()
58+
if pid == 0:
59+
os._exit(0)
60+
else:
61+
os.wait()
62+
63+
pass
64+
65+
66+
@pytest.mark.skipif(
67+
sys.platform == "win32", reason="fork() is not available on Windows"
68+
)
69+
def test_no_warning_multiprocessing_without_ctx():
70+
"""Get no warning if no tiledb context exists."""
71+
with warnings.catch_warnings():
72+
warnings.simplefilter("error")
73+
mp = multiprocessing.get_context("fork")
74+
p = mp.Process()
75+
p.start()
76+
p.join()
77+
78+
79+
@pytest.mark.skipif(
80+
sys.platform == "win32", reason="fork() is not available on Windows"
81+
)
82+
def test_warning_multiprocessing_with_ctx():
83+
"""Get a warning if we fork after creating a tiledb context."""
84+
_ = tiledb.Ctx()
85+
mp = multiprocessing.get_context("fork")
86+
p = mp.Process()
87+
with pytest.warns(UserWarning, match="TileDB is a multithreading library"):
88+
p.start()
89+
p.join()
90+
91+
92+
@pytest.mark.skipif(
93+
sys.platform == "win32", reason="fork() is not available on Windows"
94+
)
95+
def test_warning_multiprocessing_with_default_ctx():
96+
"""Get a warning if we fork after creating a default context."""
97+
_ = tiledb.default_ctx()
98+
mp = multiprocessing.get_context("fork")
99+
p = mp.Process()
100+
with pytest.warns(UserWarning, match="TileDB is a multithreading library"):
101+
p.start()
102+
p.join()

0 commit comments

Comments
 (0)