Skip to content

Commit 179efda

Browse files
authored
PYTHON-3406 Reinstate warning and docs that PyMongo is not fork safe (#1050)
Log child process C-level stacks when fork tests deadlock. Encode hostname to bytes to avoid getaddrinfo importlib deadlock.
1 parent b8cb1c1 commit 179efda

File tree

8 files changed

+120
-96
lines changed

8 files changed

+120
-96
lines changed

doc/changelog.rst

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -13,8 +13,11 @@ PyMongo 4.3 brings a number of improvements including:
1313
:class:`bson.codec_options.DatetimeConversion`, and
1414
:class:`bson.codec_options.CodecOptions`'s ``datetime_conversion``
1515
parameter for more details (`PYTHON-1824`_).
16-
- Added support for using a :class:`~pymongo.mongo_client.MongoClient` after
17-
an :py:func:`os.fork` (`PYTHON-2484`_).
16+
- PyMongo now resets its locks and other shared state in the child process
17+
after a :py:func:`os.fork` to reduce the frequency of deadlocks. Note that
18+
deadlocks are still possible because libraries that PyMongo depends like
19+
OpenSSL cannot be made fork() safe in multithreaded applications.
20+
(`PYTHON-2484`_). For more info see :ref:`pymongo-fork-safe`.
1821

1922
Bug fixes
2023
.........

doc/faq.rst

Lines changed: 32 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -14,21 +14,15 @@ for threaded applications.
1414
Is PyMongo fork-safe?
1515
---------------------
1616

17-
Starting in PyMongo 4.3, forking on a compatible Python interpreter while
18-
using a client will result in all locks held by :class:`~bson.objectid
19-
.ObjectId` and :class:`~pymongo.mongo_client.MongoClient` being released in
20-
the child, as well as state shared between child and parent processes being
21-
reset.
22-
23-
If greenlet has been imported (usually with a library like gevent or
24-
Eventlet), care must be taken when using instances of :class:`~pymongo
25-
.mongo_client.MongoClient` with ``fork()``. Specifically, instances of
26-
MongoClient must not be copied from a parent process to a child process.
27-
Instead, the parent process and each child process must create their own
28-
instances of MongoClient. Instances of MongoClient copied from the parent
29-
process have a high probability of deadlock in the child process due to the
30-
inherent incompatibilities between ``fork()``, threads, and locks described
31-
:ref:`below<pymongo-fork-safe-details>`.
17+
PyMongo is not fork-safe. Care must be taken when using instances of
18+
:class:`~pymongo.mongo_client.MongoClient` with ``fork()``. Specifically,
19+
instances of MongoClient must not be copied from a parent process to
20+
a child process. Instead, the parent process and each child process must
21+
create their own instances of MongoClient. Instances of MongoClient copied from
22+
the parent process have a high probability of deadlock in the child process due
23+
to the inherent incompatibilities between ``fork()``, threads, and locks
24+
described :ref:`below <pymongo-fork-safe-details>`. PyMongo will attempt to
25+
issue a warning if there is a chance of this deadlock occurring.
3226

3327
.. _pymongo-fork-safe-details:
3428

@@ -44,10 +38,33 @@ created by ``fork()`` only has one thread, so any locks that were taken out by
4438
other threads in the parent will never be released in the child. The next time
4539
the child process attempts to acquire one of these locks, deadlock occurs.
4640

41+
Starting in version 4.3, PyMongo utilizes :py:func:`os.register_at_fork` to
42+
reset its locks and other shared state in the child process after a
43+
:py:func:`os.fork` to reduce the frequency of deadlocks. However deadlocks
44+
are still possible because libraries that PyMongo depends on, like `OpenSSL`_
45+
and `getaddrinfo(3)`_ (on some platforms), are not fork() safe in a
46+
multithreaded application. Linux also imposes the restriction that:
47+
48+
After a `fork()`_ in a multithreaded program, the child can
49+
safely call only async-signal-safe functions (see
50+
`signal-safety(7)`_) until such time as it calls `execve(2)`_.
51+
52+
PyMongo relies on functions that are *not* `async-signal-safe`_ and hence the
53+
child process can experience deadlocks or crashes when attempting to call
54+
a non `async-signal-safe`_ function. For examples of deadlocks or crashes
55+
that could occur see `PYTHON-3406`_.
56+
4757
For a long but interesting read about the problems of Python locks in
4858
multithreaded contexts with ``fork()``, see http://bugs.python.org/issue6721.
4959

5060
.. _not fork-safe: http://bugs.python.org/issue6721
61+
.. _OpenSSL: https://github.com/openssl/openssl/issues/19066
62+
.. _fork(): https://man7.org/linux/man-pages/man2/fork.2.html
63+
.. _signal-safety(7): https://man7.org/linux/man-pages/man7/signal-safety.7.html
64+
.. _async-signal-safe: https://man7.org/linux/man-pages/man7/signal-safety.7.html
65+
.. _execve(2): https://man7.org/linux/man-pages/man2/execve.2.html
66+
.. _getaddrinfo(3): https://man7.org/linux/man-pages/man3/gai_strerror.3.html
67+
.. _PYTHON-3406: https://jira.mongodb.org/browse/PYTHON-3406
5168

5269
.. _connection-pooling:
5370

pymongo/mongo_client.py

Lines changed: 6 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -82,7 +82,7 @@
8282
ServerSelectionTimeoutError,
8383
WaitQueueTimeoutError,
8484
)
85-
from pymongo.lock import _create_lock, _release_locks
85+
from pymongo.lock import _HAS_REGISTER_AT_FORK, _create_lock, _release_locks
8686
from pymongo.pool import ConnectionClosedReason
8787
from pymongo.read_preferences import ReadPreference, _ServerMode
8888
from pymongo.server_selectors import writable_server_selector
@@ -831,9 +831,10 @@ def __init__(
831831
self._encrypter = _Encrypter(self, self.__options.auto_encryption_opts)
832832
self._timeout = self.__options.timeout
833833

834-
# Add this client to the list of weakly referenced items.
835-
# This will be used later if we fork.
836-
MongoClient._clients[self._topology._topology_id] = self
834+
if _HAS_REGISTER_AT_FORK:
835+
# Add this client to the list of weakly referenced items.
836+
# This will be used later if we fork.
837+
MongoClient._clients[self._topology._topology_id] = self
837838

838839
def _init_background(self):
839840
self._topology = Topology(self._topology_settings)
@@ -2177,7 +2178,7 @@ def _after_fork_child():
21772178
client._after_fork()
21782179

21792180

2180-
if hasattr(os, "register_at_fork"):
2181+
if _HAS_REGISTER_AT_FORK:
21812182
# This will run in the same thread as the fork was called.
21822183
# If we fork in a critical region on the same thread, it should break.
21832184
# This is fine since we would never call fork directly from a critical region.

pymongo/pool.py

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -979,9 +979,11 @@ def _create_connection(address, options):
979979
This is a modified version of create_connection from CPython >= 2.7.
980980
"""
981981
host, port = address
982+
# Avoid the getaddrinfo importlib deadlock on fork() described in PYTHON-3406.
983+
host = host.encode("idna")
982984

983985
# Check if dealing with a unix domain socket
984-
if host.endswith(".sock"):
986+
if host.endswith(b".sock"):
985987
if not hasattr(socket, "AF_UNIX"):
986988
raise ConnectionFailure("UNIX-sockets are not supported on this system")
987989
sock = socket.socket(socket.AF_UNIX)
@@ -998,7 +1000,7 @@ def _create_connection(address, options):
9981000
# is 'localhost' (::1 is fine). Avoids slow connect issues
9991001
# like PYTHON-356.
10001002
family = socket.AF_INET
1001-
if socket.has_ipv6 and host != "localhost":
1003+
if socket.has_ipv6 and host != b"localhost":
10021004
family = socket.AF_UNSPEC
10031005

10041006
err = None

pymongo/topology.py

Lines changed: 7 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -36,7 +36,7 @@
3636
WriteError,
3737
)
3838
from pymongo.hello import Hello
39-
from pymongo.lock import _HAS_REGISTER_AT_FORK, _create_lock
39+
from pymongo.lock import _create_lock
4040
from pymongo.monitor import SrvMonitor
4141
from pymongo.pool import PoolOptions
4242
from pymongo.server import Server
@@ -174,13 +174,12 @@ def open(self):
174174
self._pid = pid
175175
elif pid != self._pid:
176176
self._pid = pid
177-
if not _HAS_REGISTER_AT_FORK:
178-
warnings.warn(
179-
"MongoClient opened before fork. May not be entirely fork-safe, "
180-
"proceed with caution. See PyMongo's documentation for details: "
181-
"https://pymongo.readthedocs.io/en/stable/faq.html#"
182-
"is-pymongo-fork-safe"
183-
)
177+
warnings.warn(
178+
"MongoClient opened before fork. May not be entirely fork-safe, "
179+
"proceed with caution. See PyMongo's documentation for details: "
180+
"https://pymongo.readthedocs.io/en/stable/faq.html#"
181+
"is-pymongo-fork-safe"
182+
)
184183
with self._lock:
185184
# Close servers and clear the pools.
186185
for server in self._servers.values():

test/__init__.py

Lines changed: 60 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -21,8 +21,10 @@
2121
import os
2222
import signal
2323
import socket
24+
import subprocess
2425
import sys
2526
import threading
27+
import time
2628
import traceback
2729
import unittest
2830
import warnings
@@ -1011,8 +1013,28 @@ def fork(
10111013
with self.fork(target=lambda: print('in child')) as proc:
10121014
self.assertTrue(proc.pid) # Child process was started
10131015
"""
1016+
1017+
def _print_threads(*args: object) -> None:
1018+
if _print_threads.called:
1019+
return
1020+
_print_threads.called = True
1021+
print_thread_tracebacks()
1022+
1023+
_print_threads.called = False
1024+
1025+
def _target() -> None:
1026+
signal.signal(signal.SIGUSR1, _print_threads)
1027+
try:
1028+
target()
1029+
except Exception as exc:
1030+
sys.stderr.write(f"Child process failed with: {exc}\n")
1031+
_print_threads()
1032+
# Sleep for a while to let the parent attach via GDB.
1033+
time.sleep(2 * timeout)
1034+
raise
1035+
10141036
ctx = multiprocessing.get_context("fork")
1015-
proc = ctx.Process(target=target)
1037+
proc = ctx.Process(target=_target)
10161038
proc.start()
10171039
try:
10181040
yield proc # type: ignore
@@ -1021,15 +1043,47 @@ def fork(
10211043
pid = proc.pid
10221044
assert pid
10231045
if proc.exitcode is None:
1024-
# If it failed, SIGINT to get traceback and wait 10s.
1025-
os.kill(pid, signal.SIGINT)
1026-
proc.join(10)
1027-
proc.kill()
1028-
proc.join(1)
1046+
# gdb to get C-level tracebacks
1047+
print_thread_stacks(pid)
1048+
# If it failed, SIGUSR1 to get thread tracebacks.
1049+
os.kill(pid, signal.SIGUSR1)
1050+
proc.join(5)
1051+
if proc.exitcode is None:
1052+
# SIGINT to get main thread traceback in case SIGUSR1 didn't work.
1053+
os.kill(pid, signal.SIGINT)
1054+
proc.join(5)
1055+
if proc.exitcode is None:
1056+
# SIGKILL in case SIGINT didn't work.
1057+
proc.kill()
1058+
proc.join(1)
10291059
self.fail(f"child timed out after {timeout}s (see traceback in logs): deadlock?")
10301060
self.assertEqual(proc.exitcode, 0)
10311061

10321062

1063+
def print_thread_tracebacks() -> None:
1064+
"""Print all Python thread tracebacks."""
1065+
for thread_id, frame in sys._current_frames().items():
1066+
sys.stderr.write(f"\n--- Traceback for thread {thread_id} ---\n")
1067+
traceback.print_stack(frame, file=sys.stderr)
1068+
1069+
1070+
def print_thread_stacks(pid: int) -> None:
1071+
"""Print all C-level thread stacks for a given process id."""
1072+
if sys.platform == "darwin":
1073+
cmd = ["lldb", "--attach-pid", f"{pid}", "--batch", "--one-line", '"thread backtrace all"']
1074+
else:
1075+
cmd = ["gdb", f"--pid={pid}", "--batch", '--eval-command="thread apply all bt"']
1076+
1077+
try:
1078+
res = subprocess.run(
1079+
cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, encoding="utf-8"
1080+
)
1081+
except Exception as exc:
1082+
sys.stderr.write(f"Could not print C-level thread stacks because {cmd[0]} failed: {exc}")
1083+
else:
1084+
sys.stderr.write(res.stdout)
1085+
1086+
10331087
class IntegrationTest(PyMongoTestCase):
10341088
"""Base class for TestCases that need a connection to MongoDB to pass."""
10351089

test/test_fork.py

Lines changed: 4 additions & 57 deletions
Original file line numberDiff line numberDiff line change
@@ -12,23 +12,19 @@
1212
# See the License for the specific language governing permissions and
1313
# limitations under the License.
1414

15-
"""Test that pymongo is fork safe."""
15+
"""Test that pymongo resets its own locks after a fork."""
1616

1717
import os
1818
import sys
1919
import unittest
2020
from multiprocessing import Pipe
2121

22-
from bson.objectid import ObjectId
23-
2422
sys.path[0:0] = [""]
2523

2624
from test import IntegrationTest
27-
from test.utils import (
28-
ExceptionCatchingThread,
29-
is_greenthread_patched,
30-
rs_or_single_client,
31-
)
25+
from test.utils import is_greenthread_patched
26+
27+
from bson.objectid import ObjectId
3228

3329

3430
@unittest.skipIf(
@@ -91,55 +87,6 @@ def target():
9187
passed, msg = parent_conn.recv()
9288
self.assertTrue(passed, msg)
9389

94-
def test_many_threaded(self):
95-
# Fork randomly while doing operations.
96-
clients = []
97-
for _ in range(10):
98-
c = rs_or_single_client()
99-
clients.append(c)
100-
self.addCleanup(c.close)
101-
102-
class ForkThread(ExceptionCatchingThread):
103-
def __init__(self, runner, clients):
104-
self.runner = runner
105-
self.clients = clients
106-
self.fork = False
107-
108-
super().__init__(target=self.fork_behavior)
109-
110-
def fork_behavior(self) -> None:
111-
def action(client):
112-
client.admin.command("ping")
113-
return 0
114-
115-
for i in range(200):
116-
# Pick a random client.
117-
rc = self.clients[i % len(self.clients)]
118-
if i % 50 == 0 and self.fork:
119-
# Fork
120-
def target():
121-
for c_ in self.clients:
122-
action(c_)
123-
c_.close()
124-
125-
with self.runner.fork(target=target) as proc:
126-
self.runner.assertTrue(proc.pid)
127-
action(rc)
128-
129-
threads = [ForkThread(self, clients) for _ in range(10)]
130-
threads[-1].fork = True
131-
for t in threads:
132-
t.start()
133-
134-
for t in threads:
135-
t.join()
136-
137-
for t in threads:
138-
self.assertIsNone(t.exc)
139-
140-
for c in clients:
141-
c.close()
142-
14390

14491
if __name__ == "__main__":
14592
unittest.main()

test/unified_format.py

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1591,7 +1591,8 @@ def run_scenario(self, spec, uri=None):
15911591
if i < attempts - 1:
15921592
print(
15931593
f"Retrying after attempt {i+1} of {self.id()} failed with:\n"
1594-
f"{traceback.format_exc()}"
1594+
f"{traceback.format_exc()}",
1595+
file=sys.stderr,
15951596
)
15961597
self.setUp()
15971598
continue

0 commit comments

Comments
 (0)