Skip to content

Commit a665ffe

Browse files
emmettbutlerYun-KimP403n1x87juanjuxbrettlangdon
authored
fix(internal): remove nogevent compatibility layer (#5105)
This pull request removes the `nogevent` compatibility layer from the library and tests. It also changes the behavior of some feature flags related to gevent and adjusts a whole lot of tests to work with these changes. This change was manually tested on delancie-dummy staging and shown to fix the issue we were investigating there. See noteboook 4782453 in the ddstaging datadoghq account for a detailed look at the metrics collected during that test. ## What Changed? * `sitecustomize.py`: refactored module cloning logic; changed the default of `DD_UNLOAD_MODULES_FROM_SITECUSTOMIZE` to `auto` meaning it runs when gevent is installed; unloaded a few additional modules like `time` and `attrs` that were causing tests and manual checks to fail; deprecated `DD_GEVENT_PATCH_ALL` flag * Removed various hacks and layers that had intended to fix gevent compatibility, including in `forksafe.py`, `periodic.py`, `ddtrace_gevent_check.py`, and `nogevent.py` * `profiling/`: adjusted all uses of the removed module `nogevent` to work with `threading` * Adjusted tests to work with these removals We tried to separate some of these changes into other pull requests, which you can see linked in the discussion below. Because of how difficult it is to replicate the issue we're chasing outside of the staging environment, we decided to minimize the number of variables under test and combine the various changes into a single pull request. This makes it a bit harder to review, which we've tried to mitigate with the checklist above. ## Risk The main risk in this change is the change to the default behavior of module cloning. We've mitigated this risk with the automated test suite as well as manual testing described in the notebook above. **Why doesn't this change put all new behavior behind a feature flag and leave the default behavior untouched?** The main reason for this decision is pragmatic: it's really hard to test for the issue this solves, requiring a turnaround time of about an hour to get feedback from changes. The secondary reason is that the `nogevent` layer is highly coupled to the rest of the library's code, and putting it behind a feature flag is a significant and nontrivial effort. The third reason is that full support of all of the various configurations and combinations with other tools that gevent can be used in is a goal that we could probably spend infinite time on if we chose to. Given this, we need to intentionally set a goal that solves the current and likely near-future issues as completely as possible, make it the default behavior, and call this effort "done". @brettlangdon @P403n1x87 @Yun-Kim and I are in agreement that the evidence in noteboook 4782453 in the ddstaging datadoghq account is enough to justify this change to the default behavior. ## Performance Testing Performance testing with a sample flask application (notebook 4442578) shows no immediately noticeable impact of tracing. Dynamic instrumentation seems to cause slow-downs, and the reason has been tracked down to joining service threads on shutdown. Avoiding the joins cures the problem, but further testing is required to ensure that DI still behaves as intended. Profiling also shows a slow-down in the application response when enabled. This seems to be due to retrieving the response from the agent after the payload has been uploaded. A potential solution to this might be offered by libdatadog. The following are the details of the scenario used to measure the performance under different configurations. The application is the simple Flask app of the issue reproducer: ``` # app.py import os import time from ddtrace.internal.remoteconfig import RemoteConfig from flask import Flask app = Flask(__name__) def start(): pid1 = os.fork() if pid1 == 0: os.setsid() x = 2 while x > 0: time.sleep(0.2) x -= 1 else: os.waitpid(pid1, 0) @app.route("/") def index(): start() return "OK" if RemoteConfig._worker is not None else "NOK" ``` We can control what products to start with the following run.sh script ``` #!/bin/bash source .venv/bin/activate export DD_DYNAMIC_INSTRUMENTATION_ENABLED=true export DD_PROFILING_ENABLED=false export DD_TRACE_ENABLED=false export DD_ENV=gab-testing export DD_SERVICE=flask-gevent ddtrace-run gunicorn -w 3 -k gevent app:app deactivate ``` To run the app we create a virtual environment with ``` python3.9 -m venv .venv source .venv/bin/activate pip install flask gevent gunicorn pip install -e path/to/dd-trace-py deactivate ``` and then invoke the script, adjusting the exported variables as required ``` chmod +x run.sh ./run.sh ``` In another terminal we can check the average response time by sending requests to the application while running. With the following simple k6 script ``` import http from 'k6/http'; export default function () { http.get('http://localhost:8000'); } ``` We invoke k6 with ``` k6 run -d 30s script.js ``` and look for this line in the output ``` http_req_duration..............: avg=335.68ms min=119.56ms med=418.76ms max=451.49ms p(90) ``` Co-authored-by: Yun Kim <[email protected]> Co-authored-by: Gabriele N. Tornetta <[email protected]> Co-authored-by: Juanjo Alvarez Martinez <[email protected]> Co-authored-by: Gabriele N. Tornetta <[email protected]> Co-authored-by: Yun Kim <[email protected]> Co-authored-by: Brett Langdon <[email protected]>
1 parent de60d7e commit a665ffe

35 files changed

+469
-1017
lines changed

ddtrace/auto.py

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
import ddtrace.bootstrap.sitecustomize # noqa

ddtrace/bootstrap/sitecustomize.py

Lines changed: 76 additions & 102 deletions
Original file line numberDiff line numberDiff line change
@@ -5,109 +5,19 @@
55
import sys
66

77

8-
MODULES_LOADED_AT_STARTUP = frozenset(sys.modules.keys())
9-
MODULES_THAT_TRIGGER_CLEANUP_WHEN_INSTALLED = ("gevent",)
10-
11-
12-
import os # noqa
13-
14-
15-
"""
16-
The following modules cause problems when being unloaded/reloaded in module cloning.
17-
Notably, unloading the atexit module will remove all registered hooks which we use for cleaning up on tracer shutdown.
18-
The other listed modules internally maintain some state that does not coexist well if reloaded.
19-
"""
20-
MODULES_TO_NOT_CLEANUP = {"atexit", "asyncio", "attr", "concurrent", "ddtrace", "logging"}
21-
if sys.version_info < (3, 7):
22-
MODULES_TO_NOT_CLEANUP |= {"typing"} # required by older versions of Python
23-
if sys.version_info <= (2, 7):
24-
MODULES_TO_NOT_CLEANUP |= {"encodings", "codecs"}
25-
import imp
26-
27-
_unloaded_modules = []
28-
29-
def is_installed(module_name):
30-
try:
31-
imp.find_module(module_name)
32-
except ImportError:
33-
return False
34-
return True
35-
36-
37-
else:
38-
import importlib
39-
40-
def is_installed(module_name):
41-
return importlib.util.find_spec(module_name)
42-
43-
44-
def should_cleanup_loaded_modules():
45-
dd_unload_sitecustomize_modules = os.getenv("DD_UNLOAD_MODULES_FROM_SITECUSTOMIZE", default="0").lower()
46-
if dd_unload_sitecustomize_modules not in ("1", "auto"):
47-
return False
48-
elif dd_unload_sitecustomize_modules == "auto" and not any(
49-
is_installed(module_name) for module_name in MODULES_THAT_TRIGGER_CLEANUP_WHEN_INSTALLED
50-
):
51-
return False
52-
return True
53-
54-
55-
def cleanup_loaded_modules(aggressive=False):
56-
"""
57-
"Aggressive" here means "cleanup absolutely every module that has been loaded since startup".
58-
Non-aggressive cleanup entails leaving untouched certain modules
59-
This distinction is necessary because this function is used both to prepare for gevent monkeypatching
60-
(requiring aggressive cleanup) and to implement "module cloning" (requiring non-aggressive cleanup)
61-
"""
62-
# Figuring out modules_loaded_since_startup is necessary because sys.modules has more in it than just what's in
63-
# import statements in this file, and unloading some of them can break the interpreter.
64-
modules_loaded_since_startup = set(_ for _ in sys.modules if _ not in MODULES_LOADED_AT_STARTUP)
65-
# Unload all the modules that we have imported, except for ddtrace and a few
66-
# others that don't like being cloned.
67-
# Doing so will allow ddtrace to continue using its local references to modules unpatched by
68-
# gevent, while avoiding conflicts with user-application code potentially running
69-
# `gevent.monkey.patch_all()` and thus gevent-patched versions of the same modules.
70-
for module_name in modules_loaded_since_startup:
71-
if aggressive:
72-
del sys.modules[module_name]
73-
continue
74-
75-
for module_to_not_cleanup in MODULES_TO_NOT_CLEANUP:
76-
if module_name == module_to_not_cleanup:
77-
break
78-
elif module_name.startswith("%s." % module_to_not_cleanup):
79-
break
80-
else:
81-
del sys.modules[module_name]
82-
# Some versions of CPython import the time module during interpreter startup, which needs to be unloaded.
83-
if "time" in sys.modules:
84-
del sys.modules["time"]
85-
86-
87-
will_run_module_cloning = should_cleanup_loaded_modules()
88-
if not will_run_module_cloning:
89-
# Perform gevent patching as early as possible in the application before
90-
# importing more of the library internals.
91-
if os.environ.get("DD_GEVENT_PATCH_ALL", "false").lower() in ("true", "1"):
92-
# successfully running `gevent.monkey.patch_all()` this late into
93-
# sitecustomize requires aggressive module unloading beforehand.
94-
# gevent's documentation strongly warns against calling monkey.patch_all() anywhere other
95-
# than the first line of the program. since that's what we're doing here,
96-
# we cleanup aggressively beforehand to replicate the conditions at program start
97-
# as closely as possible.
98-
cleanup_loaded_modules(aggressive=True)
99-
import gevent.monkey
100-
101-
gevent.monkey.patch_all()
8+
LOADED_MODULES = frozenset(sys.modules.keys())
1029

10310
import logging # noqa
10411
import os # noqa
10512
from typing import Any # noqa
10613
from typing import Dict # noqa
14+
import warnings # noqa
10715

10816
from ddtrace import config # noqa
10917
from ddtrace.debugging._config import config as debugger_config # noqa
18+
from ddtrace.internal.compat import PY2 # noqa
11019
from ddtrace.internal.logger import get_logger # noqa
20+
from ddtrace.internal.module import find_loader # noqa
11121
from ddtrace.internal.runtime.runtime_metrics import RuntimeWorker # noqa
11222
from ddtrace.internal.utils.formats import asbool # noqa
11323
from ddtrace.internal.utils.formats import parse_tags_str # noqa
@@ -142,6 +52,25 @@ def cleanup_loaded_modules(aggressive=False):
14252

14353
log = get_logger(__name__)
14454

55+
if os.environ.get("DD_GEVENT_PATCH_ALL") is not None:
56+
deprecate(
57+
"The environment variable DD_GEVENT_PATCH_ALL is deprecated and will be removed in a future version. ",
58+
postfix="There is no special configuration necessary to make ddtrace work with gevent if using ddtrace-run. "
59+
"If not using ddtrace-run, import ddtrace.auto before calling gevent.monkey.patch_all().",
60+
removal_version="2.0.0",
61+
)
62+
if "gevent" in sys.modules or "gevent.monkey" in sys.modules:
63+
import gevent.monkey # noqa
64+
65+
if gevent.monkey.is_module_patched("threading"):
66+
warnings.warn(
67+
"Loading ddtrace after gevent.monkey.patch_all() is not supported and is "
68+
"likely to break the application. Use ddtrace-run to fix this, or "
69+
"import ddtrace.auto before calling gevent.monkey.patch_all().",
70+
RuntimeWarning,
71+
)
72+
73+
14574
EXTRA_PATCHED_MODULES = {
14675
"bottle": True,
14776
"django": True,
@@ -162,6 +91,52 @@ def update_patched_modules():
16291
EXTRA_PATCHED_MODULES[module] = asbool(should_patch)
16392

16493

94+
if PY2:
95+
_unloaded_modules = []
96+
97+
98+
def is_module_installed(module_name):
99+
return find_loader(module_name) is not None
100+
101+
102+
def cleanup_loaded_modules():
103+
MODULES_REQUIRING_CLEANUP = ("gevent",)
104+
do_cleanup = os.getenv("DD_UNLOAD_MODULES_FROM_SITECUSTOMIZE", default="auto").lower()
105+
if do_cleanup == "auto":
106+
do_cleanup = any(is_module_installed(m) for m in MODULES_REQUIRING_CLEANUP)
107+
108+
if not asbool(do_cleanup):
109+
return
110+
111+
# Unload all the modules that we have imported, except for the ddtrace one.
112+
# NB: this means that every `import threading` anywhere in `ddtrace/` code
113+
# uses a copy of that module that is distinct from the copy that user code
114+
# gets when it does `import threading`. The same applies to every module
115+
# not in `KEEP_MODULES`.
116+
KEEP_MODULES = frozenset(["atexit", "ddtrace", "asyncio", "concurrent", "typing", "logging", "attr"])
117+
for m in list(_ for _ in sys.modules if _ not in LOADED_MODULES):
118+
if any(m == _ or m.startswith(_ + ".") for _ in KEEP_MODULES):
119+
continue
120+
121+
if PY2:
122+
KEEP_MODULES_PY2 = frozenset(["encodings", "codecs"])
123+
if any(m == _ or m.startswith(_ + ".") for _ in KEEP_MODULES_PY2):
124+
continue
125+
# Store a reference to deleted modules to avoid them being garbage
126+
# collected
127+
_unloaded_modules.append(sys.modules[m])
128+
129+
del sys.modules[m]
130+
131+
# TODO: The better strategy is to identify the core modues in LOADED_MODULES
132+
# that should not be unloaded, and then unload as much as possible.
133+
UNLOAD_MODULES = frozenset(["time"])
134+
for u in UNLOAD_MODULES:
135+
for m in list(sys.modules):
136+
if m == u or m.startswith(u + "."):
137+
del sys.modules[m]
138+
139+
165140
try:
166141
from ddtrace import tracer
167142

@@ -202,19 +177,18 @@ def update_patched_modules():
202177
if not opts:
203178
tracer.configure(**opts)
204179

205-
# We need to clean up after we have imported everything we need from
206-
# ddtrace, but before we register the patch-on-import hooks for the
207-
# integrations. This is because registering a hook for a module
208-
# that is already imported causes the module to be patched immediately.
209-
# So if we unload the module after registering hooks, we effectively
210-
# remove the patching, thus breaking the tracer integration.
211-
if will_run_module_cloning:
212-
cleanup_loaded_modules()
213180
if trace_enabled:
214181
update_patched_modules()
215182
from ddtrace import patch_all
216183

184+
# We need to clean up after we have imported everything we need from
185+
# ddtrace, but before we register the patch-on-import hooks for the
186+
# integrations.
187+
cleanup_loaded_modules()
188+
217189
patch_all(**EXTRA_PATCHED_MODULES)
190+
else:
191+
cleanup_loaded_modules()
218192

219193
# Only the import of the original sitecustomize.py is allowed after this
220194
# point.

ddtrace/contrib/gevent/__init__.py

Lines changed: 3 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -5,11 +5,9 @@
55
The integration patches the gevent internals to add context management logic.
66
77
.. note::
8-
If :ref:`ddtrace-run<ddtracerun>` is being used set ``DD_GEVENT_PATCH_ALL=true`` and
9-
``gevent.monkey.patch_all()`` will be called as early as possible in the application
10-
to avoid patching conflicts.
11-
If ``ddtrace-run`` is not being used then be sure to call ``gevent.monkey.patch_all``
12-
before importing ``ddtrace`` and calling ``ddtrace.patch`` or ``ddtrace.patch_all``.
8+
If ``ddtrace-run`` is not being used then be sure to ``import ddtrace.auto``
9+
before calling ``gevent.monkey.patch_all``.
10+
If ``ddtrace-run`` is being used then no additional configuration is required.
1311
1412
1513
The integration also configures the global tracer instance to use a gevent

ddtrace/contrib/gunicorn/__init__.py

Lines changed: 5 additions & 28 deletions
Original file line numberDiff line numberDiff line change
@@ -1,33 +1,10 @@
11
"""
2-
**Note:** ``ddtrace-run`` and Python 2 are both not supported with `Gunicorn <https://gunicorn.org>`__.
2+
ddtrace works with Gunicorn.
33
4-
``ddtrace`` only supports Gunicorn's ``gevent`` worker type when configured as follows:
5-
6-
- The application is running under a Python version >=3.6 and <=3.10
7-
- `ddtrace-run` is not used
8-
- The `DD_GEVENT_PATCH_ALL=1` environment variable is set
9-
- Gunicorn's ```post_fork`` <https://docs.gunicorn.org/en/stable/settings.html#post-fork>`__ hook does not import from
10-
``ddtrace``
11-
- ``import ddtrace.bootstrap.sitecustomize`` is called either in the application's main process or in the
12-
```post_worker_init`` <https://docs.gunicorn.org/en/stable/settings.html#post-worker-init>`__ hook.
13-
14-
.. code-block:: python
15-
16-
# gunicorn.conf.py
17-
def post_fork(server, worker):
18-
# don't touch ddtrace here
19-
pass
20-
21-
def post_worker_init(worker):
22-
import ddtrace.bootstrap.sitecustomize
23-
24-
workers = 4
25-
worker_class = "gevent"
26-
bind = "8080"
27-
28-
.. code-block:: bash
29-
30-
DD_GEVENT_PATCH_ALL=1 gunicorn --config gunicorn.conf.py path.to.my:app
4+
.. note::
5+
If you cannot wrap your Gunicorn server with the ``ddtrace-run``command and
6+
it uses ``gevent`` workers, be sure to ``import ddtrace.auto`` as early as
7+
possible in your application's lifecycle.
318
"""
329

3310

ddtrace/internal/forksafe.py

Lines changed: 0 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -7,8 +7,6 @@
77
import typing
88
import weakref
99

10-
from ddtrace.internal.module import ModuleWatchdog
11-
from ddtrace.internal.utils.formats import asbool
1210
from ddtrace.vendor import wrapt
1311

1412

@@ -24,26 +22,6 @@
2422
_soft = True
2523

2624

27-
def patch_gevent_hub_reinit(module):
28-
# The gevent hub is re-initialized *after* the after-in-child fork hooks are
29-
# called, so we patch the gevent.hub.reinit function to ensure that the
30-
# fork hooks run again after this further re-initialisation, if it is ever
31-
# called.
32-
from ddtrace.internal.wrapping import wrap
33-
34-
def wrapped_reinit(f, args, kwargs):
35-
try:
36-
return f(*args, **kwargs)
37-
finally:
38-
ddtrace_after_in_child()
39-
40-
wrap(module.reinit, wrapped_reinit)
41-
42-
43-
if asbool(os.getenv("_DD_TRACE_GEVENT_HUB_PATCHED", default=False)):
44-
ModuleWatchdog.register_module_hook("gevent.hub", patch_gevent_hub_reinit)
45-
46-
4725
def ddtrace_after_in_child():
4826
# type: () -> None
4927
global _registry

0 commit comments

Comments
 (0)