Commit 25e6bc1

fix(lib-injection): include package in image [backport #5410 to 1.11] (#5467)
Formerly, the Kubernetes lib injection mechanism worked by including a sitecustomize.py file on the path which would execute a `pip install ddtrace==...`. This approach has a few downsides:

- Internet access required
- Application overhead at startup
- python/pip on the path doesn't have permission to install

Instead, the ddtrace package and its dependencies can be baked into the library injection image. This is achieved with the script `dl_wheels.py` which downloads wheels matching ddtrace version, Python versions, platforms and architectures. It downloads the wheels and merges them into one big directory. This works as Python is able to pick out the right binary at run-time to load, so as long as the compatible binary is present, Python will use it correctly.

The sitecustomize.py file now updates the PYTHONPATH to include the directory containing the ddtrace package and its dependencies. This works as Python is able to identify which binaries to use at runtime so all of the binaries can be included in the injection image.

It was decided to only support Python 3.7 and above (officially supported Python versions) to keep the image size small. If support for additional Pythons is required then we can evaluate adding them in to the image.

Fixes #5291.

## Risk

The risk of this change is minimal compared to what existed previously. If the installation failed previously then it would likely fail here, similarly with no application impact.

There is a risk of the increased image size. The image size is now up to >100MB (from ~3MB).

## Testing

The existing tests are sufficient to test the new mechanism.

## Checklist

- [x] Change(s) are motivated and described in the PR description.
- [x] Testing strategy is described if automated tests are not included in the PR.
- [x] Risk is outlined (performance impact, potential for breakage, maintainability, etc).
- [x] Change is maintainable (easy to change, telemetry, documentation).
- [x] [Library release note guidelines](https://ddtrace.readthedocs.io/en/stable/contributing.html#Release-Note-Guidelines) are followed.
- [x] Documentation is included (in-code, generated user docs, [public corp docs](https://github.com/DataDog/documentation/)).
- [x] PR description includes explicit acknowledgement/acceptance of the performance implications of this PR as reported in the benchmarks PR comment.

## Reviewer Checklist

- [x] Title is accurate.
- [x] No unnecessary changes are introduced.
- [x] Description motivates each change.
- [x] Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes unless absolutely necessary.
- [x] Testing strategy adequately addresses listed risk(s).
- [x] Change is maintainable (easy to change, telemetry, documentation).
- [x] Release note makes sense to a user of the library.
- [x] Reviewer has explicitly acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment.
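
The description's claim that Python can pick the right binary out of a merged directory can be illustrated with a simplified model of the extension-module lookup. This is only a sketch, not the import system's actual implementation: `pick_extension` and the `_example` module name are hypothetical, while `importlib.machinery.EXTENSION_SUFFIXES` is the standard-library list of suffixes the running interpreter accepts.

```python
import importlib.machinery


def pick_extension(module_name, present_files):
    """Return the file this interpreter would accept for module_name, or None.

    Simplified model: the interpreter only considers files named
    module_name + <one of its own extension suffixes>, tried in order.
    """
    for suffix in importlib.machinery.EXTENSION_SUFFIXES:
        candidate = module_name + suffix
        if candidate in present_files:
            return candidate
    return None


# Hypothetical contents of a merged directory: one build tagged for the running
# interpreter and one tagged for some other interpreter.
native = "_example" + importlib.machinery.EXTENSION_SUFFIXES[0]
foreign = "_example.cpython-00-other-arch.so"

print(pick_extension("_example", {native, foreign}))
print(pick_extension("_example", {foreign}))
```

Because each interpreter only looks for file names carrying its own tags, builds for other Python versions and architectures can sit in the same directory without being loaded.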
1 parent 506081f commit 25e6bc1

12 files changed

+280
-84
lines changed

.github/workflows/build_deploy.yml

Lines changed: 7 additions & 10 deletions
@@ -214,13 +214,10 @@ jobs:
 
   build-and-publish-init-image:
     needs: [upload_pypi]
-    # We have to wait for the PyPI job since the image that we publish depends on installing
-    # the package from PyPI.
-    uses: ./.github/workflows/build-and-publish-image.yml
-    with:
-      tags: ghcr.io/datadog/dd-trace-py/dd-lib-python-init:${{ github.ref_name }}
-      platforms: 'linux/amd64,linux/arm64/v8'
-      build-args: "DDTRACE_PYTHON_VERSION=${{ github.ref_name }}"
-      context: ./lib-injection
-    secrets:
-      token: ${{ secrets.GITHUB_TOKEN }}
+    steps:
+      - uses: ./.github/workflows/lib-inject-publish.yml
+        with:
+          ddtrace-version: ${{ github.ref_name }}
+          image-tag: ${{ github.ref_name }}
+        secrets:
+          token: ${{ secrets.GITHUB_TOKEN }}
Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+name: Build and publish library injection images
+
+on:
+  workflow_call:
+    inputs:
+      ddtrace-version:
+        required: true
+        type: string
+      image-tag:
+        required: true
+        type: string
+    secrets:
+      token:
+        required: true
+
+jobs:
+  wait_for_package:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/setup-python@v4
+        with:
+          python-version: '3.11'
+      - name: Wait for package to be available from PyPI
+        run: |
+          until pip install ddtrace==${{ inputs.ddtrace-version }}
+          do
+            sleep 20
+          done
+  build_push:
+    needs: [wait_for_package]
+    uses: ./.github/workflows/build-and-publish-image.yml
+    with:
+      tags: ghcr.io/datadog/dd-trace-py/dd-lib-python-init:${{ inputs.image-tag }}
+      build-args: 'DDTRACE_PYTHON_VERSION=${{ inputs.ddtrace-version }}'
+      platforms: 'linux/amd64,linux/arm64/v8'
+      context: ./lib-injection
+    secrets:
+      token: ${{ secrets.token }}
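
The `wait_for_package` job above retries `pip install` every 20 seconds with no upper bound. As a rough illustration of the same polling pattern with a deadline added, here is a minimal Python sketch. The `wait_until` helper, its parameters and the 600-second timeout are assumptions made for this example and are not part of the commit.

```python
import time


def wait_until(check, interval=20.0, timeout=600.0, sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns True or `timeout` seconds have elapsed.

    Returns True if check() succeeded, False if the deadline passed first.
    `sleep` and `clock` are injectable so the loop can be exercised without
    actually waiting.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Example: a check that succeeds on the third attempt, with a fake sleep that
# records the requested delays instead of blocking.
attempts = []
delays = []


def package_available():
    attempts.append(1)
    return len(attempts) >= 3


result = wait_until(package_available, interval=20.0, sleep=delays.append)
print(result, delays)
```

A bounded loop makes a mistyped version fail the job rather than hang it, at the cost of having to choose a timeout.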

.github/workflows/lib-injection.yml

Lines changed: 23 additions & 10 deletions
@@ -5,14 +5,12 @@ on:
 
 jobs:
   build-and-publish-test-image:
-    uses: ./.github/workflows/build-and-publish-image.yml
-    with:
-      tags: 'ghcr.io/datadog/dd-trace-py/dd-lib-python-init:${{ github.sha }}'
-      platforms: 'linux/amd64,linux/arm64/v8'
-      build-args: 'DDTRACE_PYTHON_VERSION=git+https://github.com/Datadog/dd-trace-py@${{ github.sha }}'
-      context: ./lib-injection
+    uses: ./.github/workflows/lib-inject-publish.yml
     secrets:
       token: ${{ secrets.GITHUB_TOKEN }}
+    with:
+      ddtrace-version: v1.10
+      image-tag: ${{ github.sha }}
 
   test:
     needs:
@@ -67,8 +65,23 @@ jobs:
             --network=test-inject \
             -p 8126:8126 \
             ghcr.io/datadog/dd-apm-test-agent/ddapm-test-agent:v1.7.2
-      - name: Apply a fixed, stable version of the library
-        run: sed -i "s~<DD_TRACE_VERSION_TO_BE_REPLACED>~git+https://github.com/Datadog/dd-trace-py@${{ github.sha }}~g" lib-injection/sitecustomize.py
+      - name: Prepare the volume
+        run: |
+          cd lib-injection
+          mkdir -p lib-injection/ddtrace_pkgs
+          cp sitecustomize.py lib-injection/
+          ./dl_megawheel.py \
+              --ddtrace-version=v1.10 \
+              --python-version=3.11 \
+              --python-version=3.10 \
+              --python-version=3.9 \
+              --python-version=3.8 \
+              --python-version=3.7 \
+              --ddtrace-version=v1.10 \
+              --arch x86_64 \
+              --platform manylinux2014 \
+              --output-dir ddtrace_pkgs \
+              --verbose
       - name: Build test app
         run: |
           docker build \
@@ -83,8 +96,8 @@ jobs:
             -e PYTHONPATH=/lib-injection \
             -v $PWD/lib-injection:/lib-injection \
             ${{matrix.variant}}
-          # Package has to be built from source, wait a while
-          sleep 85
+          # Wait for the app to start
+          sleep 5
           docker logs ${{matrix.variant}}
       - name: Test the app
         run: |

lib-injection/Dockerfile

Lines changed: 25 additions & 5 deletions
@@ -1,15 +1,35 @@
 # This image provides the files needed to install the ddtrace Python package
 # and auto instrument Python applications in containerized environments.
-FROM busybox
+FROM python:3.10
+WORKDIR /build
+ARG DDTRACE_PYTHON_VERSION
+RUN python3 -m pip install -U pip==23.0.1
+RUN python3 -m pip install packaging==23.0
+RUN mkdir -p pkgs
+ADD ./dl_megawheel.py .
+# Note that we only get Python >= 3.7. This is to keep the size of the image
+# as small as possible.
+RUN python3 dl_megawheel.py \
+        --python-version=3.11 \
+        --python-version=3.10 \
+        --python-version=3.9 \
+        --python-version=3.8 \
+        --python-version=3.7 \
+        --ddtrace-version=${DDTRACE_PYTHON_VERSION} \
+        --arch x86_64 \
+        --arch aarch64 \
+        --platform musllinux_1_1 \
+        --platform manylinux2014 \
+        --output-dir /build/pkgs \
+        --verbose
 
+FROM busybox
+COPY --from=0 /build/pkgs /datadog-init/ddtrace_pkgs
 ARG UID=10000
-ARG DDTRACE_PYTHON_VERSION
-ENV DDTRACE_PYTHON_VERSION=$DDTRACE_PYTHON_VERSION
 RUN addgroup -g 10000 -S datadog && \
     adduser -u ${UID} -S datadog -G datadog
+RUN chown -R datadog:datadog /datadog-init/ddtrace_pkgs
 USER ${UID}
 WORKDIR /datadog-init
 ADD sitecustomize.py /datadog-init/sitecustomize.py
-# Use ~ as a delimiter because git urls can contain slashes
-RUN sed -i "s~<DD_TRACE_VERSION_TO_BE_REPLACED>~${DDTRACE_PYTHON_VERSION}~g" /datadog-init/sitecustomize.py
 ADD copy-lib.sh /datadog-init/copy-lib.sh

lib-injection/README.md

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+# Kubernetes library injection
+
+This directory contains scripts and a Docker image for providing the ddtrace
+package via a Kubernetes [init
+container](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/)
+which allows users to easily instrument Python applications without requiring
+changes to the application image.
+
+The `Dockerfile` defines the image that is published for `ddtrace` which is used
+as a Kubernetes InitContainer. Kubernetes runs it before deployment pods start.
+It is responsible for providing the files necessary to run `ddtrace` in an
+arbitrary downstream application container. It also provides a script to copy
+the necessary files to a given directory.
+
+The `dl_megawheel.py` script provides a portable `ddtrace` package. It is
+responsible for downloading and merging the published wheels of `ddtrace` and
+its dependencies.
+
+The Datadog Admission Controller injects the InitContainer with a new volume
+mount to the application deployment. The script to copy files out of the
+InitContainer is run to copy the files to the volume. The `PYTHONPATH`
+environment variable is injected into the application container along with the
+volume mount.
+
+The files copied to the volume are:
+
+- `sitecustomize.py`: Python module that gets run automatically on interpreter startup when it is detected in the `PYTHONPATH`. When executed, it updates the Python path further to include the `ddtrace_pkgs/` directory and then calls `import ddtrace.bootstrap.sitecustomize` which performs automatic instrumentation.
+- `ddtrace_pkgs/`: Directory containing the `ddtrace` package and its dependencies for each Python version, platform and architecture.
+
+
+The `PYTHONPATH` environment variable is set to the shared volume directory
+which contains `sitecustomize.py` and `ddtrace_pkgs`. The environment variable
+is injected into the application container. This enables the
+`sitecustomize.py` file to execute on any Python interpreter startup which
+results in the automatic instrumentation being applied to the application.
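
The README's statement that a `sitecustomize.py` found via `PYTHONPATH` runs on interpreter startup can be checked with a small, self-contained experiment. This is a sketch only: the temporary directory stands in for the shared volume, the stub file stands in for the real `sitecustomize.py`, and `run_with_injected_dir` is a hypothetical helper.

```python
import os
import subprocess
import sys
import tempfile


def run_with_injected_dir(injected_dir):
    """Start a fresh interpreter with injected_dir as PYTHONPATH; return its stdout."""
    env = os.environ.copy()
    env["PYTHONPATH"] = injected_dir
    result = subprocess.run(
        [sys.executable, "-c", "print('app started')"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


with tempfile.TemporaryDirectory() as shared_volume:
    # Stand-in for the sitecustomize.py that copy-lib.sh places in the volume.
    with open(os.path.join(shared_volume, "sitecustomize.py"), "w") as f:
        f.write("print('sitecustomize ran')\n")
    output = run_with_injected_dir(shared_volume)

print(output)
```

The application command itself is unchanged. Only the environment variable differs, which is why the mechanism needs no changes to the application image.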

lib-injection/copy-lib.sh

Lines changed: 1 addition & 0 deletions
@@ -3,3 +3,4 @@
 # This script is used by the admission controller to install the library from the
 # init container into the application container.
 cp sitecustomize.py "$1/sitecustomize.py"
+cp -r ddtrace_pkgs "$1/ddtrace_pkgs"

lib-injection/dl_megawheel.py

Lines changed: 116 additions & 0 deletions
@@ -0,0 +1,116 @@
+#!/usr/bin/env python3
+"""
+Script to download all required wheels (including dependencies) of the ddtrace
+Python package for relevant Python versions (+ abis), C library platforms and
+architectures and merge them into a "megawheel" directory.
+
+This directory provides a portable installation of ddtrace which can be used
+on multiple platforms and architectures.
+
+Currently the only OS supported is Linux.
+
+This script has been tested with pip 21.0.0 and is confirmed to not work with
+20.0.2.
+
+Usage:
+        ./dl_megawheel.py --help
+
+
+The downloaded wheels can then be installed locally using:
+        pip install --no-index --find-links <dir_of_downloaded_wheels> ddtrace
+"""
+import argparse
+import itertools
+import os
+import subprocess
+import sys
+
+import packaging.version
+
+
+# Do a check on the pip version since older versions are known to be
+# incompatible.
+MIN_PIP_VERSION = packaging.version.parse("21.0")
+cmd = [sys.executable, "-m", "pip", "--version"]
+res = subprocess.run(cmd, capture_output=True)
+out = res.stdout.decode().split(" ")[1]
+pip_version = packaging.version.parse(out)
+if pip_version < MIN_PIP_VERSION:
+    print(
+        "WARNING: using known incompatible version, %r, of pip. The minimum compatible pip version is %r"
+        % (pip_version, MIN_PIP_VERSION)
+    )
+
+
+supported_pythons = ["2.7", "3.6", "3.7", "3.8", "3.9", "3.10", "3.11"]
+supported_arches = ["aarch64", "x86_64", "i686"]
+supported_platforms = ["musllinux_1_1", "manylinux2014"]
+
+parser = argparse.ArgumentParser(description=__doc__)
+parser.add_argument(
+    "--python-version",
+    choices=supported_pythons,
+    action="append",
+    required=True,
+)
+parser.add_argument(
+    "--arch",
+    choices=supported_arches,
+    action="append",
+    required=True,
+)
+parser.add_argument(
+    "--platform",
+    choices=supported_platforms,
+    action="append",
+    required=True,
+)
+parser.add_argument("--ddtrace-version", type=str, required=True)
+parser.add_argument("--output-dir", type=str, required=True)
+parser.add_argument("--dry-run", action="store_true")
+parser.add_argument("--verbose", action="store_true")
+args = parser.parse_args()
+
+dl_dir = args.output_dir
+print("saving wheels to %s" % dl_dir)
+
+for python_version, arch, platform in itertools.product(args.python_version, args.arch, args.platform):
+    print("Downloading %s %s %s wheel" % (python_version, arch, platform))
+    abi = "cp%s" % python_version.replace(".", "")
+    # Have to special-case these versions of Python for some reason.
+    if python_version in ["2.7", "3.5", "3.6", "3.7"]:
+        abi += "m"
+
+    # See the docs for an explanation of all the options used:
+    # https://pip.pypa.io/en/stable/cli/pip_download/
+    #   only-binary=:all: is specified to ensure we get all the dependencies of ddtrace as well.
+    cmd = [
+        sys.executable,
+        "-m",
+        "pip",
+        "download",
+        "ddtrace==%s" % args.ddtrace_version,
+        "--platform",
+        "%s_%s" % (platform, arch),
+        "--python-version",
+        python_version,
+        "--abi",
+        abi,
+        "--only-binary=:all:",
+        "--dest",
+        dl_dir,
+    ]
+    if args.verbose:
+        print(" ".join(cmd))
+
+    if not args.dry_run:
+        subprocess.run(cmd, capture_output=not args.verbose, check=True)
+
+# Unzip all the wheels into the output directory
+wheel_files = [f for f in os.listdir(dl_dir) if f.endswith(".whl")]
+for whl in wheel_files:
+    wheel_file = os.path.join(dl_dir, whl)
+    # -q for quieter output, else we get all of the files being unzipped.
+    subprocess.run(["unzip", "-q", "-o", wheel_file, "-d", dl_dir])
+    # Remove the wheel as it has been unpacked
+    os.remove(wheel_file)
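
To make the download matrix in the script above easier to follow, the sketch below isolates the tag computation and enumerates the targets that the Dockerfile's arguments would produce. `abi_tag` and `download_targets` are hypothetical helper names introduced for this example. The comment about the `m` suffix reflects the general CPython convention that ABI tags carried a pymalloc flag up to 3.7. The script itself gives no reason for the special case.

```python
import itertools


def abi_tag(python_version):
    """Return the CPython ABI tag the script passes to `pip download --abi`."""
    tag = "cp" + python_version.replace(".", "")
    # Same special case as the script: these versions get an "m" suffix
    # (assumed here to be the pymalloc ABI flag dropped in Python 3.8).
    if python_version in ("2.7", "3.5", "3.6", "3.7"):
        tag += "m"
    return tag


def download_targets(python_versions, arches, platforms):
    """List (python_version, abi, platform_tag) for every requested combination."""
    return [
        (py, abi_tag(py), "%s_%s" % (platform, arch))
        for py, arch, platform in itertools.product(python_versions, arches, platforms)
    ]


# The matrix requested by the Dockerfile in this commit.
targets = download_targets(
    ["3.11", "3.10", "3.9", "3.8", "3.7"],
    ["x86_64", "aarch64"],
    ["musllinux_1_1", "manylinux2014"],
)
for target in targets:
    print(target)
```

Each tuple corresponds to one `pip download` invocation, so the image build performs 5 x 2 x 2 = 20 downloads before the wheels are unpacked into a single directory.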

lib-injection/sitecustomize.py

Lines changed: 19 additions & 47 deletions
@@ -1,57 +1,29 @@
 """
-This module when included on the PYTHONPATH will install the ddtrace package from pypi
-for the Python runtime being used.
+When included on the PYTHONPATH this module will initialize the PYTHONPATH to
+include the directory containing the ddtrace package and its dependencies. It
+then imports the ddtrace.bootstrap.sitecustomize module to automatically
+instrument the application.
 """
 import os
 import sys
 
 
-# This special string is to be replaced at container build time so that the
-# version is fixed in the source.
-version = "<DD_TRACE_VERSION_TO_BE_REPLACED>"
-
-
-def _configure_ddtrace():
-    # This import has the same effect as ddtrace-run for the current process.
-    import ddtrace.bootstrap.sitecustomize
-
-    bootstrap_dir = os.path.abspath(os.path.dirname(ddtrace.bootstrap.sitecustomize.__file__))
+def _add_to_pythonpath(path):
+    # type: (str) -> None
+    """Adds a path to the start of PYTHONPATH."""
     prev_python_path = os.getenv("PYTHONPATH", "")
-    os.environ["PYTHONPATH"] = "%s%s%s" % (bootstrap_dir, os.path.pathsep, prev_python_path)
-
-    # Also insert the bootstrap dir in the path of the current python process.
-    sys.path.insert(0, bootstrap_dir)
-    print("datadog autoinstrumentation: successfully configured python package")
-
-
-# Avoid infinite loop when attempting to install ddtrace. This flag is set when
-# the subprocess is launched to perform the installation.
-if "DDTRACE_PYTHON_INSTALL_IN_PROGRESS" not in os.environ:
-    try:
-        import ddtrace  # noqa: F401
-
-    except ImportError:
-        import subprocess
-
-        print("datadog autoinstrumentation: installing python package")
+    os.environ["PYTHONPATH"] = "%s%s%s" % (path, os.pathsep, prev_python_path)
+    sys.path.insert(0, path)
 
-        # Set the flag to avoid an infinite loop.
-        env = os.environ.copy()
-        env["DDTRACE_PYTHON_INSTALL_IN_PROGRESS"] = "true"
 
-        if "git" in version:
-            ddtrace_version = version
-        else:
-            ddtrace_version = "ddtrace==%s" % version
+pkgs_path = os.path.join(os.path.dirname(__file__), "ddtrace_pkgs")
+bootstrap_dir = os.path.join(pkgs_path, "ddtrace", "bootstrap")
+_add_to_pythonpath(pkgs_path)
+_add_to_pythonpath(bootstrap_dir)
 
-        # Execute the installation with the current interpreter
-        try:
-            subprocess.run([sys.executable, "-m", "pip", "install", ddtrace_version], env=env, check=True)
-        except Exception:
-            print("datadog autoinstrumentation: failed to install python package version %r" % ddtrace_version)
-        else:
-            print("datadog autoinstrumentation: successfully installed python package version %r" % ddtrace_version)
-            _configure_ddtrace()
-    else:
-        print("datadog autoinstrumentation: ddtrace already installed, skipping install")
-        _configure_ddtrace()
+try:
+    import ddtrace.bootstrap.sitecustomize  # noqa: F401
+except BaseException as e:
+    print("datadog autoinstrumentation: ddtrace failed to install:\n %s" % str(e))
+else:
+    print("datadog autoinstrumentation: ddtrace successfully installed")
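
The effect of the two `_add_to_pythonpath` calls on search order can be seen in isolation with the sketch below. It operates on a plain dict and list instead of `os.environ` and `sys.path`, so it can run without touching the real process. `add_to_pythonpath` is a stand-in for the private helper, and the `/datadog-lib` mount path is an assumption for illustration.

```python
import os


def add_to_pythonpath(path, environ, search_path):
    """Prepend path to environ['PYTHONPATH'] and to search_path, like the helper above."""
    previous = environ.get("PYTHONPATH", "")
    environ["PYTHONPATH"] = "%s%s%s" % (path, os.pathsep, previous)
    search_path.insert(0, path)


# Hypothetical starting state: the admission controller has set PYTHONPATH to
# the shared volume, so it is already on the interpreter's search path.
environ = {"PYTHONPATH": "/datadog-lib"}
search_path = ["/datadog-lib", "/usr/lib/python3"]

pkgs_path = os.path.join("/datadog-lib", "ddtrace_pkgs")
bootstrap_dir = os.path.join(pkgs_path, "ddtrace", "bootstrap")
add_to_pythonpath(pkgs_path, environ, search_path)
add_to_pythonpath(bootstrap_dir, environ, search_path)

print(search_path)
print(environ["PYTHONPATH"])
```

Since each call prepends, the directory added last ends up first in the search order. Updating the environment variable as well as the in-process path means child processes started by the application inherit the same entries.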
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+---
+fixes:
+  - |
+    lib-injection: The ddtrace package is now provided via the Docker image
+    rather than relying on a run-time ``pip install``. This solves issues like
+    containers blocking network requests, installation overhead during
+    application startup, and permission issues with the install.
