Commit 55c270e

Author: root
fix: certbot deploy hook uses Podman REST API via Python; fix EL9 cgroup/SELinux settings
Root cause: certbot container (Alpine-based) does not include the `podman` binary, so the v1.0.0 deploy hook failed silently when trying to run `podman restart`. Separately, the compose files used `cgroupns: private` which crashes the testpoint container's internal systemd on EL9 hosts. Diagnosed and fixed on psum01.aglt2.org (Feb 2026).

certbot-deploy-hook.sh (v1.0.0 → v2.0.0):
- Replace `podman restart` CLI call with Python HTTP client talking to the Podman REST API over the mounted Unix socket (/run/podman/podman.sock). python3 is available in the Alpine certbot image; no additional packages required.
- Removes dependency on the `podman` binary inside the container.
- Add SHA256 checksum file (certbot-deploy-hook.sh.sha256).

docker-compose.testpoint-le{,-auto}.yml:
- testpoint: replace `privileged: true` + `cgroupns: private` with `cgroup: host` + `/sys/fs/cgroup` volume mount + `tty: true`. The old settings prevented systemd from running inside the container on EL9. Remove `CAP_SYS_ADMIN` and `CAP_SYS_PTRACE` (not needed). Increase healthcheck start_period to 60s (allows systemd more time).
- certbot: add `security_opt: label=disable` so SELinux does not block the container from accessing the host Podman socket.

install-perfsonar-testpoint.md:
- Correct deploy hook troubleshooting path from /opt/certbot/deploy-hook.sh to the correct /etc/letsencrypt/renewal-hooks/deploy/certbot-deploy-hook.sh.
- Expand deploy hook description: note Python REST API usage and the SELinux security_opt requirement for EL9 hosts.
- Add actionable verification commands to the troubleshooting section.
1 parent a377595 commit 55c270e

5 files changed: +70 -31 lines changed
Lines changed: 42 additions & 15 deletions
@@ -1,27 +1,54 @@
 #!/bin/sh
 # certbot-deploy-hook.sh
 # This script is executed by Certbot's --deploy-hook after a successful renewal.
-# It gracefully restarts the perfsonar-testpoint container to load the new certificate.
+# It restarts the perfsonar-testpoint container to load the new certificate by
+# calling the Podman REST API over the mounted Unix socket.
 #
-# It requires the host's Podman socket to be mounted into the certbot container.
+# Requirements (inside the certbot container):
+# - /run/podman/podman.sock mounted from the host (read-only)
+# - python3 available (present in docker.io/certbot/certbot:latest on Alpine)
 #
-# Version: 1.0.0
+# The compose file must mount the socket and disable SELinux labeling:
+# volumes:
+# - /run/podman/podman.sock:/run/podman/podman.sock:ro
+# security_opt:
+# - label=disable
+#
+# Version: 2.0.0
 
 set -eu
 
-# The name of the container to restart
+SOCKET="/run/podman/podman.sock"
 TARGET_CONTAINER="perfsonar-testpoint"
+STOP_TIMEOUT=30
+
+echo "[INFO] Certbot deploy hook triggered for domains: ${RENEWED_DOMAINS:-unknown}"
+echo "[INFO] Restarting container '${TARGET_CONTAINER}' via Podman socket..."
 
-echo "[INFO] Certbot deploy hook triggered for domains: $RENEWED_DOMAINS"
-echo "[INFO] Attempting to gracefully restart container: $TARGET_CONTAINER"
+python3 - <<PYEOF
+import http.client, socket, sys
 
-# Use the mounted Podman socket to restart the container on the host
-# The --time=30 gives the container 30 seconds to shut down gracefully
-podman restart --time=30 "$TARGET_CONTAINER"
+class _UnixConn(http.client.HTTPConnection):
+    def __init__(self, path):
+        super().__init__("localhost")
+        self._path = path
+    def connect(self):
+        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
+        self.sock.connect(self._path)
 
-if [ $? -eq 0 ]; then
-    echo "[SUCCESS] Container '$TARGET_CONTAINER' restarted successfully."
-else
-    echo "[ERROR] Failed to restart container '$TARGET_CONTAINER'." >&2
-    exit 1
-fi
+conn = _UnixConn("${SOCKET}")
+try:
+    conn.request("POST",
+                 "/v4.0.0/containers/${TARGET_CONTAINER}/restart?t=${STOP_TIMEOUT}")
+    resp = conn.getresponse()
+    body = resp.read().decode()
+    if resp.status == 204:
+        print("[SUCCESS] Container '${TARGET_CONTAINER}' restarted successfully.")
+        sys.exit(0)
+    else:
+        print(f"[ERROR] Podman API returned {resp.status}: {body}", file=sys.stderr)
+        sys.exit(1)
+except Exception as e:
+    print(f"[ERROR] Failed to contact Podman socket at ${SOCKET}: {e}", file=sys.stderr)
+    sys.exit(1)
+PYEOF
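The heredoc above interpolates shell variables into Python source, which makes it awkward to exercise on its own. As a minimal sketch (not part of this commit), the same Unix-socket technique can be written as a standalone function that returns the HTTP status instead of exiting. The names `UnixHTTPConnection` and `restart_container` are hypothetical; the endpoint path is the one the hook itself uses, and the HTTP timeout margin is an assumption.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that connects to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=10):
        # "localhost" only fills the Host header; no TCP connection is made.
        super().__init__("localhost", timeout=timeout)
        self._socket_path = socket_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.settimeout(self.timeout)
        self.sock.connect(self._socket_path)


def restart_container(socket_path, name, stop_timeout=30):
    """POST a restart request and return (HTTP status, response body)."""
    # Assumption: allow the HTTP call to outlive the container stop timeout.
    conn = UnixHTTPConnection(socket_path, timeout=stop_timeout + 30)
    try:
        conn.request(
            "POST", f"/v4.0.0/containers/{name}/restart?t={stop_timeout}"
        )
        resp = conn.getresponse()
        return resp.status, resp.read().decode()
    finally:
        conn.close()
```

Because the socket path is a parameter, the function can be pointed at a throwaway Unix socket with a stub listener for testing. A status of 204 is what the hook treats as success.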
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+6074f4dca208bcc556afa3155e8c295a238e59524a26c2dfe960fd85163c70bf
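The checksum file added here holds only the hex digest. A minimal sketch of how an installer might check a downloaded hook against it is shown below. It assumes the first whitespace-separated token of the checksum file is the digest, so it also accepts the `digest  filename` layout written by `sha256sum`. The function names are hypothetical and not part of this repository.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum_file(script_path, checksum_path):
    """Compare a script's digest with the digest stored in a checksum file."""
    # Assumption: the digest is the first token in the checksum file.
    expected = Path(checksum_path).read_text().split()[0].lower()
    return sha256_of(script_path) == expected
```

A caller would pass the paths of `certbot-deploy-hook.sh` and `certbot-deploy-hook.sh.sha256` and refuse to install the hook when the function returns `False`.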

docs/perfsonar/tools_scripts/docker-compose.testpoint-le-auto.yml

Lines changed: 10 additions & 6 deletions
@@ -7,8 +7,7 @@ services:
     labels:
       - io.containers.autoupdate=registry
     network_mode: "host"
-    privileged: true
-    cgroupns: private
+    cgroup: host
     environment:
       - TZ=UTC
       # Optional: Set SERVER_FQDN to explicitly specify which Let's Encrypt certificate to use.
@@ -23,23 +22,23 @@ services:
       - /run/lock
       - /tmp
     volumes:
+      - /sys/fs/cgroup:/sys/fs/cgroup:rw
       - /opt/perfsonar-tp/psconfig:/etc/perfsonar/psconfig:Z
       - /var/www/html:/var/www/html:z
       - /etc/apache2:/etc/apache2:z
       - /etc/letsencrypt:/etc/letsencrypt:z
       # Mount the tools_scripts directory so the entrypoint wrapper is available
       - /opt/perfsonar-tp/tools_scripts:/opt/perfsonar-tp/tools_scripts:ro
+    tty: true
     pids_limit: 8192
     cap_add:
       - CAP_NET_RAW
-      - CAP_SYS_ADMIN
-      - CAP_SYS_PTRACE
     healthcheck:
       test: ["CMD-SHELL", "curl -kSfS https://localhost/ || exit 1"]
       interval: 30s
       timeout: 10s
       retries: 3
-      start_period: 30s
+      start_period: 60s
 
   certbot:
     image: docker.io/certbot/certbot:latest
@@ -51,7 +50,12 @@ services:
     # on port 80, avoiding conflicts. Both containers share the host network
     # namespace without collision.
     network_mode: "host"
-    # Mount the host's Podman socket to allow the deploy hook to restart containers
+    # SELinux: disable label confinement so the container can access the host
+    # Podman socket. Required on EL9 hosts with SELinux in enforcing mode.
+    security_opt:
+      - label=disable
+    # Mount the host's Podman socket so the deploy hook can call the Podman REST
+    # API to restart the testpoint container after renewal.
     volumes:
       - /run/podman/podman.sock:/run/podman/podman.sock:ro
       - /var/www/html:/var/www/html:Z

docs/perfsonar/tools_scripts/docker-compose.testpoint-le.yml

Lines changed: 10 additions & 6 deletions
@@ -7,8 +7,7 @@ services:
     labels:
       - io.containers.autoupdate=registry
     network_mode: "host"
-    privileged: true
-    cgroupns: private
+    cgroup: host
     environment:
       - TZ=UTC
     restart: unless-stopped
@@ -17,21 +16,21 @@ services:
       - /run/lock
       - /tmp
     volumes:
+      - /sys/fs/cgroup:/sys/fs/cgroup:rw
       - /opt/perfsonar-tp/psconfig:/etc/perfsonar/psconfig:Z
       - /var/www/html:/var/www/html:z
       - /etc/apache2:/etc/apache2:z
       - /etc/letsencrypt:/etc/letsencrypt:z
+    tty: true
     pids_limit: 8192
     cap_add:
       - CAP_NET_RAW
-      - CAP_SYS_ADMIN
-      - CAP_SYS_PTRACE
     healthcheck:
       test: ["CMD-SHELL", "curl -kSfS https://localhost/ || exit 1"]
       interval: 30s
       timeout: 10s
       retries: 3
-      start_period: 30s
+      start_period: 60s
 
   certbot:
     image: docker.io/certbot/certbot:latest
@@ -43,7 +42,12 @@ services:
     # on port 80, avoiding conflicts. Both containers share the host network
     # namespace without collision.
     network_mode: "host"
-    # Mount the host's Podman socket to allow the deploy hook to restart containers
+    # SELinux: disable label confinement so the container can access the host
+    # Podman socket. Required on EL9 hosts with SELinux in enforcing mode.
+    security_opt:
+      - label=disable
+    # Mount the host's Podman socket so the deploy hook can call the Podman REST
+    # API to restart the testpoint container after renewal.
     volumes:
       - /run/podman/podman.sock:/run/podman/podman.sock:ro
       - /var/www/html:/var/www/html:Z

docs/personas/quick-deploy/install-perfsonar-testpoint.md

Lines changed: 7 additions & 4 deletions
@@ -645,7 +645,9 @@ The certbot container runs a renewal loop that checks for expiring certificates
 a deploy hook script (`certbot-deploy-hook.sh`) that gracefully restarts the `perfsonar-testpoint`
 container. This ensures the new certificates are loaded without manual intervention. The deploy hook
 uses the mounted Podman socket (`/run/podman/podman.sock`) to communicate with the host's container
-runtime.
+runtime via the Podman REST API (using Python, since the `podman` CLI is not present in the certbot
+image). On EL9 hosts with SELinux enforcing, the certbot service requires `security_opt: label=disable`
+to access the socket.
 
 **Note:** The certbot container in this setup uses **host networking mode** (via `network_mode: host` in the
 compose file) so it can bind directly to port 80 for HTTP-01 challenges during renewals. This works
@@ -1196,9 +1198,10 @@ Perform these checks before handing the host over to operations:
 
 **Solutions:**
 
-- Ensure certbot container has deploy hook configured: `--deploy-hook /opt/certbot/deploy-hook.sh`
-- Verify Podman socket is mounted in certbot container
-- Manually restart testpoint after renewals if deploy hook fails
+- Ensure certbot container has deploy hook configured: `--deploy-hook /etc/letsencrypt/renewal-hooks/deploy/certbot-deploy-hook.sh`
+- Verify Podman socket is mounted in certbot container: `podman exec certbot ls /run/podman/podman.sock`
+- Verify `security_opt: label=disable` is set on the certbot service in `docker-compose.yml` (required on EL9/SELinux hosts)
+- Manually restart testpoint after renewals if deploy hook fails: `podman restart perfsonar-testpoint`
 
 ### perfSONAR Service Issues
 
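The troubleshooting bullets in this hunk separate two failure modes: the socket is not mounted, or SELinux blocks access to it. The sketch below (hypothetical, not part of this commit) tells these apart from inside the certbot container using only the Python standard library. The function name `diagnose_socket` and its return strings are illustrative. It assumes an SELinux denial surfaces as a permission error on `connect`, which is typical but not guaranteed.

```python
import errno
import os
import socket


def diagnose_socket(path="/run/podman/podman.sock"):
    """Classify why a Unix socket can or cannot be reached."""
    if not os.path.exists(path):
        return "missing"   # volume mount absent from the compose file
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(5)
    try:
        sock.connect(path)
    except PermissionError:
        return "denied"    # commonly SELinux labeling or file permissions
    except ConnectionRefusedError:
        return "refused"   # path exists but nothing is listening
    except OSError as exc:
        # Report any other errno by its symbolic name when one is known.
        return f"error: {errno.errorcode.get(exc.errno, exc.errno)}"
    finally:
        sock.close()
    return "ok"


if __name__ == "__main__":
    print(diagnose_socket())
```

A result of `missing` points at the volume mount, `denied` at the `security_opt: label=disable` setting, and `refused` at the Podman API service on the host not listening.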