Commit 5d7bcb5

docs(quick-deploy): remove Container Issues troubleshooting section
Container-based deployments are out of scope for the RPM Toolkit Quick Deploy guide. All container-specific troubleshooting has been removed to keep the guide focused solely on RPM-based installations.

Container Issues removed:
- Container won't start or exits immediately
- Container crashes after reboot with exit code 255
- Certbot service fails with config file error
- SELinux denials blocking container operations
1 parent bb13b04 commit 5d7bcb5

1 file changed: +0 -214 lines changed
docs/personas/quick-deploy/install-perfsonar-toolkit.md

Lines changed: 0 additions & 214 deletions
@@ -1197,21 +1197,6 @@ podman exec -it perfsonar-testpoint psconfig remote list
 /opt/perfsonar-toolkit/tools_scripts/perfSONAR-update-lsregistration.sh extract --output /root/restore-lsreg.sh --local
 ```
 
-**For container-based installs:**
-```bash
-# Preview changes only
-/opt/perfsonar-toolkit/tools_scripts/perfSONAR-update-lsregistration.sh update --container perfsonar-testpoint \
---dry-run --site-name "Acme Co." --project WLCG \
---admin-email admin@example.org --admin-name "pS Admin"
-
-# Apply new settings and restart the daemon in the container
-/opt/perfsonar-toolkit/tools_scripts/perfSONAR-update-lsregistration.sh create --container perfsonar-testpoint \
---site-name "Acme Co." --domain example.org --project WLCG --project OSG \
---city Berkeley --region CA --country US --zip 94720 \
---latitude 37.5 --longitude -121.7469 \
---admin-name "pS Admin" --admin-email admin@example.org
-```
-
 1. **Automatic updates**
 
 The perfSONAR Toolkit uses `dnf-automatic` for automatic updates (already configured in Step 5).
@@ -1584,205 +1569,6 @@ Perform these checks before handing the host over to operations:
 
 ## Troubleshooting
 
-### Container Issues
-
-??? failure "Container won't start or exits immediately"
-
-**Symptoms:** `podman ps` shows no running containers, or container exits shortly after starting.
-
-**Diagnostic steps:**
-
-```bash
-# Check container logs
-podman logs perfsonar-testpoint
-
-# Check for systemd initialization errors
-podman logs perfsonar-testpoint 2>&1 | grep -i "failed\|error"
-
-# Verify compose file syntax
-cd /opt/perfsonar-toolkit
-podman-compose config
-
-```
-
-**Common causes:**
-
-- Missing entrypoint wrapper: Ensure `/opt/perfsonar-toolkit/tools_scripts/testpoint-entrypoint-wrapper.sh` exists
-- SELinux denials: Check `ausearch -m avc -ts recent` and consider temporarily setting to permissive mode for testing
-- Incorrect bind-mount paths: Verify all host directories exist and have correct permissions
-- Cgroup issues: Ensure `cgroupns: private` is set and no manual cgroup bind-mounts exist
-
-??? failure "Container won't start or exits immediately"
-
-**Symptoms:** `podman ps` shows no running containers, or container exits shortly after starting.
-
-**Diagnostic steps:**
-
-```bash
-# Check container logs
-podman logs perfsonar-testpoint
-
-# Check for systemd initialization errors
-podman logs perfsonar-testpoint 2>&1 | grep -i "failed\|error"
-
-# Verify compose file syntax
-cd /opt/perfsonar-toolkit
-podman-compose config
-
-```
-
-**Common causes:**
-
-- Missing entrypoint wrapper: Ensure `/opt/perfsonar-toolkit/tools_scripts/testpoint-entrypoint-wrapper.sh` exists
-- SELinux denials: Check `ausearch -m avc -ts recent` and consider temporarily setting to permissive mode for testing
-- Incorrect bind-mount paths: Verify all host directories exist and have correct permissions
-- Cgroup issues: Ensure `cgroupns: private` is set and no manual cgroup bind-mounts exist
-
-??? failure "Container crashes after reboot with exit code 255"
-
-**Symptoms:** Containers run fine when started manually but crash-loop after host reboot. Logs show repeated restarts
-with exit code 255.
-
-**Cause:** The perfSONAR testpoint image runs systemd internally but podman-compose doesn't support the
-`--systemd=always` flag required for proper systemd operation in containers.
-
-**Diagnostic steps:**
-
-```bash
-# Check container status
-podman ps -a
-
-# Check systemd service status
-systemctl status perfsonar-testpoint.service
-
-# View recent container logs
-podman logs perfsonar-testpoint --tail 100
-
-# Check if using compose-based service (BAD)
-grep -A5 "ExecStart" /etc/systemd/system/perfsonar-testpoint.service
-```
-
-**Solution:**
-
-Replace the compose-based systemd service with proper systemd units that use `podman run --systemd=always`:
-
-```bash
-# Stop and disable old service
-systemctl stop perfsonar-testpoint.service
-systemctl disable perfsonar-testpoint.service
-
-# Install new systemd units
-curl -fsSL \
-https://raw.githubusercontent.com/osg-htc/networking/master/docs/perfsonar/tools_scripts/install-systemd-units.sh \
--o /tmp/install-systemd-units.sh
-chmod 0755 /tmp/install-systemd-units.sh
-
-# For testpoint only:
-/tmp/install-systemd-units.sh --install-dir /opt/perfsonar-toolkit
-
-# For testpoint + certbot:
-/tmp/install-systemd-units.sh --install-dir /opt/perfsonar-toolkit --with-certbot
-
-# Enable and start
-systemctl enable --now perfsonar-testpoint.service
-
-# If using certbot:
-systemctl enable --now perfsonar-certbot.service
-
-# Verify containers are running
-podman ps
-curl -kI https://127.0.0.1/
-```
-
-**Verification:**
-
-After installing the new units, the testpoint should:
-- Start successfully on boot
-- Run systemd properly inside the container
-- Maintain state across reboots
-- Show "Up" status in `podman ps` (not "Exited" or crash-looping)
-
-??? failure "Certbot service fails with 'Unable to open config file' error"
-
-**Symptoms:** `perfsonar-certbot.service` fails immediately after starting with exit code 2. Logs show: `certbot: error:
-Unable to open config file: trap exit TERM; while...`
-
-**Cause:** The certbot container image has a built-in entrypoint that expects certbot commands directly. When using a
-shell loop for renewal, the entrypoint tries to parse the shell command as a certbot config file, causing this error.
-
-**Diagnostic steps:**
-
-```bash
-# Check certbot service status
-systemctl status perfsonar-certbot.service
-
-# View detailed logs
-journalctl -u perfsonar-certbot.service -n 50
-
-# Check for the error in logs
-journalctl -u perfsonar-certbot.service | grep "Unable to open config file"
-
-# Verify service file configuration
-grep -A5 "ExecStart" /etc/systemd/system/perfsonar-certbot.service
-```
-
-**Solution:**
-
-The certbot service needs two flags:
-- `--systemd=always` for proper systemd integration and reboot persistence
-- `--entrypoint=/bin/sh` to override the built-in entrypoint
-
-Re-run the installation script to get the fixed version:
-
-```bash
-# Stop current service
-systemctl stop perfsonar-certbot.service
-
-# Download and install updated systemd units
-curl -fsSL \
-https://raw.githubusercontent.com/osg-htc/networking/master/docs/perfsonar/tools_scripts/install-systemd-units.sh \
--o /tmp/install-systemd-units.sh
-chmod 0755 /tmp/install-systemd-units.sh
-
-# Install with certbot support
-/tmp/install-systemd-units.sh --install-dir /opt/perfsonar-toolkit --with-certbot
-
-# Start the fixed service
-systemctl daemon-reload
-systemctl start perfsonar-certbot.service
-
-# Verify it's running
-systemctl status perfsonar-certbot.service
-podman ps | grep certbot
-```
-
-**Expected result:** The certbot container should be running (not exiting) and the service should be in "active (running)" state.
-
-??? failure "SELinux denials blocking container operations"
-
-**Symptoms:** Container starts but services fail, permission denied errors in logs.
-
-**Diagnostic steps:**
-
-```bash
-
-# Check for recent SELinux denials
-ausearch -m avc -ts recent
-
-# Temporarily set to permissive for testing
-setenforce 0
-
-# Test if issue resolves, then check audit log
-ausearch -m avc -ts recent > /tmp/selinux-denials.txt
-
-```
-
-**Solutions:**
-
-- Verify volume labels are correct (`:Z` for exclusive, `:z` for shared)
-- Recreate containers to reapply SELinux labels: `podman-compose down && podman-compose up -d`
-- If persistent issues, consider creating custom SELinux policy or running in permissive mode
-
 ### Networking Issues
 
 ??? failure "Policy-based routing not working correctly"
