@@ -1197,21 +1197,6 @@ podman exec -it perfsonar-testpoint psconfig remote list
     /opt/perfsonar-toolkit/tools_scripts/perfSONAR-update-lsregistration.sh extract --output /root/restore-lsreg.sh --local
     ```
 
-    **For container-based installs:**
-    ```bash
-    # Preview changes only
-    /opt/perfsonar-toolkit/tools_scripts/perfSONAR-update-lsregistration.sh update --container perfsonar-testpoint \
-      --dry-run --site-name "Acme Co." --project WLCG \
-      --admin-email admin@example.org --admin-name "pS Admin"
-
-    # Apply new settings and restart the daemon in the container
-    /opt/perfsonar-toolkit/tools_scripts/perfSONAR-update-lsregistration.sh create --container perfsonar-testpoint \
-      --site-name "Acme Co." --domain example.org --project WLCG --project OSG \
-      --city Berkeley --region CA --country US --zip 94720 \
-      --latitude 37.5 --longitude -121.7469 \
-      --admin-name "pS Admin" --admin-email admin@example.org
-    ```
-
 1. **Automatic updates**
 
     The perfSONAR Toolkit uses `dnf-automatic` for automatic updates (already configured in Step 5).
@@ -1584,205 +1569,6 @@ Perform these checks before handing the host over to operations:
 
 ## Troubleshooting
 
-### Container Issues
-
-??? failure "Container won't start or exits immediately"
-
-    **Symptoms:** `podman ps` shows no running containers, or container exits shortly after starting.
-
-    **Diagnostic steps:**
-
-    ```bash
-    # Check container logs
-    podman logs perfsonar-testpoint
-
-    # Check for systemd initialization errors
-    podman logs perfsonar-testpoint 2>&1 | grep -i "failed\|error"
-
-    # Verify compose file syntax
-    cd /opt/perfsonar-toolkit
-    podman-compose config
-
-    ```
-
-    **Common causes:**
-
-    - Missing entrypoint wrapper: Ensure `/opt/perfsonar-toolkit/tools_scripts/testpoint-entrypoint-wrapper.sh` exists
-    - SELinux denials: Check `ausearch -m avc -ts recent` and consider temporarily setting to permissive mode for testing
-    - Incorrect bind-mount paths: Verify all host directories exist and have correct permissions
-    - Cgroup issues: Ensure `cgroupns: private` is set and no manual cgroup bind-mounts exist
-
-??? failure "Container won't start or exits immediately"
-
-    **Symptoms:** `podman ps` shows no running containers, or container exits shortly after starting.
-
-    **Diagnostic steps:**
-
-    ```bash
-    # Check container logs
-    podman logs perfsonar-testpoint
-
-    # Check for systemd initialization errors
-    podman logs perfsonar-testpoint 2>&1 | grep -i "failed\|error"
-
-    # Verify compose file syntax
-    cd /opt/perfsonar-toolkit
-    podman-compose config
-
-    ```
-
-    **Common causes:**
-
-    - Missing entrypoint wrapper: Ensure `/opt/perfsonar-toolkit/tools_scripts/testpoint-entrypoint-wrapper.sh` exists
-    - SELinux denials: Check `ausearch -m avc -ts recent` and consider temporarily setting to permissive mode for testing
-    - Incorrect bind-mount paths: Verify all host directories exist and have correct permissions
-    - Cgroup issues: Ensure `cgroupns: private` is set and no manual cgroup bind-mounts exist
-
-??? failure "Container crashes after reboot with exit code 255"
-
-    **Symptoms:** Containers run fine when started manually but crash-loop after host reboot. Logs show repeated restarts
-    with exit code 255.
-
-    **Cause:** The perfSONAR testpoint image runs systemd internally but podman-compose doesn't support the
-    `--systemd=always` flag required for proper systemd operation in containers.
-
-    **Diagnostic steps:**
-
-    ```bash
-    # Check container status
-    podman ps -a
-
-    # Check systemd service status
-    systemctl status perfsonar-testpoint.service
-
-    # View recent container logs
-    podman logs perfsonar-testpoint --tail 100
-
-    # Check if using compose-based service (BAD)
-    grep -A5 "ExecStart" /etc/systemd/system/perfsonar-testpoint.service
-    ```
-
-    **Solution:**
-
-    Replace the compose-based systemd service with proper systemd units that use `podman run --systemd=always`:
-
-    ```bash
-    # Stop and disable old service
-    systemctl stop perfsonar-testpoint.service
-    systemctl disable perfsonar-testpoint.service
-
-    # Install new systemd units
-    curl -fsSL \
-      https://raw.githubusercontent.com/osg-htc/networking/master/docs/perfsonar/tools_scripts/install-systemd-units.sh \
-      -o /tmp/install-systemd-units.sh
-    chmod 0755 /tmp/install-systemd-units.sh
-
-    # For testpoint only:
-    /tmp/install-systemd-units.sh --install-dir /opt/perfsonar-toolkit
-
-    # For testpoint + certbot:
-    /tmp/install-systemd-units.sh --install-dir /opt/perfsonar-toolkit --with-certbot
-
-    # Enable and start
-    systemctl enable --now perfsonar-testpoint.service
-
-    # If using certbot:
-    systemctl enable --now perfsonar-certbot.service
-
-    # Verify containers are running
-    podman ps
-    curl -kI https://127.0.0.1/
-    ```
-
-    **Verification:**
-
-    After installing the new units, the testpoint should:
-    - Start successfully on boot
-    - Run systemd properly inside the container
-    - Maintain state across reboots
-    - Show "Up" status in `podman ps` (not "Exited" or crash-looping)
-
-??? failure "Certbot service fails with 'Unable to open config file' error"
-
-    **Symptoms:** `perfsonar-certbot.service` fails immediately after starting with exit code 2. Logs show: `certbot: error:
-    Unable to open config file: trap exit TERM; while...`
-
-    **Cause:** The certbot container image has a built-in entrypoint that expects certbot commands directly. When using a
-    shell loop for renewal, the entrypoint tries to parse the shell command as a certbot config file, causing this error.
-
-    **Diagnostic steps:**
-
-    ```bash
-    # Check certbot service status
-    systemctl status perfsonar-certbot.service
-
-    # View detailed logs
-    journalctl -u perfsonar-certbot.service -n 50
-
-    # Check for the error in logs
-    journalctl -u perfsonar-certbot.service | grep "Unable to open config file"
-
-    # Verify service file configuration
-    grep -A5 "ExecStart" /etc/systemd/system/perfsonar-certbot.service
-    ```
-
-    **Solution:**
-
-    The certbot service needs two flags:
-    - `--systemd=always` for proper systemd integration and reboot persistence
-    - `--entrypoint=/bin/sh` to override the built-in entrypoint
-
-    Re-run the installation script to get the fixed version:
-
-    ```bash
-    # Stop current service
-    systemctl stop perfsonar-certbot.service
-
-    # Download and install updated systemd units
-    curl -fsSL \
-      https://raw.githubusercontent.com/osg-htc/networking/master/docs/perfsonar/tools_scripts/install-systemd-units.sh \
-      -o /tmp/install-systemd-units.sh
-    chmod 0755 /tmp/install-systemd-units.sh
-
-    # Install with certbot support
-    /tmp/install-systemd-units.sh --install-dir /opt/perfsonar-toolkit --with-certbot
-
-    # Start the fixed service
-    systemctl daemon-reload
-    systemctl start perfsonar-certbot.service
-
-    # Verify it's running
-    systemctl status perfsonar-certbot.service
-    podman ps | grep certbot
-    ```
-
-    **Expected result:** The certbot container should be running (not exiting) and the service should be in "active (running)" state.
-
-??? failure "SELinux denials blocking container operations"
-
-    **Symptoms:** Container starts but services fail, permission denied errors in logs.
-
-    **Diagnostic steps:**
-
-    ```bash
-
-    # Check for recent SELinux denials
-    ausearch -m avc -ts recent
-
-    # Temporarily set to permissive for testing
-    setenforce 0
-
-    # Test if issue resolves, then check audit log
-    ausearch -m avc -ts recent > /tmp/selinux-denials.txt
-
-    ```
-
-    **Solutions:**
-
-    - Verify volume labels are correct (`:Z` for exclusive, `:z` for shared)
-    - Recreate containers to reapply SELinux labels: `podman-compose down && podman-compose up -d`
-    - If persistent issues, consider creating custom SELinux policy or running in permissive mode
-
 ### Networking Issues
 
 ??? failure "Policy-based routing not working correctly"
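The removed troubleshooting entries inspect unit files by reading `grep -A5 "ExecStart"` output by eye. A minimal sketch of an automated check follows, assuming the unit keeps its `podman run` or `podman-compose` command on one `ExecStart=` line, optionally continued with trailing backslashes. The function names `unit_exec_start` and `check_unit_flags` are hypothetical and not part of the perfSONAR tooling.

```bash
# Hypothetical helpers (not shipped with perfSONAR): inspect how a systemd
# unit launches its container.

# Print the ExecStart= line of a unit file plus any backslash-continued lines.
unit_exec_start() {
  awk '
    /^[ \t]*ExecStart=/ { grab = 1 }
    grab {
      print
      # Stop collecting once a line no longer ends in a backslash.
      if ($0 !~ /\\[ \t]*$/) grab = 0
    }
  ' "$1"
}

# Usage: check_unit_flags UNIT_FILE [REQUIRED_FLAG...]
# Prints one line per problem found. Returns 0 if the unit is not
# compose-based and every flag is present, 1 otherwise, and 2 if no
# ExecStart= line could be read.
check_unit_flags() {
  local unit="$1"
  shift
  local cmdline flag status=0

  if ! cmdline=$(unit_exec_start "$unit" 2>/dev/null) || [[ -z "$cmdline" ]]; then
    echo "no ExecStart found in $unit"
    return 2
  fi

  # A compose-based ExecStart cannot pass --systemd=always to podman run.
  if grep -q 'podman-compose' <<<"$cmdline"; then
    echo "compose-based ExecStart"
    status=1
  fi

  # Flags are matched as fixed substrings anywhere in the command line.
  for flag in "$@"; do
    if ! grep -qF -- "$flag" <<<"$cmdline"; then
      echo "missing flag: $flag"
      status=1
    fi
  done

  return "$status"
}
```

For example, `check_unit_flags /etc/systemd/system/perfsonar-certbot.service --systemd=always --entrypoint=/bin/sh` checks both certbot flags, and for the testpoint unit only `--systemd=always` would be passed. Because flags are matched as plain substrings, this is a quick screen rather than a parser of systemd's quoting rules.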
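The certbot error quotes the start of the renewal loop (`trap exit TERM; while...`). That message is consistent with the loop being handed to the container as `-c '<script>'`: with the image's own `certbot` entrypoint, the `-c` reaches certbot, which reads it as its `--config` option, whereas `--entrypoint=/bin/sh` makes `sh` run the script. The sketch below reconstructs such a loop as a bash function so its stop behaviour can be tried outside a container. The function name, the argument order, and passing the renewal command as arguments are assumptions, not the exact script installed by `install-systemd-units.sh`.

```bash
# Hypothetical reconstruction of a certbot renewal loop (not the exact
# script used by the perfSONAR certbot unit).
# Usage: renew_loop INTERVAL_SECONDS COMMAND [ARGS...]
renew_loop() {
  local interval="$1"
  shift
  # Exit as soon as SIGTERM arrives (e.g. when podman stops the container).
  trap exit TERM
  while :; do
    # Run the renewal command; a failed attempt does not end the loop.
    "$@"
    # Sleep in the background and wait on it: `wait` is interrupted by a
    # trapped signal, while a foreground sleep would delay the trap until
    # the full interval had elapsed.
    sleep "$interval" &
    wait $!
  done
}
```

Inside a container the equivalent would be a single `/bin/sh -c` string running `certbot renew` in the same loop; an interval such as 43200 seconds (12 hours) is an assumed value, not one taken from the installed unit.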