|
| 1 | +# Debug Container for Cluster Monitoring Operator |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +The Cluster Monitoring Operator (CMO) supports an optional debug container that can be deployed as a sidecar alongside the main operator. This debug container provides additional tooling capabilities for troubleshooting monitoring stack issues without requiring separate pod deployments or complex debugging setups. |
| 6 | + |
| 7 | +## Enabling the Debug Container |
| 8 | + |
| 9 | +### Option 1: Patch the Running Deployment |
| 10 | + |
| 11 | +Create a patch file to add the debug container to the running deployment: |
| 12 | + |
| 13 | +```bash |
| 14 | +# Create debug container patch |
| 15 | +cat > debug-enable-patch.yaml << 'EOF' |
| 15 | +spec: |
| 16 | +  template: |
| 17 | +    spec: |
| 18 | +      containers: |
| 19 | +      - name: debug-tools |
| 20 | +        image: registry.redhat.io/ubi9/ubi:latest |
| 21 | +        command: ["/bin/bash", "-c", "sleep infinity"]  # dnf needs root, which runAsNonRoot forbids; add tools via a custom image |
| 22 | +        resources: |
| 23 | +          requests: |
| 24 | +            cpu: 10m |
| 25 | +            memory: 32Mi |
| 26 | +          limits: |
| 27 | +            cpu: 50m |
| 28 | +            memory: 128Mi |
| 29 | +        securityContext: |
| 30 | +          allowPrivilegeEscalation: false |
| 31 | +          capabilities: |
| 32 | +            drop: ["ALL"] |
| 33 | +          runAsNonRoot: true |
| 34 | +        terminationMessagePolicy: FallbackToLogsOnError |
| 36 | +EOF |
| 37 | + |
| 38 | +# Apply the patch |
| 39 | +kubectl patch deployment cluster-monitoring-operator \ |
| 40 | + -n openshift-monitoring \ |
| 41 | + --patch-file debug-enable-patch.yaml |
| 42 | + |
| 43 | +# Verify the rollout |
| 44 | +kubectl rollout status deployment/cluster-monitoring-operator -n openshift-monitoring |
| 45 | +``` |
| 46 | + |
| 47 | +### Option 2: Edit Deployment Directly |
| 48 | + |
| 49 | +```bash |
| 50 | +# Edit the deployment manifest |
| 51 | +kubectl edit deployment cluster-monitoring-operator -n openshift-monitoring |
| 52 | + |
| 53 | +# Find the commented debug container section and uncomment it: |
| 54 | +# Look for the section starting with "# DEBUG TOOLS SIDECAR (OPTIONAL)" |
| 55 | +# Remove the '#' from the container definition lines |
| 56 | +``` |
| 57 | + |
| 58 | +## Using the Debug Container |
| 59 | + |
| 60 | +### Basic Access |
| 61 | + |
| 62 | +Once enabled, access the debug container using kubectl exec: |
| 63 | + |
| 64 | +```bash |
| 65 | +# Access the debug container (-it attaches an interactive terminal) |
| 66 | +kubectl exec -it -n openshift-monitoring \ |
| 67 | +  deployment/cluster-monitoring-operator \ |
| 68 | +  -c debug-tools -- /bin/bash |
| 69 | + |
| 70 | +# Or target a specific pod |
| 71 | +kubectl exec -it -n openshift-monitoring \ |
| 72 | +  pod/cluster-monitoring-operator-xyz \ |
| 73 | +  -c debug-tools -- /bin/bash |
| 74 | +``` |
| 75 | + |
| 76 | +## Common Use Cases and Examples |
| 77 | + |
| 78 | +### 1. Network Connectivity Testing |
| 79 | + |
| 80 | +```bash |
| 81 | +# Exec into the debug container, then run the commands below inside that shell |
| 82 | +kubectl exec -it -n openshift-monitoring deployment/cluster-monitoring-operator -c debug-tools -- /bin/bash |
| 83 | + |
| 84 | +# Test connectivity to Prometheus |
| 85 | +curl -I http://prometheus-k8s.openshift-monitoring:9090/metrics |
| 86 | + |
| 87 | +# Test connectivity to Alertmanager (API v2; v1 was removed in Alertmanager 0.27) |
| 88 | +curl -I http://alertmanager-main.openshift-monitoring:9093/api/v2/status |
| 89 | + |
| 90 | +# DNS resolution testing (nslookup comes from bind-utils; see Custom Debug Images) |
| 91 | +nslookup prometheus-k8s.openshift-monitoring |
| 92 | +nslookup alertmanager-main.openshift-monitoring |
| 93 | + |
| 94 | +# Network port testing (if netcat is available) |
| 95 | +nc -zv prometheus-k8s.openshift-monitoring 9090 |
| 96 | +``` |
| 97 | + |
| 98 | +### 2. Kubernetes API Debugging |
| 99 | + |
| 100 | +```bash |
| 101 | +# Use the mounted service account token |
| 102 | +TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token) |
| 103 | +CA_CERT=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt |
| 104 | + |
| 105 | +# Test API access with same permissions as operator |
| 106 | +curl -H "Authorization: Bearer $TOKEN" \ |
| 107 | + --cacert $CA_CERT \ |
| 108 | + https://kubernetes.default.svc/api/v1/namespaces/openshift-monitoring/pods |
| 109 | + |
| 110 | +# Check RBAC permissions |
| 111 | +curl -H "Authorization: Bearer $TOKEN" \ |
| 112 | + --cacert $CA_CERT \ |
| 113 | + https://kubernetes.default.svc/apis/authorization.k8s.io/v1/selfsubjectaccessreviews \ |
| 114 | + -X POST -H "Content-Type: application/json" -d '{ |
| 115 | + "kind": "SelfSubjectAccessReview", |
| 116 | + "apiVersion": "authorization.k8s.io/v1", |
| 117 | + "spec": { |
| 118 | + "resourceAttributes": { |
| 119 | + "namespace": "openshift-monitoring", |
| 120 | + "verb": "get", |
| 121 | + "resource": "pods" |
| 122 | + } |
| 123 | + } |
| 124 | + }' |
| 125 | +``` |
| 126 | + |
| 127 | +### 3. Resource and Environment Investigation |
| 128 | + |
| 129 | +```bash |
| 130 | +# Check process information |
| 131 | +ps aux |
| 132 | + |
| 133 | +# Monitor resource usage |
| 134 | +top |
| 135 | + |
| 136 | +# Check mounted volumes |
| 137 | +mount | grep -E "(configmap|secret)" |
| 138 | + |
| 139 | +# Examine environment variables |
| 140 | +env | sort |
| 141 | + |
| 142 | +# Check disk usage |
| 143 | +df -h |
| 144 | + |
| 145 | +# Network interface information |
| 146 | +ip addr show |
| 147 | + |
| 148 | +# Process network connections |
| 149 | +ss -tuln |
| 150 | +``` |
| 151 | + |
| 152 | +### 4. Configuration Analysis |
| 153 | + |
| 154 | +```bash |
| 155 | +# Examine mounted ConfigMaps (visible only if the debug container has the same volumeMounts as the operator) |
| 156 | +find /etc -name "*.yaml" -o -name "*.yml" | head -10 |
| 157 | +cat /etc/cluster-monitoring-operator/telemetry/metrics.yaml |
| 158 | + |
| 159 | +# Check mounted secrets |
| 160 | +ls -la /etc/ssl/certs/ |
| 161 | +ls -la /var/run/secrets/ |
| 162 | + |
| 163 | +# Validate configuration files |
| 164 | +find /etc -type f -exec file {} \; | grep -i yaml |
| 165 | +``` |
| 166 | + |
| 167 | +### 5. Log Analysis |
| 168 | + |
| 169 | +```bash |
| 170 | +# /proc/1/fd/1 is the stdout of PID 1 in the debug container itself, not the operator: |
| 171 | +# containers do not share a PID namespace by default, so for operator logs run `kubectl logs` from outside the pod |
| 172 | +tail -f /proc/1/fd/1 |
| 173 | + |
| 174 | +# Or check specific log files if they exist |
| 175 | +find /var/log -type f 2>/dev/null |
| 176 | +``` |
| 177 | + |
| 178 | +## Custom Debug Images |
| 179 | + |
| 180 | +For enhanced debugging capabilities, you can create a custom debug image with additional tools: |
| 181 | + |
| 182 | +### Creating a Custom Debug Image |
| 183 | + |
| 184 | +```dockerfile |
| 185 | +# Example Dockerfile for custom debug image |
| 186 | +FROM registry.redhat.io/ubi9/ubi:latest |
| 187 | + |
| 188 | +# Install common debugging tools and swap to full curl |
| 189 | +RUN dnf update -y && \ |
| 190 | +    dnf swap -y libcurl-minimal libcurl && \ |
| 191 | +    dnf install -y \ |
| 192 | +        wget \ |
| 193 | +        bind-utils \ |
| 194 | +        procps-ng \ |
| 195 | +    && dnf clean all |
| 196 | + |
| 197 | +USER 1001 |
| 198 | +CMD ["/bin/sleep", "infinity"] |
| 199 | +``` |
| 200 | + |
| 201 | +### Using a Custom Image |
| 202 | + |
| 203 | +Update the image reference in your debug container patch: |
| 204 | + |
| 205 | +```yaml |
| 206 | +spec: |
| 207 | +  template: |
| 208 | +    spec: |
| 209 | +      containers: |
| 210 | +      - name: debug-tools |
| 211 | +        image: quay.io/your-org/debug-tools:latest # Your custom image |
| 212 | +        command: ["/bin/sleep", "infinity"] |
| 213 | +        # ... rest of container spec |
| 214 | +``` |
| 215 | + |
| 216 | +## Security Considerations |
| 217 | + |
| 218 | +The debug container runs with a restricted security context: |
| 219 | + |
| 220 | +- **No privilege escalation** - `allowPrivilegeEscalation: false` |
| 221 | +- **Dropped capabilities** - All Linux capabilities are dropped |
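| 222 | +- **Non-root user** - Runs as a non-root user (`runAsNonRoot: true`; the custom image example uses UID 1001) |
| 223 | +- **Same permissions** - Uses the same ServiceAccount as the operator, so it can do anything the operator can |
| 224 | +- **Shared network namespace** - Shares the pod's network namespace, so it is not network-isolated from the operator |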
| 225 | + |
| 226 | +## Troubleshooting |
| 227 | + |
| 228 | +### Debug Container Won't Start |
| 229 | + |
| 230 | +```bash |
| 231 | +# Check pod events |
| 232 | +kubectl describe pod -n openshift-monitoring -l app.kubernetes.io/name=cluster-monitoring-operator |
| 233 | + |
| 234 | +# Check container status |
| 235 | +kubectl get pods -n openshift-monitoring -l app.kubernetes.io/name=cluster-monitoring-operator -o jsonpath='{.items[0].status.containerStatuses}' |
| 236 | + |
| 237 | +# Verify image pull |
| 238 | +kubectl get events -n openshift-monitoring --field-selector reason=Failed |
| 239 | +``` |
| 240 | + |
| 241 | +### Cannot Access Debug Container |
| 242 | + |
| 243 | +```bash |
| 244 | +# Verify container is running |
| 245 | +kubectl get pods -n openshift-monitoring -l app.kubernetes.io/name=cluster-monitoring-operator |
| 246 | + |
| 247 | +# Check if debug container exists |
| 248 | +kubectl get pods -n openshift-monitoring -l app.kubernetes.io/name=cluster-monitoring-operator -o jsonpath='{.items[0].spec.containers[*].name}' |
| 249 | + |
| 250 | +# Try accessing by pod name instead of deployment |
| 251 | +kubectl get pods -n openshift-monitoring -l app.kubernetes.io/name=cluster-monitoring-operator |
| 252 | +kubectl exec -it -n openshift-monitoring pod/cluster-monitoring-operator-abc123 -c debug-tools -- /bin/bash |
| 253 | +``` |
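
If the container-name check above is scripted, the jsonpath output (a space-separated list of container names) can be tested with a small helper. This is only a sketch: `has_container` is a hypothetical function name, not part of kubectl or the operator, and the live-cluster usage is left commented out because it needs cluster access.

```shell
# Hypothetical helper (not a kubectl feature): succeeds if the container
# name in $2 appears in the space-separated list of names in $1.
has_container() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage against a live cluster (requires kubectl access):
# names=$(kubectl get pods -n openshift-monitoring \
#   -l app.kubernetes.io/name=cluster-monitoring-operator \
#   -o jsonpath='{.items[0].spec.containers[*].name}')
# if has_container "$names" debug-tools; then
#   echo "debug-tools present"
# else
#   echo "debug-tools missing"
# fi
```

Padding both the list and the name with spaces makes the match exact, so a container called `debug-tools-v2` would not be mistaken for `debug-tools`.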