Commit 2758b32

Merge pull request #2623 from danielmellado/add_debug_flag
MON-4318: Add debug image manifest to CMO
2 parents 5db9394 + 484bfa1 commit 2758b32

5 files changed: +284 -1 lines changed

Documentation/debug-container.md

Lines changed: 253 additions & 0 deletions
@@ -0,0 +1,253 @@
# Debug Container for Cluster Monitoring Operator

## Overview

The Cluster Monitoring Operator (CMO) supports an optional debug container that runs as a sidecar next to the main operator container. It provides tooling for troubleshooting the monitoring stack from inside the operator pod, without deploying a separate debug pod.

## Enabling the Debug Container

### Option 1: Patch the Running Deployment (Existing Clusters, Immediate Use)

Create a patch file to add the debug container to the running deployment:

```bash
# Create debug container patch
cat > debug-enable-patch.yaml << 'EOF'
spec:
  template:
    spec:
      containers:
      - name: debug-tools
        image: registry.redhat.io/ubi9/ubi:latest
        # dnf needs root, which runAsNonRoot forbids, so the swap may fail;
        # "|| true" keeps the container alive in that case.
        command: ["/bin/bash", "-c", "dnf swap -y libcurl-minimal libcurl || true; sleep infinity"]
        resources:
          requests:
            cpu: 10m
            memory: 32Mi
          limits:
            cpu: 50m
            memory: 128Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
          runAsNonRoot: true
        terminationMessagePolicy: FallbackToLogsOnError
EOF

# Apply the patch
kubectl patch deployment cluster-monitoring-operator \
  -n openshift-monitoring \
  --patch-file debug-enable-patch.yaml

# Verify the rollout
kubectl rollout status deployment/cluster-monitoring-operator -n openshift-monitoring
```

### Option 2: Edit Deployment Directly

```bash
# Edit the live deployment
kubectl edit deployment cluster-monitoring-operator -n openshift-monitoring

# The live object does not retain YAML comments, so the commented-out
# "# DEBUG TOOLS SIDECAR (OPTIONAL)" section exists only in the manifest file
# (manifests/0000_50_cluster-monitoring-operator_05-deployment.yaml).
# Copy the debug-tools container definition from that section, without the
# leading '#', into spec.template.spec.containers.
```
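
Neither option above covers removing the sidecar once debugging is finished. The following is a minimal sketch of one way to do it with a JSON patch, assuming the container is named `debug-tools` as in the patch above. The function names `container_index` and `remove_debug_container` are hypothetical helpers, not part of CMO.

```bash
# Hypothetical helpers, not part of CMO.

# Print the zero-based position of a name within the remaining arguments.
# Returns non-zero when the name is not present.
container_index() {
  local target="$1"
  shift
  local i=0 name
  for name in "$@"; do
    if [ "$name" = "$target" ]; then
      echo "$i"
      return 0
    fi
    i=$((i + 1))
  done
  return 1
}

# Remove the debug-tools container from the live deployment with a JSON patch.
remove_debug_container() {
  local ns=openshift-monitoring deploy=cluster-monitoring-operator
  local names idx
  names=$(kubectl get deployment "$deploy" -n "$ns" \
    -o jsonpath='{.spec.template.spec.containers[*].name}') || return 1
  # Word splitting of $names is intentional: one argument per container name.
  idx=$(container_index debug-tools $names) || {
    echo "debug-tools container not found" >&2
    return 1
  }
  kubectl patch deployment "$deploy" -n "$ns" --type=json \
    -p "[{\"op\": \"remove\", \"path\": \"/spec/template/spec/containers/${idx}\"}]"
}
```

Run `remove_debug_container` from a workstation with cluster access, then confirm the rollout with `kubectl rollout status` as in Option 1. The index is looked up at call time because a JSON patch addresses list entries by position, not by name.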

## Using the Debug Container

### Basic Access

Once the container is running, open an interactive shell in it with `kubectl exec`:

```bash
# Access the debug container (-it attaches an interactive terminal)
kubectl exec -it -n openshift-monitoring \
  deployment/cluster-monitoring-operator \
  -c debug-tools -- /bin/bash

# Or target a specific pod
kubectl exec -it -n openshift-monitoring \
  pod/cluster-monitoring-operator-xyz \
  -c debug-tools -- /bin/bash
```

## Common Use Cases and Examples

### 1. Network Connectivity Testing

```bash
# Exec into debug container
kubectl exec -it -n openshift-monitoring deployment/cluster-monitoring-operator -c debug-tools -- /bin/bash

# Test connectivity to Prometheus
# (adjust scheme and port to match the Service in your cluster)
curl -I http://prometheus-k8s.openshift-monitoring:9090/metrics

# Test connectivity to Alertmanager
curl -I http://alertmanager-main.openshift-monitoring:9093/api/v1/status

# DNS resolution testing (nslookup is provided by bind-utils)
nslookup prometheus-k8s.openshift-monitoring
nslookup alertmanager-main.openshift-monitoring

# Network port testing (if netcat is available)
nc -zv prometheus-k8s.openshift-monitoring 9090
```
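
The base image may not include netcat. As a hedged alternative, the sketch below checks a TCP port using only bash's built-in `/dev/tcp` redirection and `timeout` from coreutils. The function names `port_open` and `check_port` are hypothetical, and the 3-second timeout is an arbitrary choice.

```bash
# Hypothetical helper: succeed if a TCP connection to host:port can be opened.
# Uses bash's /dev/tcp redirection, so netcat is not required.
port_open() {
  local host="$1" port="$2"
  # Inside the child shell, $0 is the host and $1 is the port.
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Print a one-line verdict for host:port.
check_port() {
  if port_open "$1" "$2"; then
    echo "$1:$2 reachable"
  else
    echo "$1:$2 unreachable"
  fi
}
```

For example, `check_port prometheus-k8s.openshift-monitoring 9090` tests the same endpoint as the `nc` command above. A successful connection only shows that something is listening, not that the service behind it is healthy.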

### 2. Kubernetes API Debugging

```bash
# Use the mounted service account token
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
CA_CERT=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt

# Test API access with same permissions as operator
curl -H "Authorization: Bearer $TOKEN" \
  --cacert "$CA_CERT" \
  https://kubernetes.default.svc/api/v1/namespaces/openshift-monitoring/pods

# Check RBAC permissions
curl -H "Authorization: Bearer $TOKEN" \
  --cacert "$CA_CERT" \
  https://kubernetes.default.svc/apis/authorization.k8s.io/v1/selfsubjectaccessreviews \
  -X POST -H "Content-Type: application/json" -d '{
    "kind": "SelfSubjectAccessReview",
    "apiVersion": "authorization.k8s.io/v1",
    "spec": {
      "resourceAttributes": {
        "namespace": "openshift-monitoring",
        "verb": "get",
        "resource": "pods"
      }
    }
  }'
```
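
The API server answers a SelfSubjectAccessReview with the same object plus a `status.allowed` field. The sketch below shows one way to build the request body for an arbitrary check and read the verdict without `jq`, which may not be installed. The function names `ssar_payload`, `ssar_allowed`, and `can_i` are hypothetical, and the `grep` match is a simplification, not a real JSON parser.

```bash
# Hypothetical helper: build a SelfSubjectAccessReview body for one check.
ssar_payload() {
  local namespace="$1" verb="$2" resource="$3"
  printf '{"kind":"SelfSubjectAccessReview","apiVersion":"authorization.k8s.io/v1","spec":{"resourceAttributes":{"namespace":"%s","verb":"%s","resource":"%s"}}}\n' \
    "$namespace" "$verb" "$resource"
}

# Hypothetical helper: read an API response on stdin and succeed only if it
# contains "allowed": true.
ssar_allowed() {
  grep -Eq '"allowed"[[:space:]]*:[[:space:]]*true'
}

# Example wiring, reusing TOKEN and CA_CERT from the block above.
# curl reads the request body from stdin via "-d @-".
can_i() {
  ssar_payload "$@" | curl -s -H "Authorization: Bearer $TOKEN" \
    --cacert "$CA_CERT" -X POST -H "Content-Type: application/json" -d @- \
    https://kubernetes.default.svc/apis/authorization.k8s.io/v1/selfsubjectaccessreviews \
    | ssar_allowed
}
```

Inside the debug container, `can_i openshift-monitoring get pods && echo allowed` repeats the check from the block above, and other verbs or resources can be substituted to probe the operator's RBAC.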

### 3. Resource and Environment Investigation

```bash
# ps and top come from procps-ng, ip and ss from iproute;
# install them or use a custom image if they are missing.

# Check process information
ps aux

# Monitor resource usage
top

# Check mounted volumes
mount | grep -E "(configmap|secret)"

# Examine environment variables
env | sort

# Check disk usage
df -h

# Network interface information
ip addr show

# Listening TCP and UDP sockets
ss -tuln
```

### 4. Configuration Analysis

```bash
# Examine mounted ConfigMaps
# (volumes are mounted per container: the telemetry config below is only visible
# if the debug-tools container declares the same volumeMount as the operator)
find /etc -name "*.yaml" -o -name "*.yml" | head -10
cat /etc/cluster-monitoring-operator/telemetry/metrics.yaml

# Check mounted secrets
ls -la /etc/ssl/certs/
ls -la /var/run/secrets/

# List files under /etc whose name or detected type mentions YAML
find /etc -type f -exec file {} \; | grep -i yaml
```

### 5. Log Analysis

```bash
# Follow the stdout of PID 1 in this container's PID namespace
# (Note: unless the pod sets shareProcessNamespace, PID 1 here is the debug
# container's own process, not the operator. For operator logs, run
# `kubectl logs` against the pod from outside.)
tail -f /proc/1/fd/1

# Or check specific log files if they exist
find /var/log -type f 2>/dev/null
```

## Custom Debug Images

For more tooling, build a custom debug image with the additional tools you need:

### Creating a Custom Debug Image

```dockerfile
# Example Dockerfile for custom debug image
FROM registry.redhat.io/ubi9/ubi:latest

# Install common debugging tools and swap to full curl
RUN dnf update -y && \
    dnf swap -y libcurl-minimal libcurl && \
    dnf install -y \
        wget \
        bind-utils \
        procps-ng \
    && dnf clean all

USER 1001
CMD ["/bin/sleep", "infinity"]
```

### Using a Custom Image

Update the image reference in your debug container patch:

```yaml
spec:
  template:
    spec:
      containers:
      - name: debug-tools
        image: quay.io/your-org/debug-tools:latest  # Your custom image
        command: ["/bin/sleep", "infinity"]
        # ... rest of container spec
```
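
If the `debug-tools` container is already present in the deployment, a small patch that changes only the image and command is enough, because a strategic merge patch matches containers by `name`. The sketch below generates such a patch. The function name `write_image_patch` and the output file name are hypothetical, and `quay.io/your-org/debug-tools:latest` is the placeholder image from the example above.

```bash
# Hypothetical helper: write a strategic-merge patch that points the
# debug-tools container at the given image.
write_image_patch() {
  local image="$1" out="${2:-debug-image-patch.yaml}"
  # The here-document body and its terminator must start at column 0.
  cat > "$out" << EOF
spec:
  template:
    spec:
      containers:
      - name: debug-tools
        image: ${image}
        command: ["/bin/sleep", "infinity"]
EOF
}

write_image_patch quay.io/your-org/debug-tools:latest debug-image-patch.yaml
cat debug-image-patch.yaml
```

Apply the generated file with `kubectl patch deployment cluster-monitoring-operator -n openshift-monitoring --patch-file debug-image-patch.yaml`. If the container does not exist yet, use the full patch from the enabling section instead, so that the resource limits and security context are also set.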

## Security Considerations

The debug container runs with a restricted security context:

- **No privilege escalation** - `allowPrivilegeEscalation: false`
- **Dropped capabilities** - All Linux capabilities are dropped
- **Non-root user** - `runAsNonRoot: true` rejects UID 0; the custom image example above additionally sets `USER 1001`
- **Same permissions** - Uses the same ServiceAccount as the operator, so anyone who can exec into the container can act with the operator's API permissions
- **Shared pod network** - Shares the pod network namespace with the operator container, so it is not network-isolated from it
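
These settings can be checked from inside the running container. The sketch below is one way to do so, assuming a Linux `/proc` filesystem. The function name `show_security_context` is hypothetical.

```bash
# Hypothetical helper: print the identity and privilege state of this shell.
show_security_context() {
  # Numeric user and group IDs the shell is running as.
  echo "uid=$(id -u) gid=$(id -g)"
  # CapEff is the effective capability bitmask; all zeros means none are held.
  # NoNewPrivs is 1 when privilege escalation is disabled.
  # grep inherits both values from the shell that starts it.
  grep -E '^(CapEff|NoNewPrivs):' /proc/self/status 2>/dev/null || true
}

show_security_context
```

With the security context from the patch in effect, one would expect a non-zero UID, an all-zero `CapEff` mask, and `NoNewPrivs` set to 1.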

## Troubleshooting

### Debug Container Won't Start

```bash
# Check pod events
kubectl describe pod -n openshift-monitoring -l app.kubernetes.io/name=cluster-monitoring-operator

# Check container status
kubectl get pods -n openshift-monitoring -l app.kubernetes.io/name=cluster-monitoring-operator -o jsonpath='{.items[0].status.containerStatuses}'

# Verify image pull
kubectl get events -n openshift-monitoring --field-selector reason=Failed
```

### Cannot Access Debug Container

```bash
# Verify container is running
kubectl get pods -n openshift-monitoring -l app.kubernetes.io/name=cluster-monitoring-operator

# Check if debug container exists
kubectl get pods -n openshift-monitoring -l app.kubernetes.io/name=cluster-monitoring-operator -o jsonpath='{.items[0].spec.containers[*].name}'

# Try accessing by pod name instead of deployment
# (replace the pod name below with one from the listing)
kubectl get pods -n openshift-monitoring -l app.kubernetes.io/name=cluster-monitoring-operator
kubectl exec -it -n openshift-monitoring pod/cluster-monitoring-operator-abc123 -c debug-tools -- /bin/bash
```

manifests/0000_50_cluster-monitoring-operator_05-deployment.yaml

Lines changed: 28 additions & 0 deletions
@@ -100,3 +100,31 @@ spec:
 - mountPath: /etc/cluster-monitoring-operator/telemetry
 name: telemetry-config
 readOnly: true
+# DEBUG TOOLS SIDECAR (OPTIONAL)
+# Uncomment the debug-tools container below to enable debugging capabilities
+#
+# To enable:
+# 1. Uncomment the debug-tools container below
+# 2. Replace the image with your debug tools image (optional - UBI minimal works)
+# 3. Apply the updated deployment
+#
+# Usage: kubectl exec -n openshift-monitoring deployment/cluster-monitoring-operator -c debug-tools -- /bin/bash
+#
+# - name: debug-tools
+# # TODO: Replace with your debug tools image at quay.io/openshift/[debug-image-name]:latest
+# # UBI9 includes dnf and more packages. Can upgrade curl and install debugging tools.
+# image: registry.redhat.io/ubi9/ubi:latest # Full UBI9 with dnf package manager
+# command: ["/bin/bash", "-c", "dnf swap -y libcurl-minimal libcurl && sleep infinity"]
+# resources:
+# requests:
+# cpu: 10m
+# memory: 32Mi
+# limits:
+# cpu: 50m
+# memory: 128Mi
+# securityContext:
+# allowPrivilegeEscalation: false
+# capabilities:
+# drop: ["ALL"]
+# runAsNonRoot: true
+# terminationMessagePolicy: FallbackToLogsOnError

manifests/image-references

Lines changed: 1 addition & 0 deletions
@@ -62,6 +62,7 @@ spec:
 from:
 kind: DockerImage
 name: quay.io/openshift/origin-monitoring-plugin:latest
+
 - name: kube-metrics-server
 from:
 kind: DockerImage

pkg/manifests/config.go

Lines changed: 2 additions & 0 deletions
@@ -268,6 +268,7 @@ type Images struct {
 TelemeterClient string
 Thanos string
 MonitoringPlugin string
+DebugTools string
 }
 
 type HTTPConfig struct {
@@ -505,6 +506,7 @@ func (c *Config) SetImages(images map[string]string) {
 c.Images.OpenShiftStateMetrics = images["openshift-state-metrics"]
 c.Images.Thanos = images["thanos"]
 c.Images.MonitoringPlugin = images["monitoring-plugin"]
+c.Images.DebugTools = images["debug-tools"]
 }
 
 func (c *Config) SetTelemetryMatches(matches []string) {

pkg/operator/operator.go

Lines changed: 0 additions & 1 deletion
@@ -258,7 +258,6 @@ func New(
 configMapName: configMapName,
 userWorkloadConfigMapName: userWorkloadConfigMapName,
 remoteWrite: remoteWrite,
-CollectionProfilesEnabled: false,
 namespace: namespace,
 namespaceUserWorkload: namespaceUserWorkload,
 client: c,
