Description
What happened:
The subscriber-config ConfigMap is initialized with a hardcoded localhost:32506 URL during Litmus installation, causing the subscriber pod to crash with CrashLoopBackOff. The pod cannot connect to the Litmus server because, inside a pod, localhost resolves to the pod's own loopback interface, where nothing is listening on port 32506.
Error observed:
time="2026-01-03T04:16:11Z" level=fatal msg="Failed to confirm agent"
error="Post \"http://localhost:32506/api/query\": dial tcp [::1]:32506: connect: connection refused"
Pod status:
NAME READY STATUS RESTARTS AGE
subscriber-77764cd5cb-xxxxx 0/1 CrashLoopBackOff 5 5m
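The log line above shows the dial going to [::1], which is the pod's own loopback address rather than the Litmus server. As a rough illustration of how such a value could be caught, here is a minimal sketch that extracts the host from a SERVER_ADDR value and flags loopback hosts. The function names url_host and is_loopback_host are hypothetical helpers for this issue, not part of Litmus.

```shell
#!/usr/bin/env bash
# Sketch only: url_host and is_loopback_host are hypothetical helper names.

# Print the host of an http(s) URL, without scheme, port, or path.
url_host() {
  local rest="${1#*://}"   # drop "http://" or "https://"
  rest="${rest%%/*}"       # drop the path
  case "$rest" in
    \[*\]*) rest="${rest#\[}"; printf '%s\n' "${rest%%\]*}" ;;  # bracketed IPv6 literal
    *)      printf '%s\n' "${rest%%:*}" ;;                      # hostname or IPv4
  esac
}

# Succeed (exit 0) when the host refers to the caller's own loopback interface.
is_loopback_host() {
  case "$1" in
    localhost|127.*|::1) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: the value from this report is flagged, a Service name is not.
for addr in "http://localhost:32506/api/query" \
            "http://chaos-litmus-server-service:9002/query"; do
  if is_loopback_host "$(url_host "$addr")"; then
    echo "loopback (unreachable from another pod): $addr"
  else
    echo "not loopback: $addr"
  fi
done
```

A check like this only says the address points back at the pod itself. It does not confirm that a non-loopback address is actually reachable.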
What you expected to happen:
The subscriber-config ConfigMap should be initialized with the correct Kubernetes service name and port:
- Service Name: chaos-litmus-server-service (or appropriate service name)
- Port: 9002 (GraphQL server port from the service)
- Endpoint: /query (GraphQL query endpoint)
Expected ConfigMap value:
SERVER_ADDR: http://chaos-litmus-server-service:9002/query
Where can this issue be corrected? (optional)
The issue appears to be in the Litmus installation/initialization code that creates the subscriber-config ConfigMap. This could be in:
- Helm chart templates (if using Helm installation)
- Operator initialization code
- ConfigMap creation logic in the installation manifests
The ConfigMap is created in the litmus namespace with the name subscriber-config.
How to reproduce it (as minimally and precisely as possible):
- Install Litmus 3.24.0 using Helm or kubectl apply:
  # Using Helm
  helm repo add litmuschaos https://litmuschaos.github.io/litmus-helm/
  helm install chaos litmuschaos/litmus --namespace=litmus --create-namespace
  # OR using kubectl (apply the provided YAML)
  kubectl apply -f LitmusChaos-Local-Development.yml
- Check the subscriber-config ConfigMap:
  kubectl get configmap subscriber-config -n litmus -o yaml
- Observe the incorrect SERVER_ADDR value:
  data:
    SERVER_ADDR: http://localhost:32506/api/query
- Check subscriber pod status:
  kubectl get pods -n litmus -l app=subscriber
- Check subscriber pod logs:
  kubectl logs -n litmus -l app=subscriber --tail=20
- Expected result: Pod will be in CrashLoopBackOff state with connection refused errors.
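The inspection steps above can be condensed into one small script. This is a sketch, assuming kubectl is configured for the affected cluster and that the ConfigMap and label names match the ones in this report. The function names get_server_addr and report_subscriber_state are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: condensed reproduction check. Function names are hypothetical.

# Print the SERVER_ADDR stored in the subscriber-config ConfigMap.
get_server_addr() {
  local namespace="${1:-litmus}"
  kubectl get configmap subscriber-config -n "$namespace" \
    -o jsonpath='{.data.SERVER_ADDR}'
}

# Print the address, then the subscriber pod status and recent logs.
report_subscriber_state() {
  local namespace="${1:-litmus}"
  printf 'SERVER_ADDR: %s\n' "$(get_server_addr "$namespace")"
  kubectl get pods -n "$namespace" -l app=subscriber
  kubectl logs -n "$namespace" -l app=subscriber --tail=20
}

# Example call (requires kubectl and cluster access):
# report_subscriber_state litmus
```

Reading the single key with a JSONPath expression avoids scanning the full YAML output by eye.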
Environment Details:
- Litmus Version: 3.24.0
- Kubernetes Version: Rancher Desktop (K3s)
- Platform: macOS (ARM64/Apple Silicon)
- Installation Method: kubectl apply (using provided YAML file)
- Namespace: litmus
Workaround:
Manually patch the ConfigMap to use the correct service name:
# Get the correct service name and port
kubectl get svc -n litmus | grep litmus-server
# Patch the ConfigMap
kubectl patch configmap subscriber-config -n litmus --type merge \
-p '{"data":{"SERVER_ADDR":"http://chaos-litmus-server-service:9002/query"}}'
# Delete the subscriber pod to restart with new config
kubectl delete pod -n litmus -l app=subscriber
# Verify the pod is now running
kubectl get pods -n litmus -l app=subscriber
kubectl logs -n litmus -l app=subscriber --tail=20
After the workaround, the subscriber should show:
AgentID: 8c97b69d-7bee-4802-8c14-fbf3b4bb1d1c has been confirmed
Connecting to ws://chaos-litmus-server-service:9002/query
Server connection established, Listening....
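Because the Service name may differ between installations (for example with a different Helm release name), the workaround could be parameterized instead of hardcoding the address. The following is a minimal sketch under that assumption. build_server_addr, build_patch, and patch_subscriber_config are hypothetical names, and the /query path is taken from the expected value in this report.

```shell
#!/usr/bin/env bash
# Sketch only: parameterized version of the workaround. Function names are hypothetical.

# Build the in-cluster GraphQL address from a Service name and port.
build_server_addr() {
  local service="$1" port="$2"
  printf 'http://%s:%s/query\n' "$service" "$port"
}

# Build the JSON merge patch that sets SERVER_ADDR in the ConfigMap.
build_patch() {
  printf '{"data":{"SERVER_ADDR":"%s"}}\n' "$1"
}

# Patch the ConfigMap, then delete the subscriber pod so it restarts with the new value.
# Requires kubectl and cluster access.
patch_subscriber_config() {
  local namespace="$1" service="$2" port="$3"
  local addr
  addr="$(build_server_addr "$service" "$port")"
  kubectl patch configmap subscriber-config -n "$namespace" --type merge \
    -p "$(build_patch "$addr")"
  kubectl delete pod -n "$namespace" -l app=subscriber
}

# Example call (commented out so the definitions can be sourced safely):
# patch_subscriber_config litmus chaos-litmus-server-service 9002
```

Keeping the address and patch construction in separate functions makes it possible to print and review the payload before anything is applied to the cluster.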
Anything else we need to know?:
- Root Cause Analysis:
  - The ConfigMap annotation shows this value was set during installation: "SERVER_ADDR":"http://localhost:32506/api/query"
  - This appears to be a default value bug in the installation process
  - The value 32506 is a NodePort (external access), not the internal service port
  - The endpoint /api/query is incorrect (should be /query)
- Impact:
  - Subscriber pod cannot connect to server
  - Infrastructure connection fails in Litmus UI
  - No communication between subscriber and server
  - Infrastructure status shows as disconnected
  - This affects all fresh installations of Litmus 3.24.0
- Service Information:
  # Actual service configuration
  kubectl get svc chaos-litmus-server-service -n litmus
  # Shows ports: 9002 (GraphQL server) and 8000 (RPC server)
- Related Components:
  - ConfigMap: subscriber-config in litmus namespace
  - Deployment: subscriber in litmus namespace
  - Service: chaos-litmus-server-service in litmus namespace
- Additional Notes:
  - This bug affects the subscriber component specifically
  - Other components (operator, exporter, event-tracker) work correctly
  - The bug is reproducible on fresh installations
  - The workaround is effective but should not be necessary
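To back up the root cause note that 32506 is a NodePort rather than the internal service port, the port numbers can be compared against the default Kubernetes NodePort range (30000-32767). The sketch below assumes the cluster uses that default range, which can be changed by the cluster operator. The function names in_default_nodeport_range and list_service_ports are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical.

# Succeed when the port falls in the default NodePort range (30000-32767).
# Assumption: the cluster has not overridden the default range.
in_default_nodeport_range() {
  local port="$1"
  [ "$port" -ge 30000 ] && [ "$port" -le 32767 ]
}

# Print name, service port, and nodePort (if any) for each port of a Service.
# Requires kubectl and cluster access.
list_service_ports() {
  local namespace="$1" service="$2"
  kubectl get svc "$service" -n "$namespace" \
    -o jsonpath='{range .spec.ports[*]}{.name}{"\t"}{.port}{"\t"}{.nodePort}{"\n"}{end}'
}

# Example: compare the two ports mentioned in this report.
for port in 32506 9002; do
  if in_default_nodeport_range "$port"; then
    echo "$port looks like a NodePort (node-level access)"
  else
    echo "$port is outside the default NodePort range"
  fi
done

# Example call (requires cluster access):
# list_service_ports litmus chaos-litmus-server-service
```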