Subscriber ConfigMap Initialized with Hardcoded localhost URL #5369

@dalekurt

What happened:

The subscriber-config ConfigMap is initialized with a hardcoded localhost:32506 URL during Litmus installation, causing the subscriber pod to crash with CrashLoopBackOff. The pod cannot connect to the Litmus server because, inside a pod, localhost refers to the pod itself, and nothing there listens on port 32506 (hence the "connection refused" error below).

Error observed:

time="2026-01-03T04:16:11Z" level=fatal msg="Failed to confirm agent" 
error="Post \"http://localhost:32506/api/query\": dial tcp [::1]:32506: connect: connection refused"

Pod status:

NAME                          READY   STATUS             RESTARTS   AGE
subscriber-77764cd5cb-xxxxx   0/1     CrashLoopBackOff   5          5m

What you expected to happen:

The subscriber-config ConfigMap should be initialized with the correct Kubernetes service name and port:

  • Service Name: chaos-litmus-server-service (or appropriate service name)
  • Port: 9002 (GraphQL server port from the service)
  • Endpoint: /query (GraphQL query endpoint)

Expected ConfigMap value:

SERVER_ADDR: http://chaos-litmus-server-service:9002/query
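A quick way to tell the broken value from the expected one is to check whether the host part of SERVER_ADDR is a loopback name. The following is only a sketch: is_loopback_addr is a hypothetical helper name, not part of Litmus, and it only does string matching on the URL.

```shell
# Hypothetical helper (not part of Litmus): exits 0 when the URL's host is a
# loopback name, which can never reach the server from inside the subscriber pod.
is_loopback_addr() {
  # Strip the scheme, then everything from the first "/" on, leaving host[:port].
  hostport="${1#*://}"
  hostport="${hostport%%/*}"
  case "$hostport" in
    localhost|localhost:*|127.*|'[::1]'|'[::1]:'*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_loopback_addr "http://localhost:32506/api/query"` succeeds, while the expected service URL above does not match.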

Where can this issue be corrected? (optional)

The issue appears to be in the Litmus installation/initialization code that creates the subscriber-config ConfigMap. This could be in:

  • Helm chart templates (if using Helm installation)
  • Operator initialization code
  • ConfigMap creation logic in the installation manifests

The ConfigMap is created in the litmus namespace with the name subscriber-config.

How to reproduce it (as minimally and precisely as possible):

  1. Install Litmus 3.24.0 using Helm or kubectl apply:

    # Using Helm
    helm repo add litmuschaos https://litmuschaos.github.io/litmus-helm/
    helm install chaos litmuschaos/litmus --namespace=litmus --create-namespace
    
    # OR using kubectl (apply the provided YAML)
    kubectl apply -f LitmusChaos-Local-Development.yml
  2. Check the subscriber-config ConfigMap:

    kubectl get configmap subscriber-config -n litmus -o yaml
  3. Observe the incorrect SERVER_ADDR value:

    data:
      SERVER_ADDR: http://localhost:32506/api/query
  4. Check subscriber pod status:

    kubectl get pods -n litmus -l app=subscriber
  5. Check subscriber pod logs:

    kubectl logs -n litmus -l app=subscriber --tail=20
  6. Observed result: the subscriber pod is in CrashLoopBackOff state and its logs show connection refused errors.
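Steps 2 and 3 can be collapsed into a single lookup that prints only the SERVER_ADDR value. This is a minimal sketch; get_server_addr is a hypothetical wrapper name, and the namespace defaults to litmus as in this report.

```shell
# Hypothetical wrapper: print the SERVER_ADDR stored in the subscriber-config
# ConfigMap. The namespace argument is optional and defaults to "litmus".
get_server_addr() {
  kubectl get configmap subscriber-config -n "${1:-litmus}" \
    -o jsonpath='{.data.SERVER_ADDR}'
}
```

On an affected installation, `get_server_addr` should print the localhost URL shown in step 3.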

Environment Details:

  • Litmus Version: 3.24.0
  • Kubernetes Version: Rancher Desktop (K3s)
  • Platform: macOS (ARM64/Apple Silicon)
  • Installation Method: kubectl apply (using provided YAML file)
  • Namespace: litmus

Workaround:

Manually patch the ConfigMap to use the correct service name:

# Get the correct service name and port
kubectl get svc -n litmus | grep litmus-server

# Patch the ConfigMap
kubectl patch configmap subscriber-config -n litmus --type merge \
  -p '{"data":{"SERVER_ADDR":"http://chaos-litmus-server-service:9002/query"}}'

# Delete the subscriber pod to restart with new config
kubectl delete pod -n litmus -l app=subscriber

# Verify the pod is now running
kubectl get pods -n litmus -l app=subscriber
kubectl logs -n litmus -l app=subscriber --tail=20

After the workaround, the subscriber should show:

AgentID: 8c97b69d-7bee-4802-8c14-fbf3b4bb1d1c has been confirmed
Connecting to ws://chaos-litmus-server-service:9002/query
Server connection established, Listening....
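The workaround steps can also be wrapped in a small function so the service name and port are not typed by hand. This is a sketch under assumptions: build_server_addr_patch and fix_subscriber_config are hypothetical names, the defaults are the values from this report, and it uses kubectl rollout restart on the subscriber Deployment instead of deleting the pod.

```shell
# Build the JSON merge patch for a given service name and port.
build_server_addr_patch() {
  printf '{"data":{"SERVER_ADDR":"http://%s:%s/query"}}' "$1" "$2"
}

# Hypothetical wrapper: patch the ConfigMap, restart the subscriber Deployment,
# and wait for the rollout. Defaults are the values observed in this report.
fix_subscriber_config() {
  ns="${1:-litmus}"
  svc="${2:-chaos-litmus-server-service}"
  port="${3:-9002}"
  kubectl patch configmap subscriber-config -n "$ns" --type merge \
    -p "$(build_server_addr_patch "$svc" "$port")" &&
  kubectl rollout restart deployment/subscriber -n "$ns" &&
  kubectl rollout status deployment/subscriber -n "$ns" --timeout=120s
}
```

Calling `fix_subscriber_config` with no arguments applies the same patch as the manual command above; pass a different namespace, service name, or port if your release is named differently.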

Anything else we need to know?:

  1. Root Cause Analysis:

    • The ConfigMap annotation shows this value was set during installation: "SERVER_ADDR":"http://localhost:32506/api/query"
    • This appears to be a default value bug in the installation process
    • The value 32506 is a NodePort (external access), not the internal service port
    • The endpoint /api/query is incorrect (should be /query)
  2. Impact:

    • Subscriber pod cannot connect to server
    • Infrastructure connection fails in Litmus UI
    • No communication between subscriber and server
    • Infrastructure status shows as disconnected
    • This appears to affect fresh installations of Litmus 3.24.0 (observed in the environment listed above)
  3. Service Information:

    # Actual service configuration
    kubectl get svc chaos-litmus-server-service -n litmus
    # Shows ports: 9002 (GraphQL server) and 8000 (RPC server)
  4. Related Components:

    • ConfigMap: subscriber-config in litmus namespace
    • Deployment: subscriber in litmus namespace
    • Service: chaos-litmus-server-service in litmus namespace
  5. Additional Notes:

    • This bug affects the subscriber component specifically
    • Other components (operator, exporter, event-tracker) work correctly
    • The bug is reproducible on fresh installations
    • The workaround is effective but should not be necessary
