controller-manager pod CrashLoopBackOff #1473

@cloudcafetech

Description

Running on RKE2 + KubeVirt + CDI + Prometheus + Loki (monolithic mode)

  • Deployment
helm repo add netobserv https://netobserv.io/static/helm/ --force-update
helm install netobserv --create-namespace -n netobserv --set standaloneConsole.enable=true netobserv/netobserv-operator
  • Error
# k get po -n netobserv
NAME                                           READY   STATUS             RESTARTS      AGE
netobserv-controller-manager-546bb84fb-ddn2k   0/1     CrashLoopBackOff   5 (62s ago)   4m20s

# k describe po netobserv-controller-manager-546bb84fb-ddn2k -n netobserv
Name:             netobserv-controller-manager-546bb84fb-ddn2k
Namespace:        netobserv
Priority:         0
Service Account:  netobserv-controller-manager
Node:             lenevo-ts-w2/192.168.0.119
Start Time:       Thu, 01 May 2025 02:49:15 +0000
Labels:           app=netobserv-operator
                  control-plane=controller-manager
                  pod-template-hash=546bb84fb
Annotations:      cni.projectcalico.org/containerID: 1326c203864d9fa3db82d55e04c90f82cf71f3414d84944be36425680baafce5
                  cni.projectcalico.org/podIP: 10.244.1.30/32
                  cni.projectcalico.org/podIPs: 10.244.1.30/32
                  k8s.v1.cni.cncf.io/network-status:
                    [{
                        "name": "k8s-pod-network",
                        "ips": [
                            "10.244.1.30"
                        ],
                        "default": true,
                        "dns": {}
                    }]
Status:           Running
IP:               10.244.1.30
IPs:
  IP:           10.244.1.30
Controlled By:  ReplicaSet/netobserv-controller-manager-546bb84fb
Containers:
  manager:
    Container ID:  containerd://0de483a18749525ca7105ab8b889f4bd2dbb432546236a06445fe90a60f7457a
    Image:         quay.io/netobserv/network-observability-operator:1.8.2-community
    Image ID:      quay.io/netobserv/network-observability-operator@sha256:ed1766e0ca5b94bdd4f645a5f5a38e31b92542b59da226cfeef3d9fc1ceffbac
    Port:          9443/TCP
    Host Port:     0/TCP
    Command:
      /manager
    Args:
      --health-probe-bind-address=:8081
      --metrics-bind-address=:8443
      --leader-elect
      --ebpf-agent-image=$(RELATED_IMAGE_EBPF_AGENT)
      --flowlogs-pipeline-image=$(RELATED_IMAGE_FLOWLOGS_PIPELINE)
      --console-plugin-image=$(RELATED_IMAGE_CONSOLE_PLUGIN)
      --downstream-deployment=$(DOWNSTREAM_DEPLOYMENT)
      --profiling-bind-address=$(PROFILING_BIND_ADDRESS)
      --metrics-cert-file=/etc/tls/private/tls.crt
      --metrics-cert-key-file=/etc/tls/private/tls.key
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 01 May 2025 02:52:33 +0000
      Finished:     Thu, 01 May 2025 02:52:33 +0000
    Ready:          False
    Restart Count:  5
    Limits:
      memory:  400Mi
    Requests:
      cpu:      100m
      memory:   100Mi
    Liveness:   http-get http://:8081/healthz delay=15s timeout=1s period=20s #success=1 #failure=3
    Readiness:  http-get http://:8081/readyz delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment:
      RELATED_IMAGE_EBPF_AGENT:         quay.io/netobserv/netobserv-ebpf-agent:v1.8.2-community
      RELATED_IMAGE_FLOWLOGS_PIPELINE:  quay.io/netobserv/flowlogs-pipeline:v1.8.2-community
      RELATED_IMAGE_CONSOLE_PLUGIN:     quay.io/netobserv/network-observability-standalone-frontend:v1.8.2-community
      DOWNSTREAM_DEPLOYMENT:            false
      PROFILING_BIND_ADDRESS:
    Mounts:
      /etc/tls/private from manager-metric-tls (ro)
      /tmp/k8s-webhook-server/serving-certs from cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2842z (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       False
  ContainersReady             False
  PodScheduled                True
Volumes:
  cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  webhook-server-cert
    Optional:    false
  manager-metric-tls:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  manager-metrics-tls
    Optional:    false
  kube-api-access-2842z:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason          Age                   From               Message
  ----     ------          ----                  ----               -------
  Normal   Scheduled       4m4s                  default-scheduler  Successfully assigned netobserv/netobserv-controller-manager-546bb84fb-ddn2k to lenevo-ts-w2
  Normal   AddedInterface  4m3s                  multus             Add eth0 [10.244.1.30/32] from k8s-pod-network
  Normal   Pulled          4m1s                  kubelet            Successfully pulled image "quay.io/netobserv/network-observability-operator:1.8.2-community" in 1.101s (1.101s including waiting). Image size: 82112140 bytes.
  Normal   Pulled          4m                    kubelet            Successfully pulled image "quay.io/netobserv/network-observability-operator:1.8.2-community" in 1.238s (1.238s including waiting). Image size: 82112140 bytes.
  Normal   Pulled          3m42s                 kubelet            Successfully pulled image "quay.io/netobserv/network-observability-operator:1.8.2-community" in 1.013s (1.013s including waiting). Image size: 82112140 bytes.
  Normal   Pulling         3m11s (x4 over 4m2s)  kubelet            Pulling image "quay.io/netobserv/network-observability-operator:1.8.2-community"
  Normal   Created         3m10s (x4 over 4m1s)  kubelet            Created container: manager
  Normal   Started         3m10s (x4 over 4m1s)  kubelet            Started container manager
  Normal   Pulled          3m10s                 kubelet            Successfully pulled image "quay.io/netobserv/network-observability-operator:1.8.2-community" in 1.01s (1.01s including waiting). Image size: 82112140 bytes.
  Warning  BackOff         3m3s (x9 over 3m59s)  kubelet            Back-off restarting failed container manager in pod netobserv-controller-manager-546bb84fb-ddn2k_netobserv(d7102a88-1061-41ed-8895-9681609215c7)

# k logs -f netobserv-controller-manager-546bb84fb-ddn2k -n netobserv
2025-05-01T02:52:33.530Z        INFO    setup   Starting netobserv-operator [build version: main-ab3524e, build date: 2025-03-20 11:39]
2025-05-01T02:52:33.561Z        INFO    setup   Initializing metrics certificate watcher using provided certificates    {"metrics-cert-file": "/etc/tls/private/tls.crt", "metrics-cert-key-file": "/etc/tls/private/tls.key"}
2025-05-01T02:52:33.562Z        INFO    controller-runtime.certwatcher  Updated current TLS certificate
2025-05-01T02:52:33.562Z        INFO    Creating manager
2025-05-01T02:52:33.563Z        INFO    Discovering APIs
2025-05-01T02:52:33.599Z        ERROR   setup   unable to setup manager {"error": "can't collect cluster info: unable to retrieve the complete list of server APIs: upload.cdi.kubevirt.io/v1beta1: stale GroupVersion discovery: upload.cdi.kubevirt.io/v1beta1"}
main.main
        /opt/app-root/main.go:190
runtime.main
        /usr/local/go/src/runtime/proc.go:272
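
The log points at the likely culprit: discovery for the aggregated API upload.cdi.kubevirt.io/v1beta1 is stale, which usually means the APIService is still registered but the CDI apiserver backing it is not responding. A quick way to check (APIService name and the cdi namespace assumed from a default CDI install; adjust if CDI was installed elsewhere):

# k get apiservice v1beta1.upload.cdi.kubevirt.io
# k get pods -n cdi

If the APIService reports Available=False, restarting the cdi-apiserver deployment (or reinstalling CDI) should restore discovery, after which the operator pod should start cleanly on its next restart.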
