[Bug]: CruiseControlMetricsReporter cannot connect to broker after upgrading operator to 0.47.0 #11868

@kyleli666

Bug Description

CruiseControlMetricsReporter cannot connect to the broker after upgrading the operator to 0.47.0. The reporter's producer logs the following repeatedly:

2025-09-15 11:19:08 INFO  [kafka-producer-network-thread | CruiseControlMetricsReporter] Metadata:314 - [Producer clientId=CruiseControlMetricsReporter] Rebootstrapping with [kafka-sgp-kafka-brokers/10.4.12.228:9091]
2025-09-15 11:19:08 INFO  [kafka-producer-network-thread | CruiseControlMetricsReporter] Metadata:314 - [Producer clientId=CruiseControlMetricsReporter] Rebootstrapping with [kafka-sgp-kafka-brokers/10.4.12.228:9091]
2025-09-15 11:19:08 INFO  [kafka-producer-network-thread | CruiseControlMetricsReporter] Metadata:314 - [Producer clientId=CruiseControlMetricsReporter] Rebootstrapping with [kafka-sgp-kafka-brokers/10.4.12.228:9091]
2025-09-15 11:19:08 INFO  [kafka-producer-network-thread | CruiseControlMetricsReporter] Metadata:314 - [Producer clientId=CruiseControlMetricsReporter] Rebootstrapping with [kafka-sgp-kafka-brokers/10.4.12.228:9091]
2025-09-15 11:19:08 INFO  [kafka-producer-network-thread | CruiseControlMetricsReporter] Metadata:314 - [Producer clientId=CruiseControlMetricsReporter] Rebootstrapping with [kafka-sgp-kafka-brokers/10.4.12.228:9091]
2025-09-15 11:19:08 INFO  [kafka-producer-network-thread | CruiseControlMetricsReporter] Metadata:314 - [Producer clientId=CruiseControlMetricsReporter] Rebootstrapping with [kafka-sgp-kafka-brokers/10.4.12.228:9091]
2025-09-15 11:19:08 INFO  [kafka-producer-network-thread | CruiseControlMetricsReporter] Metadata:314 - [Producer clientId=CruiseControlMetricsReporter] Rebootstrapping with [kafka-sgp-kafka-brokers/10.4.12.228:9091]
2025-09-15 11:19:08 INFO  [kafka-producer-network-thread | CruiseControlMetricsReporter] NetworkClient:1072 - [Producer clientId=CruiseControlMetricsReporter] Node -1 disconnected.
2025-09-15 11:19:08 WARN  [kafka-producer-network-thread | CruiseControlMetricsReporter] NetworkClient:899 - [Producer clientId=CruiseControlMetricsReporter] Connection to node -1 (kafka-sgp-kafka-brokers/10.4.12.228:9091) could not be established. Node may not be available.
2025-09-15 11:19:08 WARN  [kafka-producer-network-thread | CruiseControlMetricsReporter] NetworkClient:1255 - [Producer clientId=CruiseControlMetricsReporter] Bootstrap broker kafka-sgp-kafka-brokers:9091 (id: -1 rack: null isFenced: false) disconnected
2025-09-15 11:19:09 INFO  [kafka-producer-network-thread | CruiseControlMetricsReporter] Metadata:314 - [Producer clientId=CruiseControlMetricsReporter] Rebootstrapping with [kafka-sgp-kafka-brokers/10.4.12.228:9091]
2025-09-15 11:19:09 INFO  [kafka-producer-network-thread | CruiseControlMetricsReporter] Metadata:314 - [Producer clientId=CruiseControlMetricsReporter] Rebootstrapping with [kafka-sgp-kafka-brokers/10.4.12.228:9091]
2025-09-15 11:19:09 INFO  [kafka-producer-network-thread | CruiseControlMetricsReporter] NetworkClient:1072 - [Producer clientId=CruiseControlMetricsReporter] Node -1 disconnected.
2025-09-15 11:19:09 WARN  [kafka-producer-network-thread | CruiseControlMetricsReporter] NetworkClient:899 - [Producer clientId=CruiseControlMetricsReporter] Connection to node -1 (kafka-sgp-kafka-brokers/10.4.16.150:9091) could not be established. Node may not be available.
2025-09-15 11:19:09 WARN  [kafka-producer-network-thread | CruiseControlMetricsReporter] NetworkClient:1255 - [Producer clientId=CruiseControlMetricsReporter] Bootstrap broker kafka-sgp-kafka-brokers:9091 (id: -1 rack: null isFenced: false) disconnected
2025-09-15 11:19:10 INFO  [kafka-producer-network-thread | CruiseControlMetricsReporter] NetworkClient:1072 - [Producer clientId=CruiseControlMetricsReporter] Node -1 disconnected.
2025-09-15 11:19:10 WARN  [kafka-producer-network-thread | CruiseControlMetricsReporter] NetworkClient:899 - [Producer clientId=CruiseControlMetricsReporter] Connection to node -1 (kafka-sgp-kafka-brokers/10.4.19.165:9091) could not be established. Node may not be available.
2025-09-15 11:19:10 WARN  [kafka-producer-network-thread | CruiseControlMetricsReporter] NetworkClient:1255 - [Producer clientId=CruiseControlMetricsReporter] Bootstrap broker kafka-sgp-kafka-brokers:9091 (id: -1 rack: null isFenced: false) disconnected
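
The WARN lines above show the producer failing against several different addresses behind the same bootstrap name. As a rough aid for triage, the sketch below tallies failed connection attempts per resolved IP from a saved log. The regular expression and the function name `failed_connections` are illustrative assumptions, not part of Strimzi or Kafka tooling; the sample lines are copied from the excerpt above.

```python
import re
from collections import Counter

# Lines copied verbatim from the log excerpt in this report.
SAMPLE_LOG = """\
2025-09-15 11:19:08 INFO  [kafka-producer-network-thread | CruiseControlMetricsReporter] NetworkClient:1072 - [Producer clientId=CruiseControlMetricsReporter] Node -1 disconnected.
2025-09-15 11:19:08 WARN  [kafka-producer-network-thread | CruiseControlMetricsReporter] NetworkClient:899 - [Producer clientId=CruiseControlMetricsReporter] Connection to node -1 (kafka-sgp-kafka-brokers/10.4.12.228:9091) could not be established. Node may not be available.
2025-09-15 11:19:09 WARN  [kafka-producer-network-thread | CruiseControlMetricsReporter] NetworkClient:899 - [Producer clientId=CruiseControlMetricsReporter] Connection to node -1 (kafka-sgp-kafka-brokers/10.4.16.150:9091) could not be established. Node may not be available.
2025-09-15 11:19:10 WARN  [kafka-producer-network-thread | CruiseControlMetricsReporter] NetworkClient:899 - [Producer clientId=CruiseControlMetricsReporter] Connection to node -1 (kafka-sgp-kafka-brokers/10.4.19.165:9091) could not be established. Node may not be available.
"""

# Matches the "could not be established" message and captures host, IP and port.
FAILED_CONNECTION = re.compile(
    r"Connection to node (-?\d+) \(([^/\s]+)/([\d.]+):(\d+)\) could not be established"
)


def failed_connections(log_text: str) -> Counter:
    """Count failed connection attempts per (host, ip, port) in the given log text."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        match = FAILED_CONNECTION.search(line)
        if match:
            _node_id, host, ip, port = match.groups()
            counts[(host, ip, int(port))] += 1
    return counts


summary = failed_connections(SAMPLE_LOG)
for (host, ip, port), count in sorted(summary.items()):
    print(f"{host} -> {ip}:{port} failed {count} time(s)")
```

If every IP behind `kafka-sgp-kafka-brokers` appears in the tally, the failure is probably not specific to a single broker pod.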

Steps to reproduce

  1. Install Strimzi operator 0.46.1.
  2. Create the Kafka resources. CruiseControlMetrics metrics are produced to the Kafka topics as expected.
  3. Upgrade the Strimzi operator to 0.47.0 and wait for the Kafka cluster to become stable.
  4. CruiseControlMetrics metrics stop being produced to Kafka, and the errors above appear in the logs.
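
After step 3, one way to narrow the problem down is to check whether the bootstrap address from the logs is reachable at the TCP level from inside the cluster. The sketch below is a minimal probe using only the Python standard library; the helper name `can_connect` is hypothetical. It only tests that a TCP connection can be opened and does not cover a TLS handshake or authentication. With an Istio sidecar injected, as in the pod templates in this report, the result may reflect the proxy rather than the broker itself.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port opens within the timeout."""
    try:
        # create_connection resolves the host name and tries each resolved address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
```

From a pod in the `middleware` namespace, `can_connect("kafka-sgp-kafka-brokers", 9091)` would probe the same host name and port that appear in the log lines above.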

Expected behavior

CruiseControlMetrics metrics continue to be produced to the Kafka topics after the upgrade.

Strimzi version

0.47.0

Kubernetes version

Kubernetes 1.31

Installation method

Helm chart

Infrastructure

Amazon EKS

Configuration files and logs

My resources (report.sh failed with the error: Could not find any processes matching : 'Thread.print'):

---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: controller
  namespace: middleware
  labels:
    strimzi.io/cluster: kafka-sgp
  annotations:
    strimzi.io/next-node-ids: "[991-999]"
spec:
  replicas: 3
  roles:
    - controller
  storage:
    type: jbod
    volumes:
      - id: 0
        type: persistent-claim
        size: 1Gi
        class: gp3
        deleteClaim: true
  resources:
    requests:
      cpu: "0.5"
      memory: 1Gi
    limits:
      cpu: "1"
      memory: 2Gi
  template:
    pod:
      metadata:
        labels:
          sidecar.istio.io/inject: "true"
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: nodeGroup
                    operator: In
                    values:
                      - Test
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 50
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    strimzi.io/cluster: kafka-sgp
                topologyKey: kubernetes.io/hostname
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              strimzi.io/cluster: kafka-sgp
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: broker
  namespace: middleware
  labels:
    strimzi.io/cluster: kafka-sgp
  annotations:
    strimzi.io/next-node-ids: "[1-888]"
spec:
  replicas: 3
  roles:
    - broker
  storage:
    type: jbod
    volumes:
      - id: 0
        type: persistent-claim
        size: 30Gi
        class: gp3
        deleteClaim: true
  resources:
    requests:
      cpu: 1000m
      memory: 6Gi
    limits:
      cpu: 2000m
      memory: 8Gi
  template:
    pod:
      metadata:
        labels:
          sidecar.istio.io/inject: "true"
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: nodeGroup
                    operator: In
                    values:
                      - Test
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 50
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    strimzi.io/cluster: kafka-sgp
                topologyKey: kubernetes.io/hostname
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              strimzi.io/cluster: kafka-sgp
---
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: kafka-sgp
  namespace: middleware
  annotations:
    strimzi.io/kraft: enabled
    strimzi.io/node-pools: enabled
spec:
  kafka:
#    version: 3.9.1
    #metadataVersion: 3.9-IV0
    version: 4.0.0
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    config:
      # https://kafka.apache.org/documentation/#brokerconfigs
      auto.create.topics.enable: "false"
      num.partitions: 1
      # offsets.topic.replication.factor is for the offsets topic __consumer_offsets
#      offsets.topic.replication.factor: 1
      # default.replication.factor is for auto created topics
      default.replication.factor: 3
      min.insync.replicas: 1
      # max(8, vCPUs)
      num.io.threads: 8
      # max(5, vCPUs / 2)
      num.network.threads: 5
      # max(2, vCPUs / 4)
      num.replica.fetchers: 2
      connections.max.idle.ms: 1800000
      replica.lag.time.max.ms: 30000
      # socket.request.max.bytes < Java heap size, 104857600=100MiB
      socket.request.max.bytes: 104857600
      socket.receive.buffer.bytes: 102400
      socket.send.buffer.bytes: 102400
#      unclean.leader.election.enable: true
      # The minimum age of a log file to be eligible for deletion due to age
      log.retention.hours: 3
#      log.retention.minutes: 30
      log.retention.bytes: 800000000
      # The maximum size of a log segment file. When this size is reached a new log segment will be created.
      log.segment.bytes: 134217728
    rack:
      topologyKey: kubernetes.io/hostname
    metricsConfig:
      type: strimziMetricsReporter
      values:
        allowList:
          - "kafka_.*"
  cruiseControl:
    brokerCapacity:
      inboundNetwork: 125MiB/s
      outboundNetwork: 125MiB/s
    autoRebalance:
      - mode: add-brokers
      - mode: remove-brokers
    resources:
      requests:
        cpu: "0.5"
        memory: 512Mi
      limits:
        cpu: "2"
        memory: 1Gi
  kafkaExporter:
    resources:
      requests:
        cpu: 200m
        memory: 64Mi
      limits:
        cpu: 500m
        memory: 128Mi
---
# Help istio get cross-cluster pod and service dns records, and work as bootstrap while 9092 port excluded in istio.
apiVersion: v1
kind: Service
metadata:
  name: kafka-sgp-broker-list
  namespace: middleware
spec:
  selector:
    strimzi.io/broker-role: 'true'
    strimzi.io/cluster: kafka-sgp
  ports:
    - name: tcp-clients
      protocol: TCP
      port: 9092
      targetPort: tcp-clients
  clusterIP: None
  type: ClusterIP

Additional context

No response
