NPE After 1.3 Upgrade with Amazon MSK #1419

@thebeanogamer

Description

Issue submitter TODO list

  • I've looked up my issue in FAQ
  • I've searched for already existing issues here
  • I've tried running main-labeled docker image and the issue still persists there
  • I'm running a supported version of the application which is listed here

Describe the bug (actual behavior)

We run Kafbat against Amazon MSK clusters running Kafka 4.0.x. On Kafbat 1.2 this worked fine, but since upgrading to 1.3 we get an exception whenever Kafbat tries to initialise its admin client.

ERROR [parallel-1] i.k.u.s.StatisticsService: Failed to collect cluster demo info
java.lang.IllegalStateException: Error while creating AdminClient for the cluster demo
	at io.kafbat.ui.service.AdminClientServiceImpl.lambda$createAdminClient$5(AdminClientServiceImpl.java:58)
	at reactor.core.publisher.Mono.lambda$onErrorMap$29(Mono.java:3862)
	at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:94)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.secondError(MonoFlatMap.java:241)
	at reactor.core.publisher.MonoFlatMap$FlatMapInner.onError(MonoFlatMap.java:315)
	at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.onError(MonoPeekTerminal.java:258)
	at reactor.core.publisher.FluxMap$MapConditionalSubscriber.onError(FluxMap.java:265)
	at reactor.core.publisher.Operators$MonoSubscriber.onError(Operators.java:1911)
	at reactor.core.publisher.MonoCacheTime$CoordinatorSubscriber.signalCached(MonoCacheTime.java:340)
	at reactor.core.publisher.MonoCacheTime$CoordinatorSubscriber.onError(MonoCacheTime.java:363)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.secondError(MonoFlatMap.java:241)
	at reactor.core.publisher.MonoFlatMap$FlatMapInner.onError(MonoFlatMap.java:315)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:136)
	at reactor.core.publisher.MonoZip$ZipCoordinator.signal(MonoZip.java:297)
	at reactor.core.publisher.MonoZip$ZipInner.onNext(MonoZip.java:478)
	at reactor.core.publisher.MonoPublishOn$PublishOnSubscriber.run(MonoPublishOn.java:181)
	at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68)
	at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.lang.NullPointerException: null
	at java.base/java.util.Objects.requireNonNull(Objects.java:233)
	at java.base/java.util.Optional.of(Optional.java:113)
	at io.kafbat.ui.service.ReactiveAdminClient$ConfigRelatedInfo.lambda$extract$2(ReactiveAdminClient.java:166)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:132)
	... 10 common frames omitted

Attaching a debugger shows that the inter.broker.protocol.version property comes back from the broker with a null value.

[Image: debugger screenshot showing the property value as null]

This aligns with how the official Kafka client sees things. In fact, all sensitive=true properties seem to return null. The Kafka docs say this is expected.

Return whether the config value is sensitive. The value is always set to null by the broker if the config value is sensitive.
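
To illustrate why a null config value produces the stack trace above, here is a minimal standalone sketch. It does not reproduce Kafbat's code; the class and method names are made up for illustration. It only shows that java.util.Optional.of rejects null, which matches the Optional.of and Objects.requireNonNull frames in the "Caused by" section.

```java
import java.util.Optional;

// Hypothetical demo class, not part of Kafbat.
final class OptionalOfNullRepro {

    // Returns true if wrapping the value with Optional.of throws NullPointerException.
    static boolean strictWrapFails(String brokerValue) {
        try {
            Optional.of(brokerValue); // Optional.of requires a non-null argument
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Stand-in for what the broker reports for a sensitive=true entry.
        String sensitiveValue = null;
        System.out.println("null value fails: " + strictWrapFails(sensitiveValue));
        System.out.println("non-null value fails: " + strictWrapFails("some-value"));
    }
}
```

Running main prints "null value fails: true" followed by "non-null value fails: false".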

$ ./bin/kafka-configs.sh --bootstrap-server $DEMO --entity-type  brokers --entity-name 1 --all --describe --command-config /config/config.properties | grep "inter.broker.protocol.version"
  inter.broker.protocol.version=null sensitive=true synonyms={STATIC_BROKER_CONFIG:inter.broker.protocol.version=null}

Googling around, I can't see any immediate way to ask the broker for the value of a sensitive config. I'm not clear what io.kafbat.ui.service.ReactiveAdminClient.ConfigRelatedInfo#extract does, so I'm unsure of the best fix.
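
One possible direction, sketched under assumptions: treat a null value as "not available" rather than as an error. The helper names, the map-based model of broker configs, and the "unknown" fallback below are all hypothetical. Since it is unclear what extract actually does with the value, this is not a proposed patch, only an example of null-tolerant lookup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, not Kafbat's actual implementation.
final class SensitiveConfigFallback {

    // Looks up a broker config value; a missing key or a null value both yield Optional.empty().
    static Optional<String> configValue(Map<String, String> brokerConfigs, String name) {
        return Optional.ofNullable(brokerConfigs.get(name));
    }

    // Falls back to a placeholder when the broker hides the value (assumed behaviour).
    static String valueOrUnknown(Map<String, String> brokerConfigs, String name) {
        return configValue(brokerConfigs, name).orElse("unknown");
    }

    public static void main(String[] args) {
        // HashMap permits null values, which mimics a sensitive entry returned as null.
        Map<String, String> mskLikeConfigs = new HashMap<>();
        mskLikeConfigs.put("inter.broker.protocol.version", null);

        System.out.println(valueOrUnknown(mskLikeConfigs, "inter.broker.protocol.version"));
    }
}
```

Whether a placeholder is acceptable depends on how the extracted value is used downstream; if the value drives feature detection, a different fallback may be needed.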

Expected behavior

Kafbat should connect to the cluster without issue.

Your installation details

ghcr.io/kafbat/kafka-ui:v1.3.0 container running under Kubernetes.

kafka:
  clusters:
    - name: demo
      bootstrap-servers: "REDACTED"
      read-only: true
      ssl:
        truststore-location: /config/cacerts
        truststore-password: changeit
        verify-ssl: true
      metrics:
        type: PROMETHEUS
        port: 11001
        ssl: false

      properties:
        security.protocol: SASL_SSL
        sasl.mechanism: AWS_MSK_IAM
        sasl.jaas.config: "software.amazon.msk.auth.iam.IAMLoginModule required;"
        sasl.client.callback.handler.class: software.amazon.msk.auth.iam.IAMClientCallbackHandler

  internalTopicPrefix: "__"

  admin-client-timeout: 30000

  polling:
    poll-timeout-ms: 1000
    max-page-size: 500
    default-page-size: 100

webclient:
  response-timeout-ms: 20000
  max-in-memory-buffer-size: 20MB
management:
  endpoint:
    info:
      enabled: true
    health:
      enabled: true
  endpoints:
    web:
      exposure:
        include: "info,health,prometheus"
logging:
  level:
    root: INFO
    io.kafbat.ui: DEBUG
    reactor.netty.http.server.AccessLog: INFO
    org.hibernate.validator: WARN

Steps to reproduce

  1. Create an Amazon MSK cluster running Kafka 4.0.x with the default config.
  2. Put the bootstrap URLs into Kafbat's config.
  3. Start Kafbat.

Screenshots

No response

Logs

Stack trace was provided above.

Additional context

I've raised a ticket with Amazon asking why they set inter.broker.protocol.version but mark it as sensitive. Interestingly, Confluent don't set it at all on their containers.
