

@maltesander
Member

Description

Based on stackabletech/docker-images#1359.
Part of #480.

Adapt the operator to the docker image changes for https://issues.apache.org/jira/browse/ZOOKEEPER-4276.

The clientPort and secureClientPort settings were removed from the static configuration; the client ports are now set via the dynamic config.
Port unification was removed, so in TLS mode no plaintext connections can be established on the client port.
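A minimal Rust sketch of how one server entry of the dynamic config could be rendered in each mode. The helper name dynamic_server_entry is hypothetical; 2888/3888 are ZooKeeper's conventional quorum and election ports, and 2181 is the conventional plaintext client port. The TLS-mode form (an empty plaintext client field followed by the secure client port, e.g. ;;2282) is an assumption inferred from the server string in the 3.9.3 error log in this thread, not a confirmed description of the patched image's format.

```rust
/// Hypothetical helper: renders one `server.<id>=...` line of the ZooKeeper
/// dynamic config. Quorum/election ports are the conventional 2888/3888.
fn dynamic_server_entry(
    id: u32,
    host: &str,
    tls: bool,
    client_port: u16,
    secure_client_port: u16,
) -> String {
    // server_config part: host:quorum_port:election_port
    let server_config = format!("{host}:2888:3888");
    if tls {
        // ASSUMED patched format: empty plaintext client port field, then the
        // secure client port in a third `;`-separated field.
        format!("server.{id}={server_config};;{secure_client_port}")
    } else {
        // Standard ZooKeeper form: server_config;client_port
        format!("server.{id}={server_config};{client_port}")
    }
}

fn main() {
    let host = "zk-server-primary-0.zk-server-primary-headless.default.svc.cluster.local";
    println!("{}", dynamic_server_entry(1, host, false, 2181, 2282));
    println!("{}", dynamic_server_entry(1, host, true, 2181, 2282));
}
```

Under that assumption, the TLS-mode entry advertises no plaintext client port at all.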

Definition of Done Checklist

  • Not all of these items are applicable to all PRs; the author should update this template to leave only the boxes that are relevant
  • Please make sure all these things are done and tick the boxes

Author

  • Changes are OpenShift compatible
  • CRD changes approved
  • CRD documentation for all fields, following the style guide.
  • Helm chart can be installed and deployed operator works
  • Integration tests passed (for non trivial changes)
  • Changes need to be "offline" compatible
  • Links to generated (nightly) docs added
  • Release note snippet added

Reviewer

  • Code contains useful comments
  • Code contains useful logging statements
  • (Integration-)Test cases added
  • Documentation added or updated. Follows the style guide.
  • Changelog updated
  • Cargo.toml only contains references to git tags (not specific commits or branches)

Acceptance

  • Feature Tracker has been updated
  • Proper release label has been added
  • Links to generated (nightly) docs added
  • Release note snippet added
  • Add type/deprecation label & add to the deprecation schedule
  • Add type/experimental label & add to the experimental features tracker

@maltesander self-assigned this Dec 4, 2025
@maltesander moved this to Development: Waiting for Review in Stackable Engineering Dec 4, 2025
@adwk67 self-requested a review December 5, 2025 12:52
@adwk67 moved this from Development: Waiting for Review to Development: In Review in Stackable Engineering Dec 5, 2025
Member

@adwk67 left a comment


The image builds without problems and the smoke test runs fine. However, once the test is complete I see the following in the logs:

zookeeper 2025-12-05 14:52:01,995 [myid:] - ERROR [zkNetty-EpollEventLoopGroup-4-1:o.a.z.s.NettyServerCnxnFactory$CertificateVerifier@466] - Unsuccessful handshake with session 0x0
zookeeper 2025-12-05 14:52:01,995 [myid:] - WARN  [zkNetty-EpollEventLoopGroup-4-1:o.a.z.s.NettyServerCnxnFactory$CnxnChannelHandler@302] - Exception caught
zookeeper Caused by: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 0000001d0000000000000000000000000000000000000000000000000000000000
zookeeper     at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1353)
zookeeper     at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1428)
zookeeper     at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:530)
zookeeper     at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:469)
zookeeper
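The NotSslRecordException means the bytes received on the TLS-only client port did not start with a TLS record header. A simplified Rust illustration of that check, based on the TLS record layer in RFC 5246 (content types 20 to 23, major version 3). The function names are hypothetical, and Netty's actual detection is more elaborate (for example, it also considers SSLv2-style headers). The hex prefix is taken from the log above.

```rust
/// Hypothetical helper: decodes an even-length hex string into bytes.
fn decode_hex(s: &str) -> Vec<u8> {
    s.as_bytes()
        .chunks(2)
        .map(|pair| u8::from_str_radix(std::str::from_utf8(pair).unwrap(), 16).unwrap())
        .collect()
}

/// Simplified TLS record-header check (hypothetical, not Netty's code).
/// Byte 0: content type, 20 = change_cipher_spec, 21 = alert,
///         22 = handshake, 23 = application_data (RFC 5246).
/// Byte 1: major protocol version, 3 for SSLv3 and all TLS 1.x versions.
fn looks_like_tls_record(buf: &[u8]) -> bool {
    match buf {
        [content_type, major, ..] => (20..=23).contains(content_type) && *major == 3,
        // Too short to contain a record header.
        _ => false,
    }
}

fn main() {
    // Prefix of the payload reported in the NotSslRecordException log line.
    let plaintext = decode_hex("0000001d00000000");
    // Start of a handshake record (0x16) with record version 3.1 (0x0301).
    let handshake = decode_hex("160301002f");
    println!("log payload is TLS: {}", looks_like_tls_record(&plaintext));
    println!("handshake record is TLS: {}", looks_like_tls_record(&handshake));
}
```

The logged payload starts with 0x00, which is not a TLS content type, so a TLS-only listener rejects it as soon as a plaintext client connects.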

Also, I couldn't find the following, which is referred to in the parent issue:

The necessary steps are documented in the code (see rust/crd/src/lib.rs)

Member

@adwk67 left a comment


I also got some test failures with 3.9.4:

--- FAIL: kuttl (1298.87s)
    --- FAIL: kuttl/harness (0.00s)
        --- PASS: kuttl/harness/delete-rolegroup_zookeeper-3.9.4_openshift-false (126.37s)
        --- FAIL: kuttl/harness/znode_zookeeper-latest-3.9.4_openshift-false (600.52s)
        --- PASS: kuttl/harness/smoke_zookeeper-3.9.4_use-server-tls-true_use-client-auth-tls-true_openshift-false (52.45s)
        --- FAIL: kuttl/harness/cluster-operation_zookeeper-latest-3.9.4_openshift-false (601.25s)
        --- FAIL: kuttl/harness/logging_zookeeper-3.9.4_openshift-false (645.74s)
FAIL

@adwk67
Member

adwk67 commented Dec 5, 2025

Tests are also failing with 3.9.3:

zookeeper 2025-12-05 15:24:45,251 [myid:] - ERROR [main:o.a.z.s.q.QuorumPeerMain@99] - Invalid config, exiting abnormally
zookeeper org.apache.zookeeper.server.quorum.QuorumPeerConfig$ConfigException: test-zk-server-primary-0.test-zk-server-primary-headless.kuttl-test-smart-flea.svc.cluster.local:2888:3888;;2282 does not have the form server_config or server_config;client_config where server_config is the pipe separated list of host:port:port or host:port:port:type and client_config is port or host:port
zookeeper     at org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer.initializeWithAddressString(QuorumPeer.java:344)
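A hypothetical Rust validator that mirrors the grammar quoted in the 3.9.3 error message, to show why the generated entry is rejected: splitting on ; yields three parts, while the unpatched grammar allows at most two. This is a sketch, not ZooKeeper's actual parser; the type field is not checked and IPv6 literals are not handled.

```rust
/// Hypothetical validator mirroring the grammar quoted in the 3.9.3 error:
///   server_config | server_config;client_config
///   server_config = pipe-separated list of host:port:port or host:port:port:type
///   client_config = port | host:port
fn is_valid_server_string(s: &str) -> bool {
    let parts: Vec<&str> = s.split(';').collect();
    let (server_config, client_config) = match parts.as_slice() {
        [server] => (*server, None),
        [server, client] => (*server, Some(*client)),
        // Three or more parts, e.g. "host:2888:3888;;2282", are rejected.
        _ => return false,
    };

    // Every pipe-separated address must be host:port:port[:type].
    // The type value itself is not checked here.
    let server_ok = server_config.split('|').all(|addr| {
        match addr.split(':').collect::<Vec<_>>().as_slice() {
            [host, quorum, election] | [host, quorum, election, _] => {
                !host.is_empty() && is_port(quorum) && is_port(election)
            }
            _ => false,
        }
    });

    let client_ok = match client_config {
        None => true,
        Some(c) => match c.split(':').collect::<Vec<_>>().as_slice() {
            [port] => is_port(port),
            [host, port] => !host.is_empty() && is_port(port),
            _ => false,
        },
    };

    server_ok && client_ok
}

/// A port is any value that parses as a u16.
fn is_port(s: &str) -> bool {
    s.parse::<u16>().is_ok()
}

fn main() {
    for entry in ["zk-0.example:2888:3888;2181", "zk-0.example:2888:3888;;2282"] {
        println!("{entry} -> valid: {}", is_valid_server_string(entry));
    }
}
```

Under this grammar the TLS-mode entry fails at the split step, before any host or port is examined, which is consistent with the unpatched 3.9.3 image exiting with "Invalid config".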

@maltesander
Member Author

Sorry, I had another commit (cfd231a) that was not pushed. That should fix the other 3.9.4 tests.

3.9.3 is currently not patched; the question is whether we do the work or remove it.

The error

EpollEventLoopGroup-4-1:o.a.z.s.NettyServerCnxnFactory$CnxnChannelHandler@302] - Exception caught
zookeeper Caused by: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 0000001d0000000000000000000000000000000000000000000000000000000000

should only happen once, when the plaintext communication is tested? Or did you see it multiple times?

@adwk67
Member

adwk67 commented Dec 5, 2025

The error

EpollEventLoopGroup-4-1:o.a.z.s.NettyServerCnxnFactory$CnxnChannelHandler@302] - Exception caught
zookeeper Caused by: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 0000001d0000000000000000000000000000000000000000000000000000000000

should only happen once the plain text communication is tested? Or did you see it multiple times?

It was in the pod logs multiple times.


Labels

None yet

Projects

Status: Development: In Review


3 participants