@maltesander maltesander commented Nov 4, 2025

Description

2025-11-03 13:03:58,809 INFO  ha.ActiveStandbyElector (ActiveStandbyElector.java:processWatchEvent(649)) - Session connected.
2025-11-03 13:03:58,810 INFO  ha.ActiveStandbyElector (ActiveStandbyElector.java:fenceOldActive(1019)) - Checking for any old active which needs to be fenced...
2025-11-03 13:03:58,811 INFO  ha.ActiveStandbyElector (ActiveStandbyElector.java:fenceOldActive(1040)) - Old node exists: 0a04686466731217686466732d6e616d656e6f64652d64656661756c742d301a47686466732d6e616d656e6f64652d64656661756c742d302e686466732d6e616d656e6f64652d64656661756c742e64656661756c742e7376632e636c75737465722e6c6f63616c20d43e28d33e
2025-11-03 13:03:58,812 WARN  ha.ActiveStandbyElector (ActiveStandbyElector.java:becomeActive(952)) - Exception handling the winning of election
java.lang.RuntimeException: Mismatched address stored in ZK for NameNode at hdfs-namenode-default-0.hdfs-namenode-default-headless.default.svc.cluster.local/100.97.191.176:8020: Stored protobuf was nameserviceId: "hdfs"
namenodeId: "hdfs-namenode-default-0"
hostname: "hdfs-namenode-default-0.hdfs-namenode-default.default.svc.cluster.local"
port: 8020
zkfcPort: 8019
, address from our own configuration for this NameNode was hdfs-namenode-default-0.hdfs-namenode-default-headless.default.svc.cluster.local/100.97.191.176:8020
        at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.dataToTarget(DFSZKFailoverController.java:91)
        at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:533)
        at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:65)
        at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:973)
        at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:1044)
        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:943)
        at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:509)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:675)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:554)
2025-11-03 13:03:58,812 INFO  ha.ActiveStandbyElector (ActiveStandbyElector.java:reJoinElection(799)) - Trying to re-establish ZK session
2025-11-03 13:03:58,916 INFO  zookeeper.ZooKeeper (ZooKeeper.java:close(1232)) - Session: 0x100088e24760797 closed
2025-11-03 13:03:59,917 INFO  zookeeper.ZooKeeper (ZooKeeper.java:<init>(637)) - Initiating client connection, connectString=zookeeper-server.default.svc.cluster.local:2282/znode-2aa300aa-3042-43ca-8076-4a224c72dea6 sessionTimeout=10000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@6cdbb1a5
2025-11-03 13:03:59,917 INFO  zookeeper.ClientCnxnSocket (ClientCnxnSocket.java:initProperties(239)) - jute.maxbuffer value is 1048575 Bytes
2025-11-03 13:03:59,917 INFO  zookeeper.ClientCnxn (ClientCnxn.java:initRequestTimeout(1747)) - zookeeper.request.timeout value is 0. feature enabled=false
2025-11-03 13:03:59,917 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(1177)) - Opening socket connection to server zookeeper-server.default.svc.cluster.local/100.64.47.126:2282.
2025-11-03 13:03:59,918 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(1179)) - SASL config status: Will not attempt to authenticate using SASL (unknown error)
2025-11-03 13:03:59,918 INFO  zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(1013)) - Socket connection established, initiating session, client: /100.97.191.176:34248, server: zookeeper-server.default.svc.cluster.local/100.64.47.126:2282
2025-11-03 13:03:59,926 INFO  zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1453)) - Session establishment complete on server zookeeper-server.default.svc.cluster.local/100.64.47.126:2282, session id = 0x100088e24760799, negotiated timeout = 10000
2025-11-03 13:03:59,926 WARN  ha.ActiveStandbyElector (ActiveStandbyElector.java:isStaleClient(1176)) - Ignoring stale result from old client with sessionId 0x100088e24760797
2025-11-03 13:03:59,926 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(569)) - EventThread shut down for session: 0x100088e24760797
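
The "Old node exists" hex string in the log above is the serialized protobuf that the previously active NameNode stored in ZooKeeper. To make the mismatch easier to see, here is a minimal Python sketch that decodes it with a hand-written protobuf wire-format reader, so no Hadoop classes are needed. The mapping of field numbers 1 to 5 onto `nameserviceId`, `namenodeId`, `hostname`, `port`, and `zkfcPort` is an assumption inferred from the order in which the exception prints them. The helper names `read_varint` and `decode_fields` are hypothetical and not part of any Hadoop or Stackable API.

```python
# Hex payload copied from the "Old node exists" log line.
STORED_HEX = (
    "0a04686466731217686466732d6e616d656e6f64652d64656661756c742d30"
    "1a47686466732d6e616d656e6f64652d64656661756c742d302e686466732d"
    "6e616d656e6f64652d64656661756c742e64656661756c742e7376632e636c"
    "75737465722e6c6f63616c20d43e28d33e"
)

# Hostname the ZKFC derived from its own configuration (from the exception text).
CONFIGURED_HOST = (
    "hdfs-namenode-default-0.hdfs-namenode-default-headless"
    ".default.svc.cluster.local"
)

# Assumed field numbering, inferred from the printed protobuf in the log.
FIELD_NAMES = {1: "nameserviceId", 2: "namenodeId", 3: "hostname",
               4: "port", 5: "zkfcPort"}


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Read a base-128 varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_fields(buf: bytes) -> dict[int, object]:
    """Decode varint (type 0) and length-delimited (type 2) fields only."""
    fields: dict[int, object] = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_no, wire_type = key >> 3, key & 0x07
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields[field_no] = value
    return fields


if __name__ == "__main__":
    stored = decode_fields(bytes.fromhex(STORED_HEX))
    for number, value in stored.items():
        print(f"{FIELD_NAMES.get(number, number)}: {value}")
    print("hostname matches configuration:", stored[3] == CONFIGURED_HOST)
```

The decoded hostname uses the service name `hdfs-namenode-default`, while the configured address uses `hdfs-namenode-default-headless`. That difference is what `DFSZKFailoverController.dataToTarget` rejects in the stack trace.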

Definition of Done Checklist

  • Not all of these items are applicable to all PRs; the author should update this template to leave in only the boxes that are relevant
  • Please make sure all these things are done and tick the boxes

Author

  • Changes are OpenShift compatible
  • CRD changes approved
  • CRD documentation for all fields, following the style guide.
  • Helm chart can be installed and the deployed operator works
  • Integration tests passed (for non trivial changes)
  • Changes need to be "offline" compatible
  • Links to generated (nightly) docs added
  • Release note snippet added

Reviewer

  • Code contains useful comments
  • Code contains useful logging statements
  • (Integration-)Test cases added
  • Documentation added or updated. Follows the style guide.
  • Changelog updated
  • Cargo.toml only contains references to git tags (not specific commits or branches)

Acceptance

  • Feature Tracker has been updated
  • Proper release label has been added
  • Links to generated (nightly) docs added
  • Release note snippet added
  • Add type/deprecation label & add to the deprecation schedule
  • Add type/experimental label & add to the experimental features tracker

sbernauer previously approved these changes Nov 4, 2025
@sbernauer sbernauer left a comment

Did not test it but LGTM, thanks!

@sbernauer sbernauer moved this to Development: In Review in Stackable Engineering Nov 4, 2025
@sbernauer sbernauer changed the title from "revert/remove headless suffix from headless service" to "fix: Revert/remove headless suffix from headless service" Nov 4, 2025
@sbernauer sbernauer enabled auto-merge November 4, 2025 12:19
@sbernauer sbernauer moved this from Development: In Review to Development: Done in Stackable Engineering Nov 4, 2025
@sbernauer sbernauer added this pull request to the merge queue Nov 4, 2025
Merged via the queue into main with commit 3238cd1 Nov 4, 2025
17 checks passed
@sbernauer sbernauer deleted the fix/revert-headless-service-name branch November 4, 2025 13:20
@lfrancke lfrancke moved this from Development: Done to Acceptance: In Progress in Stackable Engineering Nov 11, 2025
@lfrancke lfrancke moved this from Acceptance: In Progress to Done in Stackable Engineering Dec 1, 2025

4 participants