Reducing io.aeron.cluster.client.AeronCluster leader down / failover detection time #1938

the-thing · 2026-02-02T10:48:07Z

When there is a leader failover / shutdown etc., there is a period of time when io.aeron.cluster.client.AeronCluster is not aware of the leader being down and still allows messages to be sent that never reach the cluster - this is expected.

However, the detection time always seems to be ~5 seconds with the default configuration. I don't think a 5-second client reaction time is something most systems are willing to tolerate. I experimented semi-randomly with different cluster and client timeout configurations, but struggled to get satisfactory results.

Can you please suggest client and cluster configurations that could help reduce the io.aeron.cluster.client.AeronCluster leader-down detection time? This will most likely have implications in other areas, but that might be acceptable.
Are there any other checks I am missing that would detect the client being connected to a stale leader?
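
To make the "other checks" question concrete, here is a minimal sketch of the kind of application-level check I have in mind. It is not an Aeron API: `LeaderLivenessWatchdog`, `onEgressActivity` and `isLeaderSuspect` are hypothetical names, and it assumes the clustered service echoes some periodic application-level ping so that egress activity can be expected at a known rate.

```java
/**
 * Hypothetical helper, NOT part of Aeron. It tracks when the client last saw
 * any egress activity and flags the leader as suspect once a deadline passes.
 * Assumption: the clustered service replies to a periodic application-level
 * ping, so silence on egress longer than timeoutNs is meaningful.
 */
final class LeaderLivenessWatchdog {
    private final long timeoutNs;
    private long lastActivityNs;

    LeaderLivenessWatchdog(final long timeoutNs, final long nowNs) {
        if (timeoutNs <= 0) {
            throw new IllegalArgumentException("timeoutNs must be positive");
        }
        this.timeoutNs = timeoutNs;
        this.lastActivityNs = nowNs;
    }

    /** Record that a message or event arrived from the cluster at nowNs. */
    void onEgressActivity(final long nowNs) {
        lastActivityNs = nowNs;
    }

    /** True when nothing has been received for longer than timeoutNs. */
    boolean isLeaderSuspect(final long nowNs) {
        return nowNs - lastActivityNs > timeoutNs;
    }
}
```

The intent would be to call `onEgressActivity(System.nanoTime())` from the egress listener callbacks and to check `isLeaderSuspect(System.nanoTime())` in the same polling loop before offering new messages. The timeout value is an application choice and would have to be longer than the ping interval plus the expected round trip.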

Running on Windows 11 / Java 21. Minimum working example attached. Thanks in advance.

LeaderFailoverTest.java
