KAFKA-14249: Fix flaky TLSv1.3 idle expiry test (Tls13SelectorTest.testCloseOldestConnection) #20622
+19
−2
What
This PR addresses the flakiness in Tls13SelectorTest.testCloseOldestConnection, which intermittently failed to observe idle connection expiry under TLSv1.3.
Reproduction of flakiness
To reproduce the flaky behavior prior to this fix:
You can increase the loop count to 20 or more if needed. In my environment, running 10 iterations was enough to reproduce the failure consistently, so I used 10.
Why the flakiness
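The base test in SelectorTest assumed that once a connection is established, idle time measurement could begin immediately. This works under PLAINTEXT, but under TLSv1.3, post-handshake records (e.g. NewSessionTicket) often arrive after the handshake completes, so the channel can still register activity after the test has started measuring idle time.
How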
This PR makes the TLSv1.3 override robust by:
- Calling selector.poll(50) immediately after READY to drain any post-handshake records at the current time.
- Using TestUtils.waitForCondition with small polls until the channel is observed as ChannelState.EXPIRED.
These changes ensure idle expiry is deterministically surfaced without altering production logic.
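To illustrate the failure mode and the drain-then-wait pattern, here is a minimal toy model in plain Java. It is not the test code in this PR and does not use Kafka's Selector: ToySelector, waitFor, fragile, and robust are hypothetical names, and the assumption that reading a late post-handshake record refreshes the channel's last-activity time is a simplification made for the sketch.

```java
import java.util.function.BooleanSupplier;

// Toy model only: none of these types are Kafka's; all names are hypothetical.
final class IdleExpirySketch {

    /** Minimal stand-in for a selector tracking one channel against a mock clock. */
    static final class ToySelector {
        private final long maxIdleMs;
        private long nowMs = 0;            // mock time, advanced manually
        private long lastActiveMs = 0;     // last time the channel showed activity
        private boolean lateRecordPending; // models an unread post-handshake record
        private boolean expired = false;

        ToySelector(long maxIdleMs, boolean lateRecordPending) {
            this.maxIdleMs = maxIdleMs;
            this.lateRecordPending = lateRecordPending;
        }

        void advance(long ms) {
            nowMs += ms;
        }

        /** One poll: read any pending record (assumed to count as activity), then check idleness. */
        void poll() {
            if (lateRecordPending) {
                lateRecordPending = false;
                lastActiveMs = nowMs;
            }
            if (nowMs - lastActiveMs >= maxIdleMs) {
                expired = true;
            }
        }

        boolean isExpired() {
            return expired;
        }
    }

    /** Hypothetical bounded retry helper, playing the role of a wait-for-condition utility. */
    static boolean waitFor(BooleanSupplier condition, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            if (condition.getAsBoolean()) {
                return true;
            }
        }
        return false;
    }

    /** Fragile shape: advance the clock, poll once, and check immediately. */
    static boolean fragile(boolean lateRecord) {
        ToySelector s = new ToySelector(1000, lateRecord);
        s.advance(1001);
        s.poll(); // a late record read here resets the idle timer
        return s.isExpired();
    }

    /** Robust shape: drain right after READY, then advance time and poll until expiry is seen. */
    static boolean robust(boolean lateRecord) {
        ToySelector s = new ToySelector(1000, lateRecord);
        s.poll(); // drain any late record at the current mock time
        s.advance(1001);
        return waitFor(() -> {
            s.poll();
            return s.isExpired();
        }, 10);
    }

    public static void main(String[] args) {
        System.out.println("fragile, no late record: " + fragile(false));
        System.out.println("fragile, late record:    " + fragile(true));
        System.out.println("robust, no late record:  " + robust(false));
        System.out.println("robust, late record:     " + robust(true));
    }
}
```

In this model, fragile(true) is the only case that misses the expiry, because the late record is read after the clock has already advanced. A single poll is enough for robust in the toy; the retry loop is kept only to mirror the shape of polling until the expired state is observed.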
Verification of stability
After applying the changes, the same looped command can be used to confirm the test no longer flakes:
Here the loop count is set to 20 to provide stronger assurance of stability, though fewer iterations may already be sufficient.
Scope
JIRA
KAFKA-14249