@apalan60

This PR backports the following changes to the 3.9 branch to improve test stability: #20594, #20772

lianetm and others added 2 commits November 24, 2025 21:29
Fixing flakiness seen on this test, where static consumers could not
join as expected after shutting down previous consumers with the same
instance ID, and logs showed `UnreleasedInstanceIdException`.

I expect the flakiness can happen when a consumer with instanceId1 is
closed but not actually removed from the group because its leave group
request failed or was delayed (the request is sent on a best-effort
basis and is not retried if it fails or times out).

Fix by adding a check to ensure the group is empty before attempting to
reuse the instance ID.
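
The check described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the actual test: `wait_until` here is a local helper written for this example, and `FakeGroup` with its `member_ids` and `join` methods is a hypothetical stand-in that simulates a member lingering in the group after its leave group request was lost or delayed.

```python
import time


def wait_until(condition, timeout_sec, backoff_sec=0.05, err_msg=""):
    """Poll `condition` until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_sec
    while not condition():
        if time.monotonic() >= deadline:
            raise TimeoutError(err_msg)
        time.sleep(backoff_sec)


class FakeGroup:
    """Hypothetical stand-in for a consumer group; not a Kafka API."""

    def __init__(self, members, removal_delay_sec):
        self._members = set(members)
        self._remove_at = time.monotonic() + removal_delay_sec
        self._removal_pending = True

    def member_ids(self):
        # Old members linger until the simulated delay has passed,
        # mimicking a leave group request that failed or arrived late.
        if self._removal_pending and time.monotonic() >= self._remove_at:
            self._members.clear()
            self._removal_pending = False
        return set(self._members)

    def join(self, instance_id):
        # Assumption: reusing an instance ID that is still registered fails.
        if instance_id in self.member_ids():
            raise RuntimeError("instance ID still in use: " + instance_id)
        self._members.add(instance_id)


group = FakeGroup({"instanceId1"}, removal_delay_sec=0.2)

# Wait for the group to drain before reusing the instance ID.
wait_until(lambda: not group.member_ids(), timeout_sec=5,
           err_msg="group did not become empty")
group.join("instanceId1")
```

Without the `wait_until` call, the `join` in this sketch would raise, because the old member would still be registered under the same instance ID.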

Reviewers: Matthias J. Sax <[email protected]>
(cherry picked from commit 9e9d2a2)
…nsuring conflicting static consumers terminate (apache#20772)

Related discussion:
apache#20594 (review)

### Problem
The test `OffsetValidationTest.test_fencing_static_consumer` failed when
executed with
`fencing_stage=stable` and `group_protocol=consumer`.
It timed out while waiting for the group to become empty because the
conflicting static consumers re-joined after the original members
stopped, keeping the group non-empty.

### Fix
For the consumer-protocol path, the test now waits for all conflicting
consumer processes to terminate before stopping the original static
members. This ensures that each conflicting consumer is fully fenced
and cannot re-join the group after the original members stop.
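
The ordering described in this fix might look like the sketch below. It is only an illustration under stated assumptions: `FakeConsumer`, its `alive` and `stop` methods, and `wait_for_termination` are hypothetical names invented for this example and are not part of the Kafka system test harness. The conflicting consumers are modeled as processes that exit on their own shortly after being fenced.

```python
import time


class FakeConsumer:
    """Hypothetical stand-in for a consumer process; not a harness API."""

    def __init__(self, name, exits_after_sec=None):
        self.name = name
        # A fenced consumer is modeled as exiting by itself after a delay.
        self._exit_at = (None if exits_after_sec is None
                         else time.monotonic() + exits_after_sec)
        self._stopped = False

    def alive(self):
        if self._stopped:
            return False
        return self._exit_at is None or time.monotonic() < self._exit_at

    def stop(self):
        self._stopped = True


def wait_for_termination(consumers, timeout_sec, backoff_sec=0.05):
    """Block until every consumer process has exited, or time out."""
    deadline = time.monotonic() + timeout_sec
    while any(c.alive() for c in consumers):
        if time.monotonic() >= deadline:
            raise TimeoutError("conflicting consumers still running")
        time.sleep(backoff_sec)


originals = [FakeConsumer("original-%d" % i) for i in range(2)]
conflicting = [FakeConsumer("conflict-%d" % i, exits_after_sec=0.1)
               for i in range(2)]

# First make sure no conflicting process is left that could re-join,
# and only then stop the original static members.
wait_for_termination(conflicting, timeout_sec=5)
for consumer in originals:
    consumer.stop()
```

Stopping the originals only after the conflicting processes are gone is what prevents a late re-join from keeping the group non-empty.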

Reviewers: Chia-Ping Tsai <[email protected]>
(cherry picked from commit 75768dd)