Skip to content

MINOR: Keep pendingTask as WakeupFuture if currentTask is completed already. #21586

Draft
Nikita-Shupletsov wants to merge 3 commits intoapache:trunkfrom
Nikita-Shupletsov:minor-verifiable-consumer-shutdown
Draft

MINOR: Keep pendingTask as WakeupFuture if currentTask is completed already. #21586
Nikita-Shupletsov wants to merge 3 commits intoapache:trunkfrom
Nikita-Shupletsov:minor-verifiable-consumer-shutdown

Conversation

@Nikita-Shupletsov
Copy link
Contributor

@Nikita-Shupletsov Nikita-Shupletsov commented Feb 26, 2026

System tests that use VerifiableConsumer are flaky because VerifiableConsumer isn't shutting down on request in certain situations.
There can be a race condition in the commitSync method, as the future that we set as the active task to the wakeupTrigger can be already completed by the time we are setting it. Which leads to the wakeup request never being fulfilled.
Added a check if the task we are receiving in setActiveTask was triggered when we complete it exceptionally.

Also added additional logging when a shutdown is requested to make debugging easier.

Added additional logging when a shutdown is requested.
@github-actions github-actions bot added triage PRs from the community tools small Small PRs labels Feb 26, 2026
Copy link
Member

@bbejeck bbejeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @Nikita-Shupletsov LGTM, but lets get another look

private final int maxMessages;
private final CountDownLatch shutdownLatch = new CountDownLatch(1);
private int consumedMessages = 0;
private boolean shutdownRequested = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be overkill but maybe this should be marked volitile not sure if that's necessary if this is running on the main thread

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about it as well as AtomicBoolean. but I don't think they would make that much of a difference: we change the value only once. there is ultimately no harm if we run the loop one more time before we exit.
but I can mark it volatile to be 100% by the book

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either volatile or AtomicBoolean should work fine.

Copy link
Contributor

@kirktrue kirktrue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @Nikita-Shupletsov!

Does this PR fix a bug in the system tests or is it a workaround for a bug in the new consumer? My read of the PR description is the latter.


@Override
public String name() {
return "shutdown_requested";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not seeing anything in the verifiable_consumer.py (or elsewhere) that's looking for this event. Am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's for debugging. right now when a test fails because it couldn't stop a VerifiableConsumer it's impossible to tell what exactly happened: did we even receive a shutdown event? when did we receive it?
adding another log message should help

private final int maxMessages;
private final CountDownLatch shutdownLatch = new CountDownLatch(1);
private int consumedMessages = 0;
private boolean shutdownRequested = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either volatile or AtomicBoolean should work fine.

@github-actions github-actions bot removed the triage PRs from the community label Feb 27, 2026
@Nikita-Shupletsov
Copy link
Contributor Author

Does this PR fix a bug in the system tests or is it a workaround for a bug in the new consumer? My read of the PR description is the latter.

you are right. sorry, I overlooked. clearTask does not clear WakeupFuture. so it shouldn't cleaned up.

@Nikita-Shupletsov Nikita-Shupletsov marked this pull request as draft February 27, 2026 07:21
@github-actions github-actions bot added the triage PRs from the community label Feb 28, 2026
@Nikita-Shupletsov Nikita-Shupletsov changed the title MINOR: Made shutdown request more robust. MINOR: Keep pendingTask as WakeupFuture if currentTask is completed already. Feb 28, 2026
@Nikita-Shupletsov
Copy link
Contributor Author

@kirktrue
I think I found a problem with wakeup. not sure if it's the problem, so I propose to keep the logging as well to make it slightly easier to debug in the future if there are more issues.

@github-actions github-actions bot removed the triage PRs from the community label Feb 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants