gianm commented Nov 21, 2025

Previously, if a supervisor was stopped while discovering tasks or updating their status, it could end up trying to kill those tasks because the callbacks on the requests to those tasks could fail as a result of the stopping. There should be no reason for an actively-stopping supervisor to kill a task, so this patch causes the shutdown functions to be no-ops while a supervisor is stopping.
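
A minimal sketch of the guard described above, assuming a single `stopping` flag that the supervisor sets at the start of its stop sequence. The class and method names (`StoppableSupervisor`, `killTask`, `onTaskRequestFailed`, `killedTasks`) are hypothetical and are not Druid's actual `SeekableStreamSupervisor` code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: names are illustrative, not Druid's SeekableStreamSupervisor API.
class StoppableSupervisor {
  // Flipped once at the start of stop(); read by callbacks on other threads.
  private final AtomicBoolean stopping = new AtomicBoolean(false);
  // Task IDs for which a kill was actually issued (stands in for real kill requests).
  private final List<String> killedTasks = new ArrayList<>();

  void stop() {
    stopping.set(true);
    // ...then shut down executors, task clients, and so on.
  }

  // Returns true if a kill was issued, false if it was suppressed because the
  // supervisor is stopping. The reason is unused in this sketch (normally logged).
  synchronized boolean killTask(String taskId, String reason) {
    if (stopping.get()) {
      // Failures observed now are most likely side effects of the shutdown
      // itself, so treat the kill as a no-op.
      return false;
    }
    killedTasks.add(taskId);
    return true;
  }

  // Failure callback for an async request to a task, e.g. fetching its start time.
  void onTaskRequestFailed(String taskId, Throwable cause) {
    killTask(taskId, "Failed to return start time: " + cause.getMessage());
  }

  synchronized List<String> killedTasks() {
    return new ArrayList<>(killedTasks);
  }
}
```

Before `stop()`, a failed request still leads to a kill; after it, both a direct `killTask` call and the failure callback return without killing anything.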

This patch also ensures that when a supervisor transitions into the STOPPING state, that state takes priority over any other state.
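
One way to express the priority rule, sketched under the assumption of a simple enum-based state holder. `SupervisorStateSketch`, `StateManagerSketch`, and `maybeSetState` are illustrative names, not Druid's actual state-management code:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical state set; Druid's real supervisor states differ.
enum SupervisorStateSketch { PENDING, RUNNING, IDLE, UNHEALTHY_SUPERVISOR, STOPPING }

class StateManagerSketch {
  private final AtomicReference<SupervisorStateSketch> state =
      new AtomicReference<>(SupervisorStateSketch.PENDING);

  // Applies the proposed state unless the supervisor is already STOPPING,
  // in which case STOPPING is kept. Returns the resulting state.
  SupervisorStateSketch maybeSetState(SupervisorStateSketch proposed) {
    return state.updateAndGet(current ->
        current == SupervisorStateSketch.STOPPING ? current : proposed);
  }

  SupervisorStateSketch getState() {
    return state.get();
  }
}
```

Using `AtomicReference.updateAndGet` keeps the check and the write atomic: if another thread moves the state to STOPPING concurrently, the compare-and-set retries and the recomputed value keeps STOPPING instead of overwriting it.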

gianm commented Nov 21, 2025

The specific scenario I saw this in had an error like:

Failed to return start time: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@703f49ef[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@7219380a[Wrapped task = org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor$$Lambda$1573/0x000000f801db8228@6ff15c29]] rejected from java.util.concurrent.ScheduledThreadPoolExecutor@1eb21b5c[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1]

It happened because the callback on getStartTimeAsync tried to submit a task to a scheduledExec that had already been stopped.
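
The failure mode can be reproduced in isolation with plain `java.util.concurrent`: once a scheduled executor has been shut down, a later attempt to schedule work on it throws `RejectedExecutionException`. In this hypothetical sketch a `CompletableFuture` stands in for the async start-time request; `RejectedCallbackDemo` and `runAfterShutdown` are illustrative names, not the supervisor's actual code:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class RejectedCallbackDemo {
  // Simulates the race: the async "start time" request completes after the
  // supervisor has already shut down its scheduled executor. Returns the
  // exception seen by the callback, or null if scheduling succeeded.
  static Throwable runAfterShutdown() {
    ScheduledExecutorService scheduledExec = Executors.newSingleThreadScheduledExecutor();
    scheduledExec.shutdownNow(); // stands in for the supervisor's stop() having run

    CompletableFuture<Long> startTime = new CompletableFuture<>();
    Throwable[] observed = new Throwable[1];
    startTime.thenAccept(t -> {
      try {
        // The callback tries to schedule follow-up work on the stopped executor.
        scheduledExec.schedule(() -> { }, 0, TimeUnit.MILLISECONDS);
      } catch (RejectedExecutionException e) {
        observed[0] = e;
      }
    });

    // The thenAccept action was registered before completion, so complete()
    // runs it synchronously on this thread before returning.
    startTime.complete(System.currentTimeMillis());
    return observed[0];
  }
}
```

In the supervisor, an exception like this surfaces as a failed request to the task, which is exactly the kind of failure that should not trigger a kill while the supervisor is stopping.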

kfaraz left a comment

Nice catch!

gianm closed this Nov 21, 2025
gianm reopened this Nov 21, 2025
gianm merged commit 559d694 into apache:master Nov 21, 2025
61 of 94 checks passed
gianm deleted the supervisor-stopping-no-kill branch November 21, 2025 15:36