Use more aggressive server shutdown and resequence termination #112

xenoscopic · 2025-07-18T10:30:58Z

Using Shutdown with an already cancelled context causes the method to return almost immediately, closing only idle connections. More problematically, it can't close active connections, which can remain in flight indefinitely. At the moment, these active connections (and their associated contexts) can cause loader.load() to block loader.run() from exiting, especially if a backend is misbehaving, which can stall shutdown while it waits on the request. Even if a backend isn't misbehaving, an inference request can take many seconds. The best solution would be to make loader.load() unblock if the context passed to loader.run() is cancelled, but this is fairly complicated to implement. The easier solution for now is to use a hard server Close() to cancel in-flight requests (and their contexts) and then wait for scheduler shutdown. This is what we do in Docker Desktop.

Signed-off-by: Jacob Howard <[email protected]>

p1-0tr

LGTM

* adds --mmproj param
* Update workflow to support mmproj
* Try to copy mmproj if exists
* Try to copy mmproj if exists
* Try to copy mmproj if exists
* Try to copy mmproj if exists
* Try to copy mmproj if exists

chore: use constant format strings

p1-0tr approved these changes Jul 18, 2025


xenoscopic merged commit c873021 into main Jul 18, 2025
4 checks passed

xenoscopic deleted the context-regulation branch July 18, 2025 15:19

doringeman added a commit to doringeman/model-runner that referenced this pull request Oct 2, 2025

Merge pull request docker#112 from doringeman/misc

cf6c379

chore: use constant format strings
