Skip to content

Conversation

@xenoscopic
Copy link
Contributor

Using Shutdown with an already cancelled context will cause the method to return almost immediately, with only idle connections being closed. More problematically, it can't close active connections, which can remain in flight indefinitely. At the moment, these active connections (and their associated contexts) can cause loader.load() to block loader.run() from exiting, especially if a backend is misbehaving, which can cause shutdown to halt waiting on the request. Even if a backend isn't misbehaving, an inference request can take many seconds. The best solution would be to make loader.load() unblock if the context passed to loader.run() is cancelled, but this is fairly complicated to implemented. The easier solution for now is just to use a hard server Close() to cancel inflight requests (and their contexts) and then wait for scheduler shutdown. This is what we do in Docker Desktop.

Using Shutdown with an already cancelled context will cause the method
to return almost immediately, with only idle connections being closed.
More problematically, it can't close active connections, which can
remain in flight indefinitely. At the moment, these active connections
(and their associated contexts) can cause loader.load() to block
loader.run() from exiting, especially if a backend is misbehaving, which
can cause shutdown to halt waiting on the request. Even if a backend
isn't misbehaving, an inference request can take many seconds. The
best solution would be to make loader.load() unblock if the context
passed to loader.run() is cancelled, but this is fairly complicated to
implemented. The easier solution for now is just to use a hard server
Close() to cancel inflight requests (and their contexts) and then wait
for scheduler shutdown. This is what we do in Docker Desktop.

Signed-off-by: Jacob Howard <[email protected]>
Copy link

@p1-0tr p1-0tr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@xenoscopic xenoscopic merged commit c873021 into main Jul 18, 2025
4 checks passed
@xenoscopic xenoscopic deleted the context-regulation branch July 18, 2025 15:19
doringeman pushed a commit to doringeman/model-runner that referenced this pull request Sep 23, 2025
* adds --mmproj param

* Update workflow to support mmproj

* Try to copy mmproj if exists

* Try to copy mmproj if exists

* Try to copy mmproj if exists

* Try to copy mmproj if exists

* Try to copy mmproj if exists
doringeman pushed a commit to doringeman/model-runner that referenced this pull request Sep 23, 2025
* adds --mmproj param

* Update workflow to support mmproj

* Try to copy mmproj if exists

* Try to copy mmproj if exists

* Try to copy mmproj if exists

* Try to copy mmproj if exists

* Try to copy mmproj if exists
doringeman pushed a commit to doringeman/model-runner that referenced this pull request Sep 24, 2025
* adds --mmproj param

* Update workflow to support mmproj

* Try to copy mmproj if exists

* Try to copy mmproj if exists

* Try to copy mmproj if exists

* Try to copy mmproj if exists

* Try to copy mmproj if exists
doringeman pushed a commit to doringeman/model-runner that referenced this pull request Sep 24, 2025
* adds --mmproj param

* Update workflow to support mmproj

* Try to copy mmproj if exists

* Try to copy mmproj if exists

* Try to copy mmproj if exists

* Try to copy mmproj if exists

* Try to copy mmproj if exists
doringeman added a commit to doringeman/model-runner that referenced this pull request Oct 2, 2025
chore: use constant format strings
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants