@akyang-anyscale (Contributor) commented Aug 12, 2025

Why are these changes needed?

When a user calls `serve.shutdown()`, we still want to be able to shut down the handle if the user initialized it while running in the same event loop. The current behavior may raise a runtime error.

To block on the shutdown result from the same event loop without causing a deadlock, the shutdown sequence in `CurrentLoopRouter` must run in a separate thread instead of on that event loop.
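A minimal sketch of the general idea, assuming the coroutine being awaited does not touch objects bound to the caller's loop. The blocking call hands the coroutine to a worker thread that runs it on its own fresh event loop, so the caller's loop is never asked to make progress while it is blocked. By contrast, calling `asyncio.run_coroutine_threadsafe(coro, loop).result()` from the thread that is running `loop` would wait forever. `fake_router_shutdown` and `shutdown_blocking` are hypothetical names for illustration, not Ray Serve APIs.

```python
import asyncio
import concurrent.futures
from typing import Any, Callable, Coroutine


async def fake_router_shutdown() -> str:
    # Hypothetical stand-in for an async shutdown routine.
    await asyncio.sleep(0.01)
    return "shut down"


def shutdown_blocking(coro_fn: Callable[[], Coroutine[Any, Any, Any]]) -> Any:
    """Block until coro_fn() finishes, even when called from inside a running loop.

    The coroutine is created and run on a fresh event loop in a worker thread,
    so the caller's (possibly running) loop never has to drive it.
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(lambda: asyncio.run(coro_fn()))
        return future.result()


async def main() -> str:
    # Blocks this loop until the worker thread finishes, but does not deadlock.
    return shutdown_blocking(fake_router_shutdown)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The trade-off is that every other task on the caller's loop is stalled until the worker finishes, and, as the review below points out for `_asyncio_router.shutdown()`, this only works if the coroutine is safe to run on a different loop and thread.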

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: akyang-anyscale <[email protected]>
@akyang-anyscale requested a review from a team as a code owner August 12, 2025 22:14
@akyang-anyscale added the "go" label (add ONLY when ready to merge, run all tests) Aug 12, 2025
@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request aims to allow handle.shutdown() to be called from a synchronous context even when the handle is associated with an event loop. The changes in _private/client.py and tests/conftest.py are simple parameter removals that are consistent with the main change. However, the core logic modification in handle.py introduces a couple of issues. Specifically, it incorrectly handles the case where no event loop is running by checking for None from asyncio.get_running_loop(), which actually raises a RuntimeError. Additionally, it removes a crucial check that prevents deadlocks when the method is called from within an asyncio task. I've provided a suggestion to fix these issues.
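The first point is easy to check: `asyncio.get_running_loop()` raises `RuntimeError` when no loop is running in the current thread and never returns `None`, so a "no running loop" check has to catch the exception. The sketch below also shows one possible shape for the kind of deadlock guard the review mentions, refusing to block when the caller is already running on the target loop. `running_loop_or_none` and `ensure_not_on_loop` are hypothetical helpers, not the code from this PR.

```python
import asyncio
from typing import Optional


def running_loop_or_none() -> Optional[asyncio.AbstractEventLoop]:
    """Return the loop running in this thread, or None if there is none."""
    try:
        return asyncio.get_running_loop()
    except RuntimeError:
        # get_running_loop() raises here instead of returning None.
        return None


def ensure_not_on_loop(target_loop: asyncio.AbstractEventLoop) -> None:
    """Raise if blocking now would mean waiting on the loop we are running on."""
    if running_loop_or_none() is target_loop:
        raise RuntimeError("Blocking on this loop from its own thread would deadlock.")


async def demo() -> bool:
    # Inside a coroutine, the running loop is the target, so the guard fires.
    loop = asyncio.get_running_loop()
    try:
        ensure_not_on_loop(loop)
    except RuntimeError:
        return True
    return False
```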

@akyang-anyscale marked this pull request as draft August 12, 2025 22:53
@akyang-anyscale marked this pull request as ready for review August 12, 2025 23:28
@ray-gardener bot added the "serve" label (Ray Serve Related Issue) Aug 14, 2025
@abrarsheikh (Contributor) left a comment

thank you

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
try:
    return loop.run_until_complete(self._asyncio_router.shutdown())
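This pattern only works when no event loop is already running in the calling thread: `run_until_complete` refuses to start a second loop in a thread that already has one running and raises `RuntimeError` instead. A minimal reproduction, with `fake_shutdown` as a hypothetical stand-in for `self._asyncio_router.shutdown()` and without the `set_event_loop` call, which the error does not depend on:

```python
import asyncio


async def fake_shutdown() -> str:
    # Hypothetical stand-in for the router's shutdown coroutine.
    await asyncio.sleep(0)
    return "done"


def blocking_shutdown_on_new_loop() -> str:
    """Drive the coroutine to completion on a brand-new event loop."""
    loop = asyncio.new_event_loop()
    coro = fake_shutdown()
    try:
        return loop.run_until_complete(coro)
    except RuntimeError:
        # The coroutine was never scheduled; close it to avoid a "never awaited" warning.
        coro.close()
        raise
    finally:
        loop.close()


async def caller_inside_running_loop() -> str:
    # A loop is already running in this thread, so run_until_complete refuses to start.
    try:
        blocking_shutdown_on_new_loop()
    except RuntimeError as exc:
        return f"RuntimeError: {exc}"
    return "no error"
```

Called from plain synchronous code, `blocking_shutdown_on_new_loop()` returns normally; called from inside a coroutine, it raises.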
Contributor

Discussed offline with @akyang-anyscale: `_asyncio_router.shutdown()` is not thread-safe.
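One common way to respect that constraint is to run the coroutine on the loop that owns the object rather than on a fresh loop. When the blocking caller is on a different thread, the standard-library tool for this is `asyncio.run_coroutine_threadsafe`, which schedules the coroutine on the owning loop and returns a `concurrent.futures.Future` to wait on. A sketch, with `LoopOwnedRouter` as a hypothetical stand-in; note that calling `.result()` from the owning loop's own thread would still deadlock.

```python
import asyncio
import threading


class LoopOwnedRouter:
    """Hypothetical object whose coroutines must run on the loop that owns it."""

    def __init__(self) -> None:
        self.shut_down = False

    async def shutdown(self) -> str:
        await asyncio.sleep(0.01)
        self.shut_down = True
        return "router shut down"


def main() -> str:
    # The owning loop runs forever in a background thread.
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()

    router = LoopOwnedRouter()
    try:
        # Schedule the coroutine on the owning loop; block this (different) thread on it.
        future = asyncio.run_coroutine_threadsafe(router.shutdown(), loop)
        return future.result(timeout=5)
    finally:
        # Ask the background loop to stop, wait for its thread, then close it.
        loop.call_soon_threadsafe(loop.stop)
        thread.join()
        loop.close()


if __name__ == "__main__":
    print(main())
```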

@akyang-anyscale requested a review from a team as a code owner August 19, 2025 02:23
@akyang-anyscale (Contributor Author) commented Aug 19, 2025

@aslonnie @elliot-barn @khluu can I get a CI team review?

@zcin (Contributor) left a comment

LGTM pending adding a public API

@akyang-anyscale requested a review from a team as a code owner August 21, 2025 16:49
Comment on lines +140 to +142
await asyncio.wait_for(
    self._controller.graceful_shutdown.remote(), timeout=timeout_s
)
Contributor

will this cancel the ray task on timeout?

Contributor Author

don't believe so, does ray.get with timeout also cancel the ray task if timeout is hit?

Contributor Author

is that the desired behavior? I would've assumed we want the shutdown task to continue after the timeout

Contributor

yeah it's not the desired behavior. But asyncio.wait_for will cancel the asyncio task when timeout is reached, and I don't know if that affects the underlying ray task.
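On the asyncio side, `asyncio.wait_for` does cancel the awaitable it wraps when the timeout expires; whether that stops any remote work depends on what cancelling that awaitable does, which is what the local Ray test in the next comment checks for an `ObjectRef`. The pure-asyncio sketch below shows the cancellation, and how wrapping the inner task in `asyncio.shield` lets it keep running past the timeout. `slow_shutdown` is a hypothetical stand-in.

```python
import asyncio


async def slow_shutdown(log: list) -> None:
    # Hypothetical stand-in for a slow shutdown step that records how it ended.
    try:
        await asyncio.sleep(0.2)
        log.append("finished")
    except asyncio.CancelledError:
        log.append("cancelled")
        raise


async def run(shielded: bool) -> list:
    log: list = []
    inner = asyncio.create_task(slow_shutdown(log))
    awaitable = asyncio.shield(inner) if shielded else inner
    try:
        await asyncio.wait_for(awaitable, timeout=0.05)
    except asyncio.TimeoutError:
        log.append("timed out")
    # Give a shielded inner task time to finish on its own.
    await asyncio.sleep(0.3)
    return log
```

Without `shield`, the inner task logs `cancelled` before `wait_for` reports the timeout; with it, the timeout is reported first and the inner task later logs `finished`.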

Contributor Author

I tested this out locally and the ray task continued to run.

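import asyncio
import ray
import time

ray.init()


@ray.remote
def foo():
    time.sleep(5)
    print("hello")


async def bar(timeout):
    # On timeout, wait_for cancels the local asyncio wrapper around the
    # ObjectRef; this script checks whether the remote task keeps running.
    await asyncio.wait_for(foo.remote(), timeout=timeout)

try:
    print("starting timeout=10")
    asyncio.run(bar(10))
except asyncio.TimeoutError:  # plain TimeoutError only matches on Python 3.11+
    print("timed out!")

try:
    print("starting timeout=3")
    asyncio.run(bar(3))
except asyncio.TimeoutError:
    print("timed out!")
    # Keep the driver alive long enough for the remote task to print.
    time.sleep(10)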

hello gets printed twice, i.e. the task started with timeout=3 still ran to completion after wait_for timed out.

@zcin merged commit 5b3f4a0 into ray-project:master Aug 25, 2025
5 checks passed
liulehui pushed a commit to liulehui/ray that referenced this pull request Aug 26, 2025
…project#55551)

## Why are these changes needed?

When a user calls `serve.shutdown()`, we still want to be able to shut
down the handle if the user initialized it while running in the same
event loop. The current behavior may raise a runtime error.

In order to block on the shutdown result in the same event loop without
causing deadlock, the shutdown sequence in `CurrentLoopRouter` needs to
happen in a separate thread (instead of the same event loop).


## Related issue number


## Checks

- [ ] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: akyang-anyscale <[email protected]>
Signed-off-by: alexyang <[email protected]>
Signed-off-by: Lehui Liu <[email protected]>