-
Notifications
You must be signed in to change notification settings - Fork 78
[monarch] Enable pickling HostMesh and ProcMesh from inside tokio thread in certain cases #1473
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
+85
−8
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…ead in certain cases Previously, both `HostMesh` and `ProcMesh` stored their underlying rust mesh in an asynchronous `Shared[...]` pytokio object, so any attempts to access the rust meshes from a tokio thread would need to be called from a coroutine. This is a problem for pickling, which needs to be synchronous, and would therefore have to call `Shared[...].block_on()` to get the underlying rust mesh. This diff makes it so that when the internal `Shared[...]` task for `HostMesh` and `ProcMesh` completes, the result is stored so that it can be accessed without the use of `block_on()`. This enables pickling `HostMesh` and `ProcMesh` inside tokio threads, as long as their backing rust meshes have finished initializing -- this will always be the case inside actor endpoints, since the `HostMesh` and `ProcMesh` both need to be done initializing by the time the actor endpoint runs. Differential Revision: [D84195494](https://our.internmc.facebook.com/intern/diff/D84195494/) [ghstack-poisoned]
samlurye
added a commit
that referenced
this pull request
Oct 9, 2025
…ead in certain cases Previously, both `HostMesh` and `ProcMesh` stored their underlying rust mesh in an asynchronous `Shared[...]` pytokio object, so any attempts to access the rust meshes from a tokio thread would need to be called from a coroutine. This is a problem for pickling, which needs to be synchronous, and would therefore have to call `Shared[...].block_on()` to get the underlying rust mesh. This diff makes it so that when the internal `Shared[...]` task for `HostMesh` and `ProcMesh` completes, the result is stored so that it can be accessed without the use of `block_on()`. This enables pickling `HostMesh` and `ProcMesh` inside tokio threads, as long as their backing rust meshes have finished initializing -- this will always be the case inside actor endpoints, since the `HostMesh` and `ProcMesh` both need to be done initializing by the time the actor endpoint runs. Differential Revision: [D84195494](https://our.internmc.facebook.com/intern/diff/D84195494/) ghstack-source-id: 315055002 Pull Request resolved: #1473
…e tokio thread in certain cases" Previously, both `HostMesh` and `ProcMesh` stored their underlying rust mesh in an asynchronous `Shared[...]` pytokio object, so any attempts to access the rust meshes from a tokio thread would need to be called from a coroutine. This is a problem for pickling, which needs to be synchronous, and would therefore have to call `Shared[...].block_on()` to get the underlying rust mesh. This diff makes it so that when the internal `Shared[...]` task for `HostMesh` and `ProcMesh` completes, the result is stored so that it can be accessed without the use of `block_on()`. This enables pickling `HostMesh` and `ProcMesh` inside tokio threads, as long as their backing rust meshes have finished initializing -- this will always be the case inside actor endpoints, since the `HostMesh` and `ProcMesh` both need to be done initializing by the time the actor endpoint runs. Differential Revision: [D84195494](https://our.internmc.facebook.com/intern/diff/D84195494/) [ghstack-poisoned]
samlurye
added a commit
that referenced
this pull request
Oct 9, 2025
…ead in certain cases Pull Request resolved: #1473 Previously, both `HostMesh` and `ProcMesh` stored their underlying rust mesh in an asynchronous `Shared[...]` pytokio object, so any attempts to access the rust meshes from a tokio thread would need to be called from a coroutine. This is a problem for pickling, which needs to be synchronous, and would therefore have to call `Shared[...].block_on()` to get the underlying rust mesh. This diff makes it so that when the internal `Shared[...]` task for `HostMesh` and `ProcMesh` completes, the result is stored so that it can be accessed without the use of `block_on()`. This enables pickling `HostMesh` and `ProcMesh` inside tokio threads, as long as their backing rust meshes have finished initializing -- this will always be the case inside actor endpoints, since the `HostMesh` and `ProcMesh` both need to be done initializing by the time the actor endpoint runs. ghstack-source-id: 315063514 @exported-using-ghexport Differential Revision: [D84195494](https://our.internmc.facebook.com/intern/diff/D84195494/)
…e tokio thread in certain cases" Previously, both `HostMesh` and `ProcMesh` stored their underlying rust mesh in an asynchronous `Shared[...]` pytokio object, so any attempts to access the rust meshes from a tokio thread would need to be called from a coroutine. This is a problem for pickling, which needs to be synchronous, and would therefore have to call `Shared[...].block_on()` to get the underlying rust mesh. This diff makes it so that when the internal `Shared[...]` task for `HostMesh` and `ProcMesh` completes, the result is stored so that it can be accessed without the use of `block_on()`. This enables pickling `HostMesh` and `ProcMesh` inside tokio threads, as long as their backing rust meshes have finished initializing -- this will always be the case inside actor endpoints, since the `HostMesh` and `ProcMesh` both need to be done initializing by the time the actor endpoint runs. Differential Revision: [D84195494](https://our.internmc.facebook.com/intern/diff/D84195494/) [ghstack-poisoned]
samlurye
added a commit
that referenced
this pull request
Oct 9, 2025
…ead in certain cases Pull Request resolved: #1473 Previously, both `HostMesh` and `ProcMesh` stored their underlying rust mesh in an asynchronous `Shared[...]` pytokio object, so any attempts to access the rust meshes from a tokio thread would need to be called from a coroutine. This is a problem for pickling, which needs to be synchronous, and would therefore have to call `Shared[...].block_on()` to get the underlying rust mesh. This diff makes it so that when the internal `Shared[...]` task for `HostMesh` and `ProcMesh` completes, the result is stored so that it can be accessed without the use of `block_on()`. This enables pickling `HostMesh` and `ProcMesh` inside tokio threads, as long as their backing rust meshes have finished initializing -- this will always be the case inside actor endpoints, since the `HostMesh` and `ProcMesh` both need to be done initializing by the time the actor endpoint runs. ghstack-source-id: 315174591 @exported-using-ghexport Differential Revision: [D84195494](https://our.internmc.facebook.com/intern/diff/D84195494/)
This was referenced Oct 9, 2025
This pull request has been merged in 1eb8853. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
ciflow/rocm
CLA Signed
This label is managed by the Meta Open Source bot.
fb-exported
Merged
meta-exported
module: rocm
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Stack from ghstack (oldest at bottom):
Previously, both
HostMesh
andProcMesh
stored their underlying rust mesh in an asynchronousShared[...]
pytokio object, so any attempts to access the rust meshes from a tokio thread would need to be called from a coroutine. This is a problem for pickling, which needs to be synchronous, and would therefore have to callShared[...].block_on()
to get the underlying rust mesh.This diff makes it so that when the internal
Shared[...]
task forHostMesh
andProcMesh
completes, the result is stored so that it can be accessed without the use ofblock_on()
. This enables picklingHostMesh
andProcMesh
inside tokio threads, as long as their backing rust meshes have finished initializing -- this will always be the case inside actor endpoints, since theHostMesh
andProcMesh
both need to be done initializing by the time the actor endpoint runs.Differential Revision: D84195494