[monarch] Enable pickling HostMesh and ProcMesh from inside tokio thread in certain cases #1473

samlurye · 2025-10-09T06:25:03Z

Stack from ghstack (oldest at bottom):

Previously, both HostMesh and ProcMesh stored their underlying Rust mesh in an asynchronous Shared[...] pytokio object, so any access to the Rust mesh from a tokio thread had to happen inside a coroutine. This is a problem for pickling, which must be synchronous and would therefore have to call Shared[...].block_on() to get the underlying Rust mesh.

With this diff, when the internal Shared[...] task for a HostMesh or ProcMesh completes, its result is stored so that it can be read without block_on(). This enables pickling HostMesh and ProcMesh inside tokio threads, as long as their backing Rust meshes have finished initializing. That is always the case inside actor endpoints, since both the HostMesh and the ProcMesh must be done initializing by the time the actor endpoint runs.
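
The idea of storing the task result for later synchronous access can be illustrated with a minimal sketch. It uses plain asyncio as a stand-in for pytokio, and every name in it (MeshHandle, spawn, fake_init, demo) is hypothetical; it is not Monarch's actual implementation, only an analogy for the pattern described above.

```python
import asyncio
import pickle


class MeshHandle:
    """Hypothetical stand-in for HostMesh/ProcMesh (not Monarch's real class)."""

    def __init__(self, task=None, result=None):
        self._task = task      # pending initialization, if any
        self._result = result  # filled in once initialization completes

    @classmethod
    def spawn(cls, init_coro):
        # Must be called while an event loop is running.
        handle = cls()
        loop = asyncio.get_running_loop()
        handle._task = loop.create_task(handle._store(init_coro))
        return handle

    async def _store(self, init_coro):
        # Keep the result so synchronous code can read it without blocking.
        self._result = await init_coro
        return self._result

    def __reduce__(self):
        # Pickling is synchronous: succeed only if the result is already stored.
        if self._result is None:
            raise RuntimeError("mesh is still initializing; cannot pickle yet")
        return (MeshHandle, (None, self._result))


async def fake_init():
    # Placeholder for the asynchronous creation of the Rust mesh.
    await asyncio.sleep(0)
    return {"hosts": 2}


async def demo():
    handle = MeshHandle.spawn(fake_init())
    try:
        handle.__reduce__()      # too early: the task has not run yet
        failed_early = False
    except RuntimeError:
        failed_early = True
    await handle._task           # initialization finishes
    return failed_early, handle.__reduce__()


if __name__ == "__main__":
    failed_early, (cls, args) = asyncio.run(demo())
    restored = pickle.loads(pickle.dumps(cls(*args)))
    print(failed_early, restored._result)
```

In this sketch, pickling before initialization completes raises an error instead of blocking, which mirrors the "as long as their backing Rust meshes have finished initializing" condition in the description.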

Differential Revision: D84195494

meta-codesync · 2025-10-09T22:08:03Z

This pull request has been merged in 1eb8853.

pytorch-bot bot added ciflow/rocm module: rocm labels Oct 9, 2025

meta-cla bot added the CLA Signed label Oct 9, 2025

meta-codesync bot added fb-exported meta-exported labels Oct 9, 2025

samlurye mentioned this pull request Oct 9, 2025

[monarch][supervision] Increase GetState::<ActorState> default timeout and make it configurable #1474

Closed

This was referenced Oct 9, 2025

[monarch] mesh: support v1 actor meshes in RDMA buffer #1467

Closed

[monarch] extension: make CodeSyncMeshClient ActorMesh version polymorphic #1482

Closed

meta-codesync bot closed this in 1eb8853 Oct 9, 2025

facebook-github-bot added the Merged label Oct 9, 2025
