[monarch] The root client is just a PythonActor #1985

samlurye · 2025-11-24T19:13:07Z

Stack from ghstack (oldest at bottom):

This diff makes the root client actor just another PythonActor.

Why?

Right now the monarch codebase is peppered with special handling to distinguish normal python actors from the root client "actor". The root client has type () and is actually just a detached Instance with no actor loop, so it has no message handlers and can't even process supervision events. As a result, we have to wrap the current context's instance in a special ContextInstance enum, and everywhere we want to use it, we either have to use the instance_dispatch! macro or insert code that looks like:

match instance {
  ContextInstance::PythonActor(ins) => { /* do something */ },
  ContextInstance::Client(ins) => { /* do something else */ },
}
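For a compilable illustration of that pattern, here is a minimal sketch using stand-in types. `Instance`, `PythonActor`, `ContextInstance`, and `describe` below are simplified, hypothetical definitions written for this example; they are not the real hyperactor or monarch types.

```rust
use std::marker::PhantomData;

// Stand-in for hyperactor's `Instance<A>`; the real type carries far more state.
struct Instance<A> {
    name: String,
    _actor: PhantomData<A>,
}

// Stand-in for the Python-backed actor type.
struct PythonActor;

// Stand-in for the pre-diff wrapper: the root client is an `Instance<()>`.
enum ContextInstance {
    PythonActor(Instance<PythonActor>),
    Client(Instance<()>),
}

// Hypothetical call site: it has to branch on the variant even though
// both arms do almost the same thing.
fn describe(instance: &ContextInstance) -> String {
    match instance {
        ContextInstance::PythonActor(ins) => format!("actor {} (has handlers)", ins.name),
        ContextInstance::Client(ins) => format!("client {} (no actor loop)", ins.name),
    }
}
```

With these definitions, `describe` on a `ContextInstance::Client` named `root` returns `"client root (no actor loop)"`. Each new use of the context instance needs another `match` like this.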

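This makes the code more error-prone and harder to understand, with the added complication that the client handling is often not idiomatic with respect to hyperactor, because the root client has no message handlers or actor loop. Some examples:

- Confusing supervision handling where owner might not be defined but is_owned is still true, so we need to call into a special unhandled function instead of continuing to propagate up the hierarchy (https://fburl.com/code/andy3ggr).
- The root client can't have child actors because it has no supervision event handling, so they have to be spawned directly on the root client proc, and even then, there is no way for the supervision event to reach monarch.actor.unhandled_fault_hook (https://fburl.com/code/kqd2iwvc).
- The root client handles undeliverable messages via a bespoke tokio task/thread (https://fburl.com/code/jjgfy5d5).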

Making the root client a normal python actor solves these problems, because:

- We don't need a ContextInstance enum anymore: PyInstance always contains Instance<PythonActor>.
- Supervision events follow a unified path as they bubble up through the hierarchy, and every unhandled event reaches RootClientActor.__supervise__, defined in python, without special handling.
- The root client can handle undeliverable messages using RootClientActor._handle_undeliverable_message, defined in python, without special handling.
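The unified supervision path can be pictured as a walk up the actor hierarchy that stops at the first actor that handles the event, with the root client as the last stop. The sketch below is only a toy model of that idea: `ActorNode`, `handles_failures`, and `route_supervision_event` are hypothetical names made up for illustration, and the real propagation goes through hyperactor's supervision machinery rather than an index walk.

```rust
// Hypothetical, simplified actor record; not monarch's real data model.
struct ActorNode {
    name: &'static str,
    // Index of the parent actor; `None` for the root client.
    parent: Option<usize>,
    // Whether this actor's handler consumes the event instead of re-raising it.
    handles_failures: bool,
}

// Walk up from the first actor notified of the event until some actor
// handles it. Returns the name of the actor where the event stops.
fn route_supervision_event(actors: &[ActorNode], first_notified: usize) -> &'static str {
    let mut current = first_notified;
    loop {
        let node = &actors[current];
        if node.handles_failures {
            return node.name;
        }
        match node.parent {
            Some(parent) => current = parent,
            // The root has no parent, so every unhandled event ends here.
            None => return node.name,
        }
    }
}
```

In this model, if no intermediate actor handles the event, the walk always ends at the node with no parent, which stands in for RootClientActor.__supervise__.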

Navigating the code changes (guide for reviewers)

There are a lot of file changes here but only some of them are important. I would recommend reviewing them in the following order:

monarch/_src/actor/actor_mesh.py
- Defines the RootClientActor python class and its behavior.
hyperactor/src/proc.rs
- Introduces Proc::actor_instance::<A>(...), which returns a detached A-typed actor instance/handle, along with its supervision receiver, signal receiver and message receiver.
monarch_hyperactor/src/actor.rs
- Introduces PythonActor::bootstrap_client(), which replaces global_root_client() in the root client context. This function starts the root client proc, spawns the RootClientActor, starts its actor loop and returns the Instance<PythonActor>.
- The root client actor can now handle SupervisionFailureMessage just like every other actor in the hierarchy.
- Implements PythonActor::handle_supervision_event to pass the event to the actor's SupervisionFailureMessage handler. This way, every unhandled supervision event in the system makes its way to RootClientActor.__supervise__ eventually.
monarch_hyperactor/src/v1/actor_mesh.rs
- Deletes the special handling from the actor states monitor, such as is_owned and the explicit unhandled_fault_hook call. If owner is defined, it forwards the SupervisionFailureMessage; otherwise it does nothing.
- Fixes (what I think was) a bug in send_state_change. A supervision event should only be forwarded as SupervisionFailureMessage to owner if it represents a failure. With the logic before this diff, stopping an actor mesh from inside an actor endpoint would generate a supervision event that reaches unhandled_fault_hook and crashes the root process even if it was a healthy stop.
monarch_hyperactor/src/context.rs
- Deletes ContextInstance and replaces it in PyInstance with Instance<PythonActor>.
The rest of the changes are pretty much just cleaning up instance_dispatch! calls.
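To make the send_state_change fix concrete, here is a rough sketch of the intended rule: forward a SupervisionFailureMessage to owner only when an owner exists and the event is a failure. The types and the `message_for_owner` function are hypothetical simplifications for illustration; the real supervision event and message types in monarch_hyperactor carry more detail, and the real code sends the message rather than returning it.

```rust
// Hypothetical, simplified status; the real supervision event is richer.
#[derive(Debug, Clone, PartialEq)]
enum ActorStatus {
    // Healthy, requested stop.
    Stopped,
    // Failure with a human-readable reason.
    Failed(String),
}

// Hypothetical stand-in for the message forwarded to the owner.
#[derive(Debug, PartialEq)]
struct SupervisionFailureMessage {
    actor: String,
    reason: String,
}

// Returns the message that should be forwarded to `owner`, if any.
fn message_for_owner(
    owner: Option<&str>,
    actor: &str,
    status: &ActorStatus,
) -> Option<SupervisionFailureMessage> {
    // No owner defined: do nothing.
    let _owner = owner?;
    match status {
        // Only failures are forwarded.
        ActorStatus::Failed(reason) => Some(SupervisionFailureMessage {
            actor: actor.to_string(),
            reason: reason.clone(),
        }),
        // A healthy stop must not be reported as a failure.
        ActorStatus::Stopped => None,
    }
}
```

Under this rule, a healthy stop with an owner yields `None`, so stopping an actor mesh from inside an endpoint no longer produces a failure message for the owner.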

Differential Revision: D87296357

NOTE FOR REVIEWERS: This PR has internal Meta-specific changes or comments, please review them on Phabricator!



This was referenced Nov 24, 2025

[hyperactor] remove new from actor #1962

Open

[hyperactor] make all variants of spawn sync #1968

Open

meta-cla bot added the CLA Signed label Nov 24, 2025

meta-codesync bot added fb-exported meta-exported labels Nov 24, 2025
