-
Notifications
You must be signed in to change notification settings - Fork 107
[monarch] The root client is just a PythonActor #1985
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
samlurye
wants to merge
2
commits into
gh/samlurye/93/base
Choose a base branch
from
gh/samlurye/93/head
base: gh/samlurye/93/base
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This diff makes the root client actor just another `PythonActor`.
# Why?
Right now the monarch codebase is peppered with special handling to distinguish between normal python actors and the root client "actor", which has type `()` and is actually just a detached `Instance` with no actor loop; it therefore has no message handlers and can't even process supervision events. As a result, we have to wrap the current context's instance in a special `ContextInstance` enum, and everywhere we want to use it, we either have to use the `instance_dispatch!` macro, or insert code that looks like:
```rust
match instance {
ContextInstance::PythonActor(ins) => { do something },
ContextInstance::Client(ins) => { do something else },
}
```
This makes the code more error-prone and harder to understand, with the added complication that the client handling is often not idiomatic w.r.t hyperactor due to the lack of message handlers/actor loop. Some examples:
- [Confusing supervision handling where `owner` might not be defined but `is_owned` is still true and so we need to call into a special `unhandled` function instead of continuing to propagate up the hierarchy](https://fburl.com/code/andy3ggr)
- [The root client can't have child actors due to no supervision event handling, so they have to be spawned directly on the root client proc, and even then, there is no way for the supervision event to reach `monarch.actor.unhandled_fault_hook`](https://fburl.com/code/kqd2iwvc)
- [The root client handles undeliverable messages via a bespoke tokio task/thread](https://fburl.com/code/jjgfy5d5)
Making the root client a normal python actor solves these problems, because:
- We don't need a `ContextInstance` enum anymore -- `PyInstance` *always* contains `Instance<PythonActor>`.
- Supervision events follow a unified path as they bubble up through the hierarchy, and *every* unhandled event reaches `RootClientActor.__supervise__`, defined in python, without special handling.
- The root client can handle undeliverable messages using `RootClientActor._handle_undeliverable_message`, defined in python, without special handling.
# Navigating the code changes (guide for reviewers)
There are a lot of file changes here but only some of them are important. I would recommend reviewing them in the following order:
- `monarch/_src/actor/actor_mesh.py`
- Defines the `RootClientActor` python class and its behavior.
- `hyperactor/src/proc.rs`
- Introduces `Proc::actor_instance::<A>(...)`, which returns a detached `A`-typed actor instance/handle, along with its supervision receiver, signal receiver and message receiver.
- `monarch_hyperactor/src/actor.rs`
- Introduces `PythonActor::bootstrap_client()`, which replaces `global_root_client()` in the root client context. This function starts the root client proc, spawns the `RootClientActor`, starts its actor loop and returns the `Instance<PythonActor>`.
- The root client actor can now handle `SupervisionFailureMessage` just like every other actor in the hierarchy.
- Implements `PythonActor::handle_supervision_event` to pass the event to the actor's `SupervisionFailureMessage` handler. This way, **every unhandled supervision event in the system makes its way to `RootClientActor.__supervise__` eventually**.
- `monarch_hyperactor/src/v1/actor_mesh.rs`
- Deletes the special handling from the actor states monitor like `is_owned` and the explicit `unhandled_fault_hook` call. If `owner` is defined, it forwards the `SupervisionFailureMessage`, or else it does nothing.
- Fixes (what I think was) a bug in `send_state_change`. A supervision event should only be forwarded as `SupervisionFailureMessage` to `owner` if it represents a failure. With the logic before this diff, stopping an actor mesh from inside an actor endpoint would generate a supervision event that reaches `unhandled_fault_hook` and crashes the root process even if it was a healthy stop.
- `monarch_hyperactor/src/context.rs`
- Deletes `ContextInstance` and replaces it in `PyInstance` with `Instance<PythonActor>`.
- The rest of the changes are pretty much just cleaning up `instance_dispatch!` calls.
Differential Revision: [D87296357](https://our.internmc.facebook.com/intern/diff/D87296357/)
**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D87296357/)!
[ghstack-poisoned]
This was referenced Nov 24, 2025
samlurye
added a commit
that referenced
this pull request
Nov 24, 2025
This diff makes the root client actor just another `PythonActor`.
# Why?
Right now the monarch codebase is peppered with special handling to distinguish between normal python actors and the root client "actor", which has type `()` and is actually just a detached `Instance` with no actor loop; it therefore has no message handlers and can't even process supervision events. As a result, we have to wrap the current context's instance in a special `ContextInstance` enum, and everywhere we want to use it, we either have to use the `instance_dispatch!` macro, or insert code that looks like:
```rust
match instance {
ContextInstance::PythonActor(ins) => { do something },
ContextInstance::Client(ins) => { do something else },
}
```
This makes the code more error-prone and harder to understand, with the added complication that the client handling is often not idiomatic w.r.t hyperactor due to the lack of message handlers/actor loop. Some examples:
- [Confusing supervision handling where `owner` might not be defined but `is_owned` is still true and so we need to call into a special `unhandled` function instead of continuing to propagate up the hierarchy](https://fburl.com/code/andy3ggr)
- [The root client can't have child actors due to no supervision event handling, so they have to be spawned directly on the root client proc, and even then, there is no way for the supervision event to reach `monarch.actor.unhandled_fault_hook`](https://fburl.com/code/kqd2iwvc)
- [The root client handles undeliverable messages via a bespoke tokio task/thread](https://fburl.com/code/jjgfy5d5)
Making the root client a normal python actor solves these problems, because:
- We don't need a `ContextInstance` enum anymore -- `PyInstance` *always* contains `Instance<PythonActor>`.
- Supervision events follow a unified path as they bubble up through the hierarchy, and *every* unhandled event reaches `RootClientActor.__supervise__`, defined in python, without special handling.
- The root client can handle undeliverable messages using `RootClientActor._handle_undeliverable_message`, defined in python, without special handling.
# Navigating the code changes (guide for reviewers)
There are a lot of file changes here but only some of them are important. I would recommend reviewing them in the following order:
- `monarch/_src/actor/actor_mesh.py`
- Defines the `RootClientActor` python class and its behavior.
- `hyperactor/src/proc.rs`
- Introduces `Proc::actor_instance::<A>(...)`, which returns a detached `A`-typed actor instance/handle, along with its supervision receiver, signal receiver and message receiver.
- `monarch_hyperactor/src/actor.rs`
- Introduces `PythonActor::bootstrap_client()`, which replaces `global_root_client()` in the root client context. This function starts the root client proc, spawns the `RootClientActor`, starts its actor loop and returns the `Instance<PythonActor>`.
- The root client actor can now handle `SupervisionFailureMessage` just like every other actor in the hierarchy.
- Implements `PythonActor::handle_supervision_event` to pass the event to the actor's `SupervisionFailureMessage` handler. This way, **every unhandled supervision event in the system makes its way to `RootClientActor.__supervise__` eventually**.
- `monarch_hyperactor/src/v1/actor_mesh.rs`
- Deletes the special handling from the actor states monitor like `is_owned` and the explicit `unhandled_fault_hook` call. If `owner` is defined, it forwards the `SupervisionFailureMessage`, or else it does nothing.
- Fixes (what I think was) a bug in `send_state_change`. A supervision event should only be forwarded as `SupervisionFailureMessage` to `owner` if it represents a failure. With the logic before this diff, stopping an actor mesh from inside an actor endpoint would generate a supervision event that reaches `unhandled_fault_hook` and crashes the root process even if it was a healthy stop.
- `monarch_hyperactor/src/context.rs`
- Deletes `ContextInstance` and replaces it in `PyInstance` with `Instance<PythonActor>`.
- The rest of the changes are pretty much just cleaning up `instance_dispatch!` calls.
Differential Revision: [D87296357](https://our.internmc.facebook.com/intern/diff/D87296357/)
**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D87296357/)!
ghstack-source-id: 325421344
Pull Request resolved: #1985
This diff makes the root client actor just another `PythonActor`.
# Why?
Right now the monarch codebase is peppered with special handling to distinguish between normal python actors and the root client "actor", which has type `()` and is actually just a detached `Instance` with no actor loop; it therefore has no message handlers and can't even process supervision events. As a result, we have to wrap the current context's instance in a special `ContextInstance` enum, and everywhere we want to use it, we either have to use the `instance_dispatch!` macro, or insert code that looks like:
```rust
match instance {
ContextInstance::PythonActor(ins) => { do something },
ContextInstance::Client(ins) => { do something else },
}
```
This makes the code more error-prone and harder to understand, with the added complication that the client handling is often not idiomatic w.r.t hyperactor due to the lack of message handlers/actor loop. Some examples:
- [Confusing supervision handling where `owner` might not be defined but `is_owned` is still true and so we need to call into a special `unhandled` function instead of continuing to propagate up the hierarchy](https://fburl.com/code/andy3ggr)
- [The root client can't have child actors due to no supervision event handling, so they have to be spawned directly on the root client proc, and even then, there is no way for the supervision event to reach `monarch.actor.unhandled_fault_hook`](https://fburl.com/code/kqd2iwvc)
- [The root client handles undeliverable messages via a bespoke tokio task/thread](https://fburl.com/code/jjgfy5d5)
Making the root client a normal python actor solves these problems, because:
- We don't need a `ContextInstance` enum anymore -- `PyInstance` *always* contains `Instance<PythonActor>`.
- Supervision events follow a unified path as they bubble up through the hierarchy, and *every* unhandled event reaches `RootClientActor.__supervise__`, defined in python, without special handling.
- The root client can handle undeliverable messages using `RootClientActor._handle_undeliverable_message`, defined in python, without special handling.
# Navigating the code changes (guide for reviewers)
There are a lot of file changes here but only some of them are important. I would recommend reviewing them in the following order:
- `monarch/_src/actor/actor_mesh.py`
- Defines the `RootClientActor` python class and its behavior.
- `hyperactor/src/proc.rs`
- Introduces `Proc::actor_instance::<A>(...)`, which returns a detached `A`-typed actor instance/handle, along with its supervision receiver, signal receiver and message receiver.
- `monarch_hyperactor/src/actor.rs`
- Introduces `PythonActor::bootstrap_client()`, which replaces `global_root_client()` in the root client context. This function starts the root client proc, spawns the `RootClientActor`, starts its actor loop and returns the `Instance<PythonActor>`.
- The root client actor can now handle `SupervisionFailureMessage` just like every other actor in the hierarchy.
- Implements `PythonActor::handle_supervision_event` to pass the event to the actor's `SupervisionFailureMessage` handler. This way, **every unhandled supervision event in the system makes its way to `RootClientActor.__supervise__` eventually**.
- `monarch_hyperactor/src/v1/actor_mesh.rs`
- Deletes the special handling from the actor states monitor like `is_owned` and the explicit `unhandled_fault_hook` call. If `owner` is defined, it forwards the `SupervisionFailureMessage`, or else it does nothing.
- Fixes (what I think was) a bug in `send_state_change`. A supervision event should only be forwarded as `SupervisionFailureMessage` to `owner` if it represents a failure. With the logic before this diff, stopping an actor mesh from inside an actor endpoint would generate a supervision event that reaches `unhandled_fault_hook` and crashes the root process even if it was a healthy stop.
- `monarch_hyperactor/src/context.rs`
- Deletes `ContextInstance` and replaces it in `PyInstance` with `Instance<PythonActor>`.
- The rest of the changes are pretty much just cleaning up `instance_dispatch!` calls.
Differential Revision: [D87296357](https://our.internmc.facebook.com/intern/diff/D87296357/)
**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D87296357/)!
[ghstack-poisoned]
samlurye
added a commit
that referenced
this pull request
Nov 24, 2025
Pull Request resolved: #1985 This diff makes the root client actor just another `PythonActor`. # Why? Right now the monarch codebase is peppered with special handling to distinguish between normal python actors and the root client "actor", which has type `()` and is actually just a detached `Instance` with no actor loop; it therefore has no message handlers and can't even process supervision events. As a result, we have to wrap the current context's instance in a special `ContextInstance` enum, and everywhere we want to use it, we either have to use the `instance_dispatch!` macro, or insert code that looks like: ```rust match instance { ContextInstance::PythonActor(ins) => { do something }, ContextInstance::Client(ins) => { do something else }, } ``` This makes the code more error-prone and harder to understand, with the added complication that the client handling is often not idiomatic w.r.t hyperactor due to the lack of message handlers/actor loop. Some examples: - [Confusing supervision handling where `owner` might not be defined but `is_owned` is still true and so we need to call into a special `unhandled` function instead of continuing to propagate up the hierarchy](https://fburl.com/code/andy3ggr) - [The root client can't have child actors due to no supervision event handling, so they have to be spawned directly on the root client proc, and even then, there is no way for the supervision event to reach `monarch.actor.unhandled_fault_hook`](https://fburl.com/code/kqd2iwvc) - [The root client handles undeliverable messages via a bespoke tokio task/thread](https://fburl.com/code/jjgfy5d5) Making the root client a normal python actor solves these problems, because: - We don't need a `ContextInstance` enum anymore -- `PyInstance` *always* contains `Instance<PythonActor>`. - Supervision events follow a unified path as they bubble up through the hierarchy, and *every* unhandled event reaches `RootClientActor.__supervise__`, defined in python, without special handling. - The root client can handle undeliverable messages using `RootClientActor._handle_undeliverable_message`, defined in python, without special handling. # Navigating the code changes (guide for reviewers) There are a lot of file changes here but only some of them are important. I would recommend reviewing them in the following order: - `monarch/_src/actor/actor_mesh.py` - Defines the `RootClientActor` python class and its behavior. - `hyperactor/src/proc.rs` - Introduces `Proc::actor_instance::<A>(...)`, which returns a detached `A`-typed actor instance/handle, along with its supervision receiver, signal receiver and message receiver. - `monarch_hyperactor/src/actor.rs` - Introduces `PythonActor::bootstrap_client()`, which replaces `global_root_client()` in the root client context. This function starts the root client proc, spawns the `RootClientActor`, starts its actor loop and returns the `Instance<PythonActor>`. - The root client actor can now handle `SupervisionFailureMessage` just like every other actor in the hierarchy. - Implements `PythonActor::handle_supervision_event` to pass the event to the actor's `SupervisionFailureMessage` handler. This way, **every unhandled supervision event in the system makes its way to `RootClientActor.__supervise__` eventually**. - `monarch_hyperactor/src/v1/actor_mesh.rs` - Deletes the special handling from the actor states monitor like `is_owned` and the explicit `unhandled_fault_hook` call. If `owner` is defined, it forwards the `SupervisionFailureMessage`, or else it does nothing. - Fixes (what I think was) a bug in `send_state_change`. A supervision event should only be forwarded as `SupervisionFailureMessage` to `owner` if it represents a failure. With the logic before this diff, stopping an actor mesh from inside an actor endpoint would generate a supervision event that reaches `unhandled_fault_hook` and crashes the root process even if it was a healthy stop. - `monarch_hyperactor/src/context.rs` - Deletes `ContextInstance` and replaces it in `PyInstance` with `Instance<PythonActor>`. - The rest of the changes are pretty much just cleaning up `instance_dispatch!` calls. Differential Revision: [D87296357](https://our.internmc.facebook.com/intern/diff/D87296357/) **NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D87296357/)! ghstack-source-id: 325487297
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Stack from ghstack (oldest at bottom):
spawnsync #1968newfrom actor #1962This diff makes the root client actor just another
PythonActor.Why?
Right now the monarch codebase is peppered with special handling to distinguish between normal python actors and the root client "actor", which has type
()and is actually just a detachedInstancewith no actor loop; it therefore has no message handlers and can't even process supervision events. As a result, we have to wrap the current context's instance in a specialContextInstanceenum, and everywhere we want to use it, we either have to use theinstance_dispatch!macro, or insert code that looks like:This makes the code more error-prone and harder to understand, with the added complication that the client handling is often not idiomatic w.r.t hyperactor due to the lack of message handlers/actor loop. Some examples:
ownermight not be defined butis_ownedis still true and so we need to call into a specialunhandledfunction instead of continuing to propagate up the hierarchymonarch.actor.unhandled_fault_hookMaking the root client a normal python actor solves these problems, because:
ContextInstanceenum anymore --PyInstancealways containsInstance<PythonActor>.RootClientActor.__supervise__, defined in python, without special handling.RootClientActor._handle_undeliverable_message, defined in python, without special handling.Navigating the code changes (guide for reviewers)
There are a lot of file changes here but only some of them are important. I would recommend reviewing them in the following order:
monarch/_src/actor/actor_mesh.pyRootClientActorpython class and its behavior.hyperactor/src/proc.rsProc::actor_instance::<A>(...), which returns a detachedA-typed actor instance/handle, along with its supervision receiver, signal receiver and message receiver.monarch_hyperactor/src/actor.rsPythonActor::bootstrap_client(), which replacesglobal_root_client()in the root client context. This function starts the root client proc, spawns theRootClientActor, starts its actor loop and returns theInstance<PythonActor>.SupervisionFailureMessagejust like every other actor in the hierarchy.PythonActor::handle_supervision_eventto pass the event to the actor'sSupervisionFailureMessagehandler. This way, every unhandled supervision event in the system makes its way toRootClientActor.__supervise__eventually.monarch_hyperactor/src/v1/actor_mesh.rsis_ownedand the explicitunhandled_fault_hookcall. Ifowneris defined, it forwards theSupervisionFailureMessage, or else it does nothing.send_state_change. A supervision event should only be forwarded asSupervisionFailureMessagetoownerif it represents a failure. With the logic before this diff, stopping an actor mesh from inside an actor endpoint would generate a supervision event that reachesunhandled_fault_hookand crashes the root process even if it was a healthy stop.monarch_hyperactor/src/context.rsContextInstanceand replaces it inPyInstancewithInstance<PythonActor>.instance_dispatch!calls.Differential Revision: D87296357
NOTE FOR REVIEWERS: This PR has internal Meta-specific changes or comments, please review them on Phabricator!