Skip to content

Conversation

@samlurye
Copy link
Contributor

@samlurye samlurye commented Nov 24, 2025

Stack from ghstack (oldest at bottom):

This diff makes the root client actor just another PythonActor.

Why?

Right now the monarch codebase is peppered with special handling to distinguish between normal python actors and the root client "actor", which has type () and is actually just a detached Instance with no actor loop; it therefore has no message handlers and can't even process supervision events. As a result, we have to wrap the current context's instance in a special ContextInstance enum, and everywhere we want to use it, we either have to use the instance_dispatch! macro, or insert code that looks like:

match instance {
  ContextInstance::PythonActor(ins) => { do something },
  ContextInstance::Client(ins) => { do something else },
}

This makes the code more error-prone and harder to understand, with the added complication that the client handling is often not idiomatic w.r.t hyperactor due to the lack of message handlers/actor loop. Some examples:

Making the root client a normal python actor solves these problems, because:

  • We don't need a ContextInstance enum anymore -- PyInstance always contains Instance<PythonActor>.
  • Supervision events follow a unified path as they bubble up through the hierarchy, and every unhandled event reaches RootClientActor.__supervise__, defined in python, without special handling.
  • The root client can handle undeliverable messages using RootClientActor._handle_undeliverable_message, defined in python, without special handling.

Navigating the code changes (guide for reviewers)

There are a lot of file changes here but only some of them are important. I would recommend reviewing them in the following order:

  • monarch/_src/actor/actor_mesh.py
    • Defines the RootClientActor python class and its behavior.
  • hyperactor/src/proc.rs
    • Introduces Proc::actor_instance::<A>(...), which returns a detached A-typed actor instance/handle, along with its supervision receiver, signal receiver and message receiver.
  • monarch_hyperactor/src/actor.rs
    • Introduces PythonActor::bootstrap_client(), which replaces global_root_client() in the root client context. This function starts the root client proc, spawns the RootClientActor, starts its actor loop and returns the Instance<PythonActor>.
    • The root client actor can now handle SupervisionFailureMessage just like every other actor in the hierarchy.
    • Implements PythonActor::handle_supervision_event to pass the event to the actor's SupervisionFailureMessage handler. This way, every unhandled supervision event in the system makes its way to RootClientActor.__supervise__ eventually.
  • monarch_hyperactor/src/v1/actor_mesh.rs
    • Deletes the special handling from the actor states monitor like is_owned and the explicit unhandled_fault_hook call. If owner is defined, it forwards the SupervisionFailureMessage, or else it does nothing.
    • Fixes (what I think was) a bug in send_state_change. A supervision event should only be forwarded as SupervisionFailureMessage to owner if it represents a failure. With the logic before this diff, stopping an actor mesh from inside an actor endpoint would generate a supervision event that reaches unhandled_fault_hook and crashes the root process even if it was a healthy stop.
  • monarch_hyperactor/src/context.rs
    • Deletes ContextInstance and replaces it in PyInstance with Instance<PythonActor>.
  • The rest of the changes are pretty much just cleaning up instance_dispatch! calls.

Differential Revision: D87296357

NOTE FOR REVIEWERS: This PR has internal Meta-specific changes or comments, please review them on Phabricator!

This diff makes the root client actor just another `PythonActor`.

# Why?

Right now the monarch codebase is peppered with special handling to distinguish between normal python actors and the root client "actor", which has type `()` and is actually just a detached `Instance` with no actor loop; it therefore has no message handlers and can't even process supervision events. As a result, we have to wrap the current context's instance in a special `ContextInstance` enum, and everywhere we want to use it, we either have to use the `instance_dispatch!` macro, or insert code that looks like:
```rust
match instance {
  ContextInstance::PythonActor(ins) => { do something },
  ContextInstance::Client(ins) => { do something else },
}
```
This makes the code more error-prone and harder to understand, with the added complication that the client handling is often not idiomatic w.r.t hyperactor due to the lack of message handlers/actor loop. Some examples:
- [Confusing supervision handling where `owner` might not be defined but `is_owned` is still true and so we need to call into a special `unhandled` function instead of continuing to propagate up the hierarchy](https://fburl.com/code/andy3ggr)
- [The root client can't have child actors due to no supervision event handling, so they have to be spawned directly on the root client proc, and even then, there is no way for the supervision event to reach `monarch.actor.unhandled_fault_hook`](https://fburl.com/code/kqd2iwvc)
- [The root client handles undeliverable messages via a bespoke tokio task/thread](https://fburl.com/code/jjgfy5d5)

Making the root client a normal python actor solves these problems, because:
- We don't need a `ContextInstance` enum anymore -- `PyInstance` *always* contains `Instance<PythonActor>`.
- Supervision events follow a unified path as they bubble up through the hierarchy, and *every* unhandled event reaches `RootClientActor.__supervise__`, defined in python, without special handling.
- The root client can handle undeliverable messages using `RootClientActor._handle_undeliverable_message`, defined in python, without special handling.

# Navigating the code changes (guide for reviewers)

There are a lot of file changes here but only some of them are important. I would recommend reviewing them in the following order:
- `monarch/_src/actor/actor_mesh.py`
  - Defines the `RootClientActor` python class and its behavior.
- `hyperactor/src/proc.rs`
  - Introduces `Proc::actor_instance::<A>(...)`, which returns a detached `A`-typed actor instance/handle, along with its supervision receiver, signal receiver and message receiver.
- `monarch_hyperactor/src/actor.rs`
  - Introduces `PythonActor::bootstrap_client()`, which replaces `global_root_client()` in the root client context. This function starts the root client proc, spawns the `RootClientActor`, starts its actor loop and returns the `Instance<PythonActor>`.
  - The root client actor can now handle `SupervisionFailureMessage` just like every other actor in the hierarchy.
  - Implements `PythonActor::handle_supervision_event` to pass the event to the actor's `SupervisionFailureMessage` handler. This way, **every unhandled supervision event in the system makes its way to `RootClientActor.__supervise__` eventually**.
- `monarch_hyperactor/src/v1/actor_mesh.rs`
  - Deletes the special handling from the actor states monitor like `is_owned` and the explicit `unhandled_fault_hook` call. If `owner` is defined, it forwards the `SupervisionFailureMessage`, or else it does nothing.
  - Fixes (what I think was) a bug in `send_state_change`. A supervision event should only be forwarded as `SupervisionFailureMessage` to `owner` if it represents a failure. With the logic before this diff, stopping an actor mesh from inside an actor endpoint would generate a supervision event that reaches `unhandled_fault_hook` and crashes the root process even if it was a healthy stop.
- `monarch_hyperactor/src/context.rs`
  - Deletes `ContextInstance` and replaces it in `PyInstance` with `Instance<PythonActor>`.
- The rest of the changes are pretty much just cleaning up `instance_dispatch!` calls.

Differential Revision: [D87296357](https://our.internmc.facebook.com/intern/diff/D87296357/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D87296357/)!

[ghstack-poisoned]
samlurye added a commit that referenced this pull request Nov 24, 2025
This diff makes the root client actor just another `PythonActor`.

# Why?

Right now the monarch codebase is peppered with special handling to distinguish between normal python actors and the root client "actor", which has type `()` and is actually just a detached `Instance` with no actor loop; it therefore has no message handlers and can't even process supervision events. As a result, we have to wrap the current context's instance in a special `ContextInstance` enum, and everywhere we want to use it, we either have to use the `instance_dispatch!` macro, or insert code that looks like:
```rust
match instance {
  ContextInstance::PythonActor(ins) => { do something },
  ContextInstance::Client(ins) => { do something else },
}
```
This makes the code more error-prone and harder to understand, with the added complication that the client handling is often not idiomatic w.r.t hyperactor due to the lack of message handlers/actor loop. Some examples:
- [Confusing supervision handling where `owner` might not be defined but `is_owned` is still true and so we need to call into a special `unhandled` function instead of continuing to propagate up the hierarchy](https://fburl.com/code/andy3ggr)
- [The root client can't have child actors due to no supervision event handling, so they have to be spawned directly on the root client proc, and even then, there is no way for the supervision event to reach `monarch.actor.unhandled_fault_hook`](https://fburl.com/code/kqd2iwvc)
- [The root client handles undeliverable messages via a bespoke tokio task/thread](https://fburl.com/code/jjgfy5d5)

Making the root client a normal python actor solves these problems, because:
- We don't need a `ContextInstance` enum anymore -- `PyInstance` *always* contains `Instance<PythonActor>`.
- Supervision events follow a unified path as they bubble up through the hierarchy, and *every* unhandled event reaches `RootClientActor.__supervise__`, defined in python, without special handling.
- The root client can handle undeliverable messages using `RootClientActor._handle_undeliverable_message`, defined in python, without special handling.

# Navigating the code changes (guide for reviewers)

There are a lot of file changes here but only some of them are important. I would recommend reviewing them in the following order:
- `monarch/_src/actor/actor_mesh.py`
  - Defines the `RootClientActor` python class and its behavior.
- `hyperactor/src/proc.rs`
  - Introduces `Proc::actor_instance::<A>(...)`, which returns a detached `A`-typed actor instance/handle, along with its supervision receiver, signal receiver and message receiver.
- `monarch_hyperactor/src/actor.rs`
  - Introduces `PythonActor::bootstrap_client()`, which replaces `global_root_client()` in the root client context. This function starts the root client proc, spawns the `RootClientActor`, starts its actor loop and returns the `Instance<PythonActor>`.
  - The root client actor can now handle `SupervisionFailureMessage` just like every other actor in the hierarchy.
  - Implements `PythonActor::handle_supervision_event` to pass the event to the actor's `SupervisionFailureMessage` handler. This way, **every unhandled supervision event in the system makes its way to `RootClientActor.__supervise__` eventually**.
- `monarch_hyperactor/src/v1/actor_mesh.rs`
  - Deletes the special handling from the actor states monitor like `is_owned` and the explicit `unhandled_fault_hook` call. If `owner` is defined, it forwards the `SupervisionFailureMessage`, or else it does nothing.
  - Fixes (what I think was) a bug in `send_state_change`. A supervision event should only be forwarded as `SupervisionFailureMessage` to `owner` if it represents a failure. With the logic before this diff, stopping an actor mesh from inside an actor endpoint would generate a supervision event that reaches `unhandled_fault_hook` and crashes the root process even if it was a healthy stop.
- `monarch_hyperactor/src/context.rs`
  - Deletes `ContextInstance` and replaces it in `PyInstance` with `Instance<PythonActor>`.
- The rest of the changes are pretty much just cleaning up `instance_dispatch!` calls.

Differential Revision: [D87296357](https://our.internmc.facebook.com/intern/diff/D87296357/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D87296357/)!

ghstack-source-id: 325421344
Pull Request resolved: #1985
@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Nov 24, 2025
This diff makes the root client actor just another `PythonActor`.

# Why?

Right now the monarch codebase is peppered with special handling to distinguish between normal python actors and the root client "actor", which has type `()` and is actually just a detached `Instance` with no actor loop; it therefore has no message handlers and can't even process supervision events. As a result, we have to wrap the current context's instance in a special `ContextInstance` enum, and everywhere we want to use it, we either have to use the `instance_dispatch!` macro, or insert code that looks like:
```rust
match instance {
  ContextInstance::PythonActor(ins) => { do something },
  ContextInstance::Client(ins) => { do something else },
}
```
This makes the code more error-prone and harder to understand, with the added complication that the client handling is often not idiomatic w.r.t hyperactor due to the lack of message handlers/actor loop. Some examples:
- [Confusing supervision handling where `owner` might not be defined but `is_owned` is still true and so we need to call into a special `unhandled` function instead of continuing to propagate up the hierarchy](https://fburl.com/code/andy3ggr)
- [The root client can't have child actors due to no supervision event handling, so they have to be spawned directly on the root client proc, and even then, there is no way for the supervision event to reach `monarch.actor.unhandled_fault_hook`](https://fburl.com/code/kqd2iwvc)
- [The root client handles undeliverable messages via a bespoke tokio task/thread](https://fburl.com/code/jjgfy5d5)

Making the root client a normal python actor solves these problems, because:
- We don't need a `ContextInstance` enum anymore -- `PyInstance` *always* contains `Instance<PythonActor>`.
- Supervision events follow a unified path as they bubble up through the hierarchy, and *every* unhandled event reaches `RootClientActor.__supervise__`, defined in python, without special handling.
- The root client can handle undeliverable messages using `RootClientActor._handle_undeliverable_message`, defined in python, without special handling.

# Navigating the code changes (guide for reviewers)

There are a lot of file changes here but only some of them are important. I would recommend reviewing them in the following order:
- `monarch/_src/actor/actor_mesh.py`
  - Defines the `RootClientActor` python class and its behavior.
- `hyperactor/src/proc.rs`
  - Introduces `Proc::actor_instance::<A>(...)`, which returns a detached `A`-typed actor instance/handle, along with its supervision receiver, signal receiver and message receiver.
- `monarch_hyperactor/src/actor.rs`
  - Introduces `PythonActor::bootstrap_client()`, which replaces `global_root_client()` in the root client context. This function starts the root client proc, spawns the `RootClientActor`, starts its actor loop and returns the `Instance<PythonActor>`.
  - The root client actor can now handle `SupervisionFailureMessage` just like every other actor in the hierarchy.
  - Implements `PythonActor::handle_supervision_event` to pass the event to the actor's `SupervisionFailureMessage` handler. This way, **every unhandled supervision event in the system makes its way to `RootClientActor.__supervise__` eventually**.
- `monarch_hyperactor/src/v1/actor_mesh.rs`
  - Deletes the special handling from the actor states monitor like `is_owned` and the explicit `unhandled_fault_hook` call. If `owner` is defined, it forwards the `SupervisionFailureMessage`, or else it does nothing.
  - Fixes (what I think was) a bug in `send_state_change`. A supervision event should only be forwarded as `SupervisionFailureMessage` to `owner` if it represents a failure. With the logic before this diff, stopping an actor mesh from inside an actor endpoint would generate a supervision event that reaches `unhandled_fault_hook` and crashes the root process even if it was a healthy stop.
- `monarch_hyperactor/src/context.rs`
  - Deletes `ContextInstance` and replaces it in `PyInstance` with `Instance<PythonActor>`.
- The rest of the changes are pretty much just cleaning up `instance_dispatch!` calls.

Differential Revision: [D87296357](https://our.internmc.facebook.com/intern/diff/D87296357/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D87296357/)!

[ghstack-poisoned]
samlurye added a commit that referenced this pull request Nov 24, 2025
Pull Request resolved: #1985

This diff makes the root client actor just another `PythonActor`.

# Why?

Right now the monarch codebase is peppered with special handling to distinguish between normal python actors and the root client "actor", which has type `()` and is actually just a detached `Instance` with no actor loop; it therefore has no message handlers and can't even process supervision events. As a result, we have to wrap the current context's instance in a special `ContextInstance` enum, and everywhere we want to use it, we either have to use the `instance_dispatch!` macro, or insert code that looks like:
```rust
match instance {
  ContextInstance::PythonActor(ins) => { do something },
  ContextInstance::Client(ins) => { do something else },
}
```
This makes the code more error-prone and harder to understand, with the added complication that the client handling is often not idiomatic w.r.t hyperactor due to the lack of message handlers/actor loop. Some examples:
- [Confusing supervision handling where `owner` might not be defined but `is_owned` is still true and so we need to call into a special `unhandled` function instead of continuing to propagate up the hierarchy](https://fburl.com/code/andy3ggr)
- [The root client can't have child actors due to no supervision event handling, so they have to be spawned directly on the root client proc, and even then, there is no way for the supervision event to reach `monarch.actor.unhandled_fault_hook`](https://fburl.com/code/kqd2iwvc)
- [The root client handles undeliverable messages via a bespoke tokio task/thread](https://fburl.com/code/jjgfy5d5)

Making the root client a normal python actor solves these problems, because:
- We don't need a `ContextInstance` enum anymore -- `PyInstance` *always* contains `Instance<PythonActor>`.
- Supervision events follow a unified path as they bubble up through the hierarchy, and *every* unhandled event reaches `RootClientActor.__supervise__`, defined in python, without special handling.
- The root client can handle undeliverable messages using `RootClientActor._handle_undeliverable_message`, defined in python, without special handling.

# Navigating the code changes (guide for reviewers)

There are a lot of file changes here but only some of them are important. I would recommend reviewing them in the following order:
- `monarch/_src/actor/actor_mesh.py`
  - Defines the `RootClientActor` python class and its behavior.
- `hyperactor/src/proc.rs`
  - Introduces `Proc::actor_instance::<A>(...)`, which returns a detached `A`-typed actor instance/handle, along with its supervision receiver, signal receiver and message receiver.
- `monarch_hyperactor/src/actor.rs`
  - Introduces `PythonActor::bootstrap_client()`, which replaces `global_root_client()` in the root client context. This function starts the root client proc, spawns the `RootClientActor`, starts its actor loop and returns the `Instance<PythonActor>`.
  - The root client actor can now handle `SupervisionFailureMessage` just like every other actor in the hierarchy.
  - Implements `PythonActor::handle_supervision_event` to pass the event to the actor's `SupervisionFailureMessage` handler. This way, **every unhandled supervision event in the system makes its way to `RootClientActor.__supervise__` eventually**.
- `monarch_hyperactor/src/v1/actor_mesh.rs`
  - Deletes the special handling from the actor states monitor like `is_owned` and the explicit `unhandled_fault_hook` call. If `owner` is defined, it forwards the `SupervisionFailureMessage`, or else it does nothing.
  - Fixes (what I think was) a bug in `send_state_change`. A supervision event should only be forwarded as `SupervisionFailureMessage` to `owner` if it represents a failure. With the logic before this diff, stopping an actor mesh from inside an actor endpoint would generate a supervision event that reaches `unhandled_fault_hook` and crashes the root process even if it was a healthy stop.
- `monarch_hyperactor/src/context.rs`
  - Deletes `ContextInstance` and replaces it in `PyInstance` with `Instance<PythonActor>`.
- The rest of the changes are pretty much just cleaning up `instance_dispatch!` calls.

Differential Revision: [D87296357](https://our.internmc.facebook.com/intern/diff/D87296357/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D87296357/)!
ghstack-source-id: 325487297
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot. fb-exported meta-exported

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants