🦋 Monarch v0.1.0 — Initial Release
We’re excited to announce the first public release of Monarch, a distributed programming framework for PyTorch built around scalable actor messaging and remote direct memory access (RDMA).
Monarch brings together ideas from actor-based concurrency, fault-tolerant supervision, and high-performance tensor communication to make distributed training simpler, more explicit, and faster.
🚀 Highlights
- Actor-Based Programming for PyTorch
Define Python classes that run remotely as actors, send them messages, and coordinate distributed work using a clean, imperative API.
```python
from monarch.actor import Actor, endpoint, this_host

# Create a mesh of 8 processes on the local host, one per GPU.
training_procs = this_host().spawn_procs({"gpus": 8})

class Trainer(Actor):
    @endpoint
    def train(self, step: int): ...

# Spawn one Trainer actor in each process, then message all of them.
trainers = training_procs.spawn("trainers", Trainer)
trainers.train.call(step=0).get()
```
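Here `spawn_procs({"gpus": 8})` creates a mesh of eight processes on the local host, `spawn` places one `Trainer` actor in each, `.call(step=0)` broadcasts the message to every actor in the mesh, and `.get()` blocks until all of them have replied.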
- Scalable Messaging and Meshes
Actors are organized into meshes: collections that support broadcast, gather, and other scalable communication primitives.
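As a minimal sketch of what this looks like in practice, reusing the `trainers` mesh from above: `call` gathers a result from every actor, while `choose` and `broadcast` (assumed here from the current endpoint API; adverb names may change while the API is experimental) target a single actor or skip result collection entirely.

```python
# Broadcast to every actor in the mesh; gather one result per actor.
results = trainers.train.call(step=1).get()

# Route a single message to one actor chosen by the runtime.
trainers.train.choose(step=2).get()

# Fire-and-forget: message the whole mesh without collecting replies.
trainers.train.broadcast(step=3)
```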
- Supervision and Fault Tolerance
Monarch adopts supervision trees for error handling and recovery. Failures propagate predictably, allowing fine-grained restart and robust distributed workflows.
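For instance, an exception raised inside an endpoint surfaces to the caller when the result is awaited, so the caller can catch it and recover. A sketch, assuming remote failures are re-raised from `.get()` as in the actor example above (the exact exception type wrapping remote errors may differ):

```python
from monarch.actor import Actor, endpoint, this_host

class Flaky(Actor):
    @endpoint
    def work(self) -> None:
        raise RuntimeError("simulated failure")

flaky = this_host().spawn_procs({"gpus": 1}).spawn("flaky", Flaky)

try:
    flaky.work.call().get()  # the remote failure is re-raised here
except Exception as err:
    # The caller decides how to recover: respawn the actor, reroute
    # work, or let the error propagate up the supervision tree.
    print(f"actor failed: {err}")
```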
- High-Performance RDMA Transfers
Full RDMA integration for CPU and GPU memory via libibverbs, providing zero-copy, one-sided tensor communication across processes and hosts.
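A hedged sketch of the one-sided model, assuming an `RDMABuffer` handle in `monarch.rdma` with `read_into`/`write_from` methods (provisional names): the owner of a tensor hands out a buffer handle, and peers read or write through it directly, with no copy on the owner's side.

```python
import torch
from monarch.actor import Actor, endpoint
from monarch.rdma import RDMABuffer  # assumed location; see caveat above

class ParameterServer(Actor):
    def __init__(self):
        self.grads = torch.zeros(1024)

    @endpoint
    def grad_handle(self) -> RDMABuffer:
        # Expose the gradient buffer for one-sided access by peers.
        return RDMABuffer(self.grads.view(torch.uint8).flatten())

class Worker(Actor):
    @endpoint
    async def push_grads(self, handle: RDMABuffer):
        grads = torch.rand(1024)
        # One-sided write into the server's buffer; the server's
        # Python code is not involved in the data path.
        await handle.write_from(grads.view(torch.uint8).flatten())
```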
- Distributed Tensors
Native support for tensors sharded across processes, enabling distributed compute without custom data movement code.
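The tensor engine is the most experimental surface, so the following is only an illustrative sketch: `activate()` and `fetch_shard` are borrowed from earlier Monarch examples and may not match this release; treat every name in the block as an assumption.

```python
import torch
from monarch import fetch_shard  # assumed import; see caveat above

# Inside an activated mesh, plain torch ops produce tensors that are
# sharded across the mesh's processes instead of living on one host.
with training_procs.activate():        # assumed context-manager API
    weights = torch.rand(1024, 1024)   # one shard per process

    # Fetch a single shard back to the controller for inspection.
    local = fetch_shard(weights).result()
```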
⚠️ Status
Monarch is experimental and under active development. Expect incomplete APIs, rapid iteration, and evolving interfaces.
We welcome contributions; please discuss significant changes or ideas via issues before submitting PRs.