🦋 Monarch v0.1.0 — Initial Release
We’re excited to announce the first public release of Monarch, a distributed programming framework for PyTorch built around scalable actor messaging and remote direct memory access (RDMA).
Monarch brings together ideas from actor-based concurrency, fault-tolerant supervision, and high-performance tensor communication to make distributed training simpler, more explicit, and faster.
🚀 Highlights
- Actor-Based Programming for PyTorch
Define Python classes that run remotely as actors, send them messages, and coordinate distributed work using a clean, imperative API.
```python
from monarch.actor import Actor, endpoint, this_host

# Create a mesh of 8 processes on the local host, one per GPU.
training_procs = this_host().spawn_procs({"gpus": 8})

class Trainer(Actor):
    @endpoint
    def train(self, step: int): ...

# Spawn one Trainer actor in each process, then message all of them.
trainers = training_procs.spawn("trainers", Trainer)
trainers.train.call(step=0).get()
```
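Here `spawn_procs({"gpus": 8})` creates a mesh of eight processes on the local host, `spawn` places one `Trainer` actor in each, `.call(step=0)` broadcasts the message to every actor in the mesh, and `.get()` blocks until all of them have replied.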
- Scalable Messaging and Meshes
Actors are organized into meshes: collections that support broadcast, gather, and other scalable communication primitives.
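As a minimal sketch of what this looks like in practice, reusing the `trainers` mesh from above: `call` gathers a result from every actor, while `choose` and `broadcast` (assumed here from the current endpoint API; adverb names may change while the API is experimental) target a single actor or skip result collection entirely.

```python
# Broadcast to every actor in the mesh; gather one result per actor.
results = trainers.train.call(step=1).get()

# Route a single message to one actor chosen by the runtime.
trainers.train.choose(step=2).get()

# Fire-and-forget: message the whole mesh without collecting replies.
trainers.train.broadcast(step=3)
```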
- Supervision and Fault Tolerance
Monarch adopts supervision trees for error handling and recovery. Failures propagate predictably, allowing fine-grained restart and robust distributed workflows.
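For instance, an exception raised inside an endpoint surfaces to the caller when the result is awaited, so the caller can catch it and recover. A sketch, assuming remote failures are re-raised from `.get()` as in the actor example above (the exact exception type wrapping remote errors may differ):

```python
from monarch.actor import Actor, endpoint, this_host

class Flaky(Actor):
    @endpoint
    def work(self) -> None:
        raise RuntimeError("simulated failure")

flaky = this_host().spawn_procs({"gpus": 1}).spawn("flaky", Flaky)

try:
    flaky.work.call().get()  # the remote failure is re-raised here
except Exception as err:
    # The caller decides how to recover: respawn the actor, reroute
    # work, or let the error propagate up the supervision tree.
    print(f"actor failed: {err}")
```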
- High-Performance RDMA Transfers
Full RDMA integration for CPU and GPU memory via libibverbs, providing zero-copy, one-sided tensor communication across processes and hosts.
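A hedged sketch of the one-sided model, assuming an `RDMABuffer` handle in `monarch.rdma` with `read_into`/`write_from` methods (provisional names): the owner of a tensor hands out a buffer handle, and peers read or write through it directly, with no copy on the owner's side.

```python
import torch
from monarch.actor import Actor, endpoint
from monarch.rdma import RDMABuffer  # assumed location; see caveat above

class ParameterServer(Actor):
    def __init__(self):
        self.grads = torch.zeros(1024)

    @endpoint
    def grad_handle(self) -> RDMABuffer:
        # Expose the gradient buffer for one-sided access by peers.
        return RDMABuffer(self.grads.view(torch.uint8).flatten())

class Worker(Actor):
    @endpoint
    async def push_grads(self, handle: RDMABuffer):
        grads = torch.rand(1024)
        # One-sided write into the server's buffer; the server's
        # Python code is not involved in the data path.
        await handle.write_from(grads.view(torch.uint8).flatten())
```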
- Distributed Tensors
Native support for tensors sharded across processes, enabling distributed compute without custom data movement code.
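The tensor engine is the most experimental surface, so the following is only an illustrative sketch: `activate()` and `fetch_shard` are borrowed from earlier Monarch examples and may not match this release; treat every name in the block as an assumption.

```python
import torch
from monarch import fetch_shard  # assumed import; see caveat above

# Inside an activated mesh, plain torch ops produce tensors that are
# sharded across the mesh's processes instead of living on one host.
with training_procs.activate():        # assumed context-manager API
    weights = torch.rand(1024, 1024)   # one shard per process

    # Fetch a single shard back to the controller for inspection.
    local = fetch_shard(weights).result()
```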
⚠️ Status
Monarch is experimental and under active development. Expect incomplete APIs, rapid iteration, and evolving interfaces.
We welcome contributions; please discuss significant changes or ideas via issues before submitting PRs.