# Monarch

**Monarch** is a distributed execution engine for PyTorch.

> ⚠️ **Early Development Warning**
> Monarch is currently in an experimental stage. Expect bugs, incomplete features, and APIs that may change in future versions. The project welcomes bug fixes, but to keep things well coordinated, please discuss any significant change before starting the work. It is recommended that you signal your intention to contribute in the issue tracker, either by filing a new issue or by claiming an existing one.

## Installation

```sh
# Create and activate the conda environment
conda create -n monarchenv python=3.10 -y
conda activate monarchenv

# Install the nightly Rust toolchain
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
rustup toolchain install nightly
rustup default nightly

# Install non-Python dependencies
conda install python=3.10
conda install libunwind

# Needs cuda-toolkit-12-0, as that is the version that matches /usr/local/cuda/ on devservers
sudo dnf install cuda-toolkit-12-0 cuda-12-0 libnccl-devel clang-devel

# Install build dependencies
pip install setuptools-rust

# Install torch (via pip, conda, or a source build)
pip install torch

# Install core dependencies; see pyproject.toml for the latest list
pip install pyzmq requests numpy pyre-extensions cloudpickle

# Install test dependencies
pip install pytest pytest-timeout pytest-asyncio

# Install the package
python setup.py install
# ...or set up for development
python setup.py develop

# Run the unit tests; consider -s for more verbose output
pytest python/tests/ -v -m "not oss_skip"
```

## Running examples

TODO

## Debugging

If everything is hanging, set the environment variable
`CONTROLLER_PYSPY_REPORT_INTERVAL=10` to get a py-spy dump of the controller and
its subprocesses every 10 seconds.

Calling `pdb.set_trace()` inside a worker remote function will cause pdb to
attach to the controller process to debug the worker. Keep in mind that if there
are multiple workers, this will create a separate, sequential debug session for
each worker.

For the Rust-based setup, you can adjust the log level with
`RUST_LOG=<log level>` (e.g. `RUST_LOG=debug`).

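As a minimal sketch of this pattern (the `remote` decoration and worker wiring are omitted, and `MONARCH_DEBUG` is a hypothetical variable, not a monarch convention), one way to keep such a breakpoint from firing on every run is to gate it behind an environment variable:

```py
import os
import pdb

def my_remote_fn(x):
    # On a worker, pdb.set_trace() attaches pdb to the controller process.
    # Gate it behind a (hypothetical) env var so normal runs proceed untouched.
    if os.environ.get("MONARCH_DEBUG"):
        pdb.set_trace()
    return x * 2

print(my_remote_fn(21))  # prints 42 when MONARCH_DEBUG is unset
```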
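Both knobs can be set together before launching; the entry script named below is just a placeholder for your own program:

```sh
export CONTROLLER_PYSPY_REPORT_INTERVAL=10  # py-spy dump of the controller every 10s
export RUST_LOG=debug                       # log level for the Rust components
# Then launch your program as usual, e.g.:
# python train.py
```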
## Profiling

The `monarch.profiler` module provides functionality similar to
[PyTorch's Profiler](https://pytorch.org/docs/stable/profiler.html) for model
profiling. It includes `profile` and `record_function` methods. The usage is
generally the same as `torch.profiler.profile` and
`torch.profiler.record_function`, with a few modifications specific to
`monarch.profiler.profile`:

1. `monarch.profiler.profile` exclusively accepts `monarch.profiler.Schedule`, a
   dataclass that mimics `torch.profiler.schedule`.
2. The `on_trace_ready` argument of `monarch.profiler.profile` must be a string
   that specifies the directory where the worker should save the trace files.

Below is an example demonstrating how to use `monarch.profiler`:

```py
import torch
from monarch.profiler import Schedule, profile, record_function

with profile(
    activities=[
        torch.profiler.ProfilerActivity.CPU,
        torch.profiler.ProfilerActivity.CUDA,
    ],
    on_trace_ready="./traces/",
    schedule=Schedule(wait=1, warmup=1, active=2, repeat=1),
    record_shapes=True,
) as prof:
    with record_function("forward"):
        loss = model(batch)

    prof.step()
```

## Memory Viewer

The `monarch.memory` module provides functionality similar to
[PyTorch's Memory Snapshot and Viewer](https://pytorch.org/docs/stable/torch_cuda_memory.html)
for visualizing and analyzing memory usage in PyTorch models. It includes the
`monarch.memory.dump_memory_snapshot` and `monarch.memory.record_memory_history`
methods:

1. `monarch.memory.dump_memory_snapshot`: This function wraps
   `torch.cuda.memory._dump_snapshot()` to dump a memory snapshot remotely. It can
   be used to save a snapshot of the current memory usage to a file.
2. `monarch.memory.record_memory_history`: This function wraps
   `torch.cuda.memory._record_memory_history()` to allow recording memory history
   remotely. It can be used to track memory allocations and deallocations over
   time.

Both functions use `remote` to execute the corresponding remote functions
`_memory_controller_record` and `_memory_controller_dump` on the specified
device mesh.

Below is an example demonstrating how to use `monarch.memory`:

```py
...
monarch.memory.record_memory_history()
for step in range(2):
    batch = torch.randn((8, DIM))
    loss = net(batch)
    ...
monarch.memory.dump_memory_snapshot(dir_snapshots="./snapshots/")
```

## License

Monarch is BSD-3 licensed, as found in the [LICENSE](LICENSE) file.