* Use Memray to examine tasks running on `lithops` or `processes` executors
* Add Memray documentation
* Fix mypy
* Update docs/user-guide/diagnostics.md
Co-authored-by: Tom Nicholas <[email protected]>
* Make memray check more defensive
---------
Co-authored-by: Tom Nicholas <[email protected]>
docs/user-guide/diagnostics.md: 38 additions & 0 deletions
@@ -92,3 +92,41 @@ The timeline callback will write a graphic `timeline.svg` to a directory with th
### Examples in use
See the [examples](https://github.com/cubed-dev/cubed/blob/main/examples/README.md) for more information about how to use them.
## Memray
[Memray](https://github.com/bloomberg/memray), a memory profiler for Python, can be used to track and view memory allocations when running a single task in a Cubed computation.

This is not usually needed when using Cubed, but for developers writing new operations, improving projected memory sizes, or debugging a memory issue, it can be very useful to understand how memory is actually allocated in Cubed.

To enable Memray memory profiling in Cubed, install Memray (`pip install memray`). Then use a local executor that runs tasks in separate processes, such as `processes` (Python 3.11 or later) or `lithops`. When you run a computation, Cubed will enable Memray for the first task in each operation (so if an array has 100 chunks it will only produce one Memray trace).
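
The example below assumes a `spec` object has already been created. As a rough sketch (the values here are illustrative, not taken from the original docs), it could be set up like this, with the `processes` or `lithops` executor selected separately, for example through Cubed's configuration:

```python
import cubed

# Illustrative values: work_dir is where intermediate Zarr data is written,
# and allowed_mem is the memory budget each task may use.
spec = cubed.Spec(work_dir="tmp", allowed_mem="2GB")
```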
Here is an example of a simple addition operation, with 200MB chunks. (It is adapted from [test_mem_utilization.py](https://github.com/cubed-dev/cubed/blob/main/cubed/tests/test_mem_utilization.py) in Cubed's test suite.)
```python
import cubed.array_api as xp
import cubed.random

a = cubed.random.random(
    (10000, 10000), chunks=(5000, 5000), spec=spec
)  # 200MB chunks
b = cubed.random.random(
    (10000, 10000), chunks=(5000, 5000), spec=spec
)  # 200MB chunks
c = xp.add(a, b)
c.compute(optimize_graph=False)
```
The optimizer is turned off so that generation of the random arrays is not fused with the add operation. This way we can see the memory allocations for that operation alone.

After the computation is complete, there will be a collection of `.bin` files in the `history/compute-{id}/memray` directory, one for each operation. To view them, we convert them to HTML flame graphs as follows:
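
As a rough sketch (assuming Memray's `flamegraph` subcommand; `{id}` is a placeholder for your compute ID), the `.bin` files can be converted like this:

```python
import glob
import subprocess

# Replace {id} with the compute ID of your run.
for bin_file in glob.glob("history/compute-{id}/memray/*.bin"):
    # Memray's CLI turns each capture file into an HTML flame graph.
    subprocess.run(["python", "-m", "memray", "flamegraph", bin_file], check=True)
```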
Annotations have been added to explain what is going on in this example. Note that reading a chunk from Zarr requires twice the chunk memory (400MB) since there is a buffer for the compressed Zarr block (200MB), as well as the resulting array (200MB). After the first chunk has been loaded the memory dips back to 200MB since the compressed buffer is no longer retained.
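
For reference, the chunk and peak sizes quoted above follow from the chunk shape, assuming the default float64 dtype:

```python
chunk_nbytes = 5000 * 5000 * 8       # one float64 chunk: 200_000_000 bytes ≈ 200MB
peak_read_nbytes = 2 * chunk_nbytes  # compressed buffer + decompressed array ≈ 400MB
```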