
Commit 0501b4e
Merge pull request #1107 from GFleishman/main: Added rst file for rdt of distributed module
(2 parents: 0d71166 + ded9f7e)

2 files changed: +166, -0

docs/distributed.rst (+165, -0)
Big Data
------------------------------------------------

Distributed Cellpose for larger-than-memory data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``cellpose.contrib.distributed_segmentation`` module is intended to help run Cellpose on 3D
datasets that are too large to fit in system memory. The dataset is divided into overlapping blocks
and each block is segmented separately. Results are stitched back together into a seamless
segmentation of the whole dataset.
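The blocking scheme can be sketched as follows. This is an illustrative, hypothetical helper, not part of the module: it computes overlapping crops covering a dataset, similar in spirit to how the module partitions its input.

```python
import itertools

def overlapping_blocks(shape, blocksize, overlap):
    """Yield tuples of slices covering `shape` in blocks of `blocksize`,
    each expanded by `overlap` voxels per side (clipped to the array bounds)."""
    grids = [range(0, s, b) for s, b in zip(shape, blocksize)]
    for starts in itertools.product(*grids):
        yield tuple(
            slice(max(0, st - overlap), min(s, st + b + overlap))
            for st, b, s in zip(starts, blocksize, shape)
        )

# a (512, 512, 512) volume in (256, 256, 256) blocks with 60 voxels of overlap
blocks = list(overlapping_blocks((512, 512, 512), (256, 256, 256), 60))
# 2 x 2 x 2 = 8 blocks; neighboring blocks share 2 * 60 voxels along interior edges
```

Each crop is segmented independently, and labels in the overlap regions are used to merge objects across block boundaries.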
The module is built to run on workstations or clusters. Blocks can be run in parallel, in series,
or both, and compute resources (GPUs, CPUs, and RAM) can be arbitrarily partitioned for parallel
computing. Currently workstations and LSF clusters are supported. SLURM clusters would be an easy
addition: if you need this to run on a SLURM cluster, `please post a feature request issue
to the github repository <https://github.com/MouseLand/cellpose/issues>`_ and tag @GFleishman.
The input data format must be a zarr array. Some functions are provided in the module to help
convert your data to a zarr array, but not all formats or situations are covered; these are
good opportunities to submit pull requests. Currently, the module must be run via the Python API,
but making it available in the GUI is another good PR or feature request.
All user-facing functions in the module have verbose docstrings that explain inputs and outputs.
You can access these docstrings like this:

.. code-block:: python

    from cellpose.contrib.distributed_segmentation import distributed_eval

    distributed_eval?        # IPython/Jupyter
    help(distributed_eval)   # plain Python
Examples
~~~~~~~~

Run distributed Cellpose on half the resources of a workstation that has 16 CPUs, 1 GPU,
and 128GB of system memory:
.. code-block:: python

    from cellpose.contrib.distributed_segmentation import distributed_eval

    # parameterize cellpose however you like
    model_kwargs = {'gpu': True, 'model_type': 'cyto3'}  # can also use 'pretrained_model'
    eval_kwargs = {
        'diameter': 30,
        'z_axis': 0,
        'channels': [0, 0],
        'do_3D': True,
    }

    # define compute resources for the local workstation
    cluster_kwargs = {
        'n_workers': 1,  # if you only have 1 gpu, then 1 worker is the right choice
        'ncpus': 8,
        'memory_limit': '64GB',
        'threads_per_worker': 1,
    }

    # run segmentation
    # outputs:
    #    segments: zarr array containing labels
    #    boxes: list of bounding boxes around all labels (very useful for navigating big data)
    segments, boxes = distributed_eval(
        input_zarr=large_zarr_array,
        blocksize=(256, 256, 256),
        write_path='/where/zarr/array/containing/results/will/be/written.zarr',
        model_kwargs=model_kwargs,
        eval_kwargs=eval_kwargs,
        cluster_kwargs=cluster_kwargs,
    )
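Assuming the returned ``boxes`` are tuples of slices (the style produced by ``scipy.ndimage.find_objects``; an assumption here, so check the docstring), a single label's neighborhood can be read without touching the rest of the result. A numpy-only sketch of that indexing pattern, with a fabricated label array standing in for the zarr output:

```python
import numpy as np

# stand-in for the labeled output array (the real result is a zarr array)
segments = np.zeros((64, 64, 64), dtype=np.uint32)
segments[10:20, 12:22, 14:24] = 1   # one fake label

# one bounding box per label, each a tuple of slices
boxes = [(slice(10, 20), slice(12, 22), slice(14, 24))]

# read only the region around the first label;
# zarr arrays support the same slice-tuple indexing
crop = segments[boxes[0]]
print(crop.shape)
```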
Test run a single block before distributing the whole dataset (always a good idea):

.. code-block:: python

    from cellpose.contrib.distributed_segmentation import process_block

    # parameterize cellpose however you like
    model_kwargs = {'gpu': True, 'model_type': 'cyto3'}
    eval_kwargs = {
        'diameter': 30,
        'z_axis': 0,
        'channels': [0, 0],
        'do_3D': True,
    }

    # define a crop as the distributed function would
    starts = (128, 128, 128)
    blocksize = (256, 256, 256)
    overlap = 60
    crop = tuple(slice(s - overlap, s + b + overlap) for s, b in zip(starts, blocksize))

    # call the segmentation
    segments, boxes, box_ids = process_block(
        block_index=(0, 0, 0),  # when test_mode=True this is just a dummy value
        crop=crop,
        input_zarr=my_zarr_array,
        model_kwargs=model_kwargs,
        eval_kwargs=eval_kwargs,
        blocksize=blocksize,
        overlap=overlap,
        output_zarr=None,
        test_mode=True,
    )
Convert a single large (but still smaller than system memory) tiff image to a zarr array:

.. code-block:: python

    # Note: the full image will be loaded into system memory
    import tifffile
    from cellpose.contrib.distributed_segmentation import numpy_array_to_zarr

    data_numpy = tifffile.imread('/path/to/image.tiff')
    data_zarr = numpy_array_to_zarr('/path/to/output.zarr', data_numpy, chunks=(256, 256, 256))
    del data_numpy  # the data is assumed to be large, so don't keep an in-memory copy around
Wrap a folder of tiff images/tiles into a single zarr array without duplicating any data:

.. code-block:: python

    # Note: tiff filenames must indicate the position of each file in the overall tile grid
    from cellpose.contrib.distributed_segmentation import wrap_folder_of_tiffs

    reconstructed_virtual_zarr_array = wrap_folder_of_tiffs(
        filename_pattern='/path/to/folder/of/*.tiff',
        block_index_pattern=r'_(Z)(\d+)(Y)(\d+)(X)(\d+)',
    )
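To illustrate how a pattern like ``block_index_pattern`` maps a filename to a tile-grid position, here is a small sketch using Python's ``re`` module; the filename below is hypothetical:

```python
import re

block_index_pattern = r'_(Z)(\d+)(Y)(\d+)(X)(\d+)'

# a hypothetical tile filename encoding its grid position
filename = 'tile_Z0Y1X2.tiff'

match = re.search(block_index_pattern, filename)
# groups alternate axis label / index: ('Z', '0', 'Y', '1', 'X', '2')
axes = match.groups()[0::2]
index = tuple(int(i) for i in match.groups()[1::2])
print(axes, index)   # ('Z', 'Y', 'X') (0, 1, 2)
```

So this tile would occupy grid position (0, 1, 2) in (Z, Y, X) order. Filenames that do not follow such a pattern cannot be placed in the grid.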
Run distributed Cellpose on an LSF cluster with 128 GPUs (e.g. the Janelia cluster):

.. code-block:: python

    from cellpose.contrib.distributed_segmentation import distributed_eval

    # parameterize cellpose however you like
    model_kwargs = {'gpu': True, 'model_type': 'cyto3'}
    eval_kwargs = {
        'diameter': 30,
        'z_axis': 0,
        'channels': [0, 0],
        'do_3D': True,
    }

    # define LSFCluster parameters
    cluster_kwargs = {
        'ncpus': 2,  # cpus per worker
        'min_workers': 8,  # the cluster adapts the number of workers to the number of blocks
        'max_workers': 128,
        'queue': 'gpu_l4',  # flags required to specify a gpu job may differ between clusters
        'job_extra_directives': ['-gpu "num=1"'],
    }

    # run segmentation
    # outputs:
    #    segments: zarr array containing labels
    #    boxes: list of bounding boxes around all labels (very useful for navigating big data)
    segments, boxes = distributed_eval(
        input_zarr=large_zarr_array,
        blocksize=(256, 256, 256),
        write_path='/where/zarr/array/containing/results/will/be/written.zarr',
        model_kwargs=model_kwargs,
        eval_kwargs=eval_kwargs,
        cluster_kwargs=cluster_kwargs,
    )

docs/index.rst (+1, -0): in the toctree of "Cellpose: a generalist algorithm for cellular
segmentation", ``distributed`` is added between ``benchmark`` and ``openvino``:

    restore
    train
    benchmark
    distributed
    openvino
    faq
