Commit 74c0288

Authored by normanrz, philippotto, and markbader
Deprecate chunks_per_shard in favor of shard_shape (#1257)
* make Zarr3 default DataFormat
* compress=True
* remove deprecations
* test fixes
* down to 16
* changelog
* test fixes
* fix /test_dataset_add_remote_mag_and_layer.py
* stuff
* ci
* ci
* error on fork
* ci
* ci
* less alignment checks
* allow_unaligned
* update ci.yml
* ci testing
* ci
* sequential tests
* ci
* ci
* ci
* ci
* parameterize python for kubernetes dockerfile
* test
* change defaults
* mirrored test images
* mp logging
* mp debugging
* debug
* debug
* debug
* debugging
* pyproject.toml
* py3.12
* debugging
* wip
* all python versions
* revert debug changes in cluster_tools
* fixes
* larger ci runner
* default ci runner
* rm pytest-timeout
* test
* Revert "rm pytest-timeout" (this reverts commit 6bc2185)
* Revert "test" (this reverts commit 8d57971)
* ci
* ci
* ci
* ci
* ci
* ci
* properly implement SequentialExecutor
* ci
* changelog
* allow_unaligned wip
* ci
* wip
* fix tests
* fix test
* examples
* longer sleep in slurm test
* format
* longer sleep in slurm test
* Apply suggestions from code review (Co-authored-by: Philipp Otto <[email protected]>)
* add methods
* more robust patch
* comment
* derive_nd_bounding_box_from_shape
* refactor Dataset.open to not require an additional IO op
* cassettes
* format
* lint
* docs
* deprecate chunks_per_shard
* deprecate dtype_per_layer
* type
* fixes for add_layer_from_images
* Vec3Int.from_vec_or_int
* export defaults
* write_layer args
* MagLike + mag in write_layer
* doc
* docstring
* change default data_format in cli
* docs
* changelog
* changelog
* Update webknossos/Changelog.md (Co-authored-by: Mark Bader <[email protected]>)
* docstring
* more kwargs
* tests

---------

Co-authored-by: Philipp Otto <[email protected]>
Co-authored-by: Mark Bader <[email protected]>
1 parent eb6810a commit 74c0288

24 files changed (+646, -325 lines)

webknossos/Changelog.md

Lines changed: 48 additions & 0 deletions
@@ -14,6 +14,8 @@ For upgrade instructions, please check the respective _Breaking Changes_ section
 
 ### Breaking Changes
 - Changed writing behavior. There is a new argument `allow_resize` for `MagView.write`, which defaults to `False`. If set to `True`, the bounding box of the underlying `Layer` is resized to fit the to-be-written data, which largely mirrors the previous behavior. However, this is not safe for concurrent operations, so it is disabled by default. It is recommended to set `Layer.bounding_box` to the desired size before writing. Additionally, writes must by default be aligned with the underlying shard grid to guard against concurrency issues and avoid performance footguns. The new argument `allow_unaligned`, which defaults to `False`, skips the shard-alignment check when set to `True`.
+- Deprecated the `chunks_per_shard` arguments in favor of `shard_shape`, where `shard_shape = chunk_shape * chunks_per_shard`. The shard shape is more intuitive because it directly defines the size of shards instead of being a factor of the chunk shape.
+- Deprecated the `dtype_per_layer` argument, because it promotes the use of uncommon dtypes and leads to confusion with the `dtype_per_channel` argument. With this change, only the use of `dtype_per_channel` is encouraged.
 - Removed deprecated functions, properties and arguments:
   - Functions:
     - `open_annotation`, use `Annotation.load()` instead
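The relationship stated in the deprecation note above can be sketched as a tiny migration helper. This is an illustrative snippet, not part of the webknossos API: it applies `shard_shape = chunk_shape * chunks_per_shard` element-wise to translate legacy arguments.

```python
# Hypothetical helper (not a webknossos function): converts a legacy
# chunks_per_shard value to the new shard_shape argument, using the
# element-wise relation shard_shape = chunk_shape * chunks_per_shard.
def to_shard_shape(chunk_shape, chunks_per_shard):
    return tuple(c * n for c, n in zip(chunk_shape, chunks_per_shard))

# The previous default of 32^3-voxel chunks with 8 chunks per shard axis
# becomes a 256^3-voxel shard:
print(to_shard_shape((32, 32, 32), (8, 8, 8)))  # (256, 256, 256)
```

So a call like `add_mag(1, chunk_shape=(32, 32, 32), chunks_per_shard=(8, 8, 8))` keeps the same on-disk layout when rewritten as `add_mag(1, chunk_shape=(32, 32, 32), shard_shape=(256, 256, 256))`.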
@@ -101,12 +103,58 @@ For upgrade instructions, please check the respective _Breaking Changes_ section
 - `compress` in `Layer.upsample` is now `True`
 - `buffer_size` in `View.get_buffered_slice_reader` is now computed from the shard shape
 - `buffer_size` in `View.get_buffered_slice_writer` is now computed from the shard shape
+- Moved from positional argument to keyword-only argument:
+  - `json_update_allowed` in `MagView.write`
+  - `organization_id`, `sharing_token`, `webknossos_url`, `bbox`, `layers`, `mags`, `path`, `exist_ok` in `Dataset.download`
+  - `layers_to_link`, `jobs` in `Dataset.upload`
+  - `dtype_per_layer`, `dtype_per_channel`, `num_channels`, `data_format`, `bounding_box` in `Dataset.add_layer`
+  - `dtype_per_layer`, `dtype_per_channel`, `num_channels`, `data_format` in `Dataset.get_or_add_layer`
+  - `data_format`, `mag`, `chunk_shape`, `chunks_per_shard`, `shard_shape`, `compress` in `Dataset.add_layer_from_images`
+  - `chunk_shape`, `shard_shape`, `chunks_per_shard`, `data_format`, `compress`, `executor` in `Dataset.add_copy_layer`
+  - `organization_id`, `tags`, `name`, `folder_id` in `Dataset.get_remote_datasets`
+  - `make_relative` in `Dataset.add_symlink_layer`
+  - `name`, `make_relative`, `layers_to_ignore` in `Dataset.shallow_copy_dataset`
+  - `executor` in `Dataset.compress`
+  - `sampling_mode`, `coarsest_mag`, `executor` in `Dataset.downsample`
+  - `voxel_size`, `chunk_shape`, `shard_shape`, `chunks_per_shard`, `data_format`, `compress`, `executor`, `voxel_size_with_unit` in `Dataset.copy_dataset`
+  - `chunk_shape`, `shard_shape`, `chunks_per_shard`, `compress` in `Layer.add_mag`
+  - `chunk_shape`, `shard_shape`, `chunks_per_shard`, `compress` in `Layer.get_or_add_mag`
+  - `extend_layer_bounding_box`, `chunk_shape`, `shard_shape`, `chunks_per_shard`, `compress`, `executor` in `Layer.add_copy_mag`
+  - `make_relative`, `extend_layer_bounding_box` in `Layer.add_symlink_mag`
+  - `extend_layer_bounding_box` in `Layer.add_remote_mag`
+  - `extend_layer_bounding_box` in `Layer.add_fs_copy_mag`
+  - `move`, `extend_layer_bounding_box` in `Layer.add_mag_from_zarrarray`
+  - `from_mag`, `coarsest_mag`, `interpolation_mode`, `compress`, `sampling_mode`, `align_with_other_layers`, `buffer_shape`, `force_sampling_scheme`, `allow_overwrite`, `only_setup_mags`, `executor` in `Layer.downsample`
+  - `interpolation_mode`, `compress`, `buffer_shape`, `allow_overwrite`, `only_setup_mag`, `executor` in `Layer.downsample_mag`
+  - `interpolation_mode`, `compress`, `buffer_shape`, `executor` in `Layer.redownsample`
+  - `interpolation_mode`, `compress`, `buffer_shape`, `allow_overwrite`, `only_setup_mags`, `executor` in `Layer.downsample_mag_list`
+  - `finest_mag`, `compress`, `sampling_mode`, `align_with_other_layers`, `buffer_shape`, `executor` in `Layer.upsample`
+  - `chunk_shape`, `executor` in `SegmentationLayer.refresh_largest_segment_id`
+  - `chunk_shape`, `shard_shape`, `chunks_per_shard`, `compression_mode`, `path` in `MagView.create`
+  - `target_path`, `executor` in `MagView.compress`
 - Added arguments:
   - `allow_resize` in `MagView.write` with default `False`
   - `allow_unaligned` in `MagView.write` with default `False`
+  - `shard_shape` in `Dataset.from_images`
+  - `shard_shape` in `Dataset.add_layer_from_images`
+  - `shard_shape` in `Dataset.copy_dataset`
+- Newly deprecated arguments:
+  - `chunks_per_shard` in `Dataset.from_images`, use `shard_shape` instead
+  - `dtype_per_layer` in `Dataset.add_layer`, use `dtype_per_channel` instead
+  - `dtype_per_layer` in `Dataset.get_or_add_layer`, use `dtype_per_channel` instead
+  - `chunks_per_shard` in `Dataset.add_layer_from_images`, use `shard_shape` instead
+  - `chunks_per_shard` in `Dataset.copy_dataset`, use `shard_shape` instead
+  - `dtype_per_layer` in `Layer.__init__`, use `dtype_per_channel` instead
+  - `chunks_per_shard` in `Layer.add_mag`, use `shard_shape` instead
+  - `chunks_per_shard` in `Layer.get_or_add_mag`, use `shard_shape` instead
+  - `chunks_per_shard` in `Layer.add_copy_mag`, use `shard_shape` instead
+  - `chunks_per_shard` in `MagView.create`, use `shard_shape` instead
+- Newly deprecated properties:
+  - `Layer.dtype_per_layer`
 
 
 ### Added
+- Added `Dataset.write_layer` method for writing entire layers in one go. [#1242](https://github.com/scalableminds/webknossos-libs/pull/1242)
 
 ### Changed
 
webknossos/examples/apply_merger_mode.py

Lines changed: 1 addition & 1 deletion
@@ -56,7 +56,7 @@ def main() -> None:
     out_layer = dataset.add_layer(
         "segmentation_remapped",
         wk.SEGMENTATION_CATEGORY,
-        dtype_per_layer=in_layer.dtype_per_layer,
+        dtype_per_channel=in_layer.dtype_per_channel,
         largest_segment_id=in_layer.largest_segment_id,
     )
     out_mag1 = out_layer.add_mag("1")
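The diff above swaps the deprecated per-layer dtype for the per-channel one. The relation between the two can be sketched as follows; this is a hypothetical helper under the assumption (consistent with multi-channel WEBKNOSSOS layers such as 3-channel `uint24` color data) that the per-layer dtype packs all channels into one wider type:

```python
import re

# Hypothetical conversion helper (not a webknossos function), assuming a
# per-layer dtype's bit width is the per-channel bit width times the
# channel count, e.g. "uint24" with 3 channels <-> "uint8" per channel.
def dtype_per_layer_to_per_channel(dtype_per_layer: str, num_channels: int) -> str:
    match = re.fullmatch(r"([a-z]+)(\d+)", dtype_per_layer)
    if match is None:
        raise ValueError(f"cannot parse dtype {dtype_per_layer!r}")
    base, bits = match.group(1), int(match.group(2))
    if bits % num_channels != 0:
        raise ValueError("bit width is not divisible by the channel count")
    return f"{base}{bits // num_channels}"

print(dtype_per_layer_to_per_channel("uint24", 3))  # uint8
```

Packed types like `uint24` are exactly the "uncommon dtypes" the changelog cites as the reason for preferring `dtype_per_channel`.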

webknossos/examples/dataset_usage.py

Lines changed: 2 additions & 2 deletions
@@ -67,8 +67,8 @@ def main() -> None:
 
     copy_of_dataset = dataset.copy_dataset(
         "testoutput/copy_of_dataset",
-        chunk_shape=8,
-        chunks_per_shard=8,
+        chunk_shape=(8, 8, 8),
+        shard_shape=(64, 64, 64),
         compress=True,
     )
     new_layer = dataset.add_layer(
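The updated example preserves the on-disk layout of the old one: the previous scalar arguments broadcast to all three axes, and the new `shard_shape` is their element-wise product. A quick check of that arithmetic:

```python
# The old call used scalar arguments, which broadcast per axis:
old_chunk_shape = (8, 8, 8)       # was `chunk_shape=8`
old_chunks_per_shard = (8, 8, 8)  # was `chunks_per_shard=8`

# shard_shape = chunk_shape * chunks_per_shard (element-wise)
new_shard_shape = tuple(
    c * n for c, n in zip(old_chunk_shape, old_chunks_per_shard)
)
print(new_shard_shape)  # (64, 64, 64)
```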

webknossos/examples/learned_segmenter.py

Lines changed: 1 addition & 2 deletions
@@ -66,8 +66,7 @@ def main() -> None:
     segmentation_layer = new_dataset.add_layer(
         "segmentation",
         wk.SEGMENTATION_CATEGORY,
-        segmentation.dtype,
-        compressed=True,
+        dtype_per_channel=segmentation.dtype,
         largest_segment_id=int(segmentation.max()),
     )
     segmentation_layer.bounding_box = dataset.layers["color"].bounding_box

webknossos/tests/dataset/test_add_layer_from_images.py

Lines changed: 6 additions & 6 deletions
@@ -32,7 +32,7 @@ def test_compare_tifffile(tmp_path: Path) -> None:
         category="segmentation",
         topleft=(100, 100, 55),
         chunk_shape=(8, 8, 8),
-        chunks_per_shard=(8, 8, 8),
+        shard_shape=(64, 64, 64),
     )
     assert layer.bounding_box.topleft == wk.Vec3Int(100, 100, 55)
     data = layer.get_finest_mag().read()[0, :, :]
@@ -52,7 +52,7 @@ def test_compare_nd_tifffile(tmp_path: Path) -> None:
         topleft=(2, 55, 100, 100),
         data_format="zarr3",
         chunk_shape=(8, 8, 8),
-        chunks_per_shard=(8, 8, 8),
+        shard_shape=(64, 64, 64),
         executor=executor,
     )
     assert layer.bounding_box.topleft == wk.VecInt(
@@ -373,15 +373,15 @@ def test_bioformats(
     (
         "https://static.webknossos.org/data/webknossos-libs/slice_0420.dm4",
         "slice_0420.dm4",
-        {"data_format": "zarr"},  # using zarr to allow z=1 chunking
+        {"data_format": "zarr3"},  # using zarr3 to allow z=1 chunking
         "uint16",
         1,
         (8192, 8192, 1),
     ),
     (
         "https://static.webknossos.org/data/webknossos-libs/slice_0073.dm3",
         "slice_0073.dm3",
-        {"data_format": "zarr"},  # using zarr to allow z=1 chunking
+        {"data_format": "zarr3"},  # using zarr3 to allow z=1 chunking
         "uint16",
         1,
         (4096, 4096, 1),
@@ -392,15 +392,15 @@ def test_bioformats(
         "https://static.webknossos.org/data/webknossos-libs/slice_0074.dm3",
     ],
     ["slice_0073.dm3", "slice_0074.dm3"],
-    {"data_format": "zarr"},  # using zarr to allow smaller chunking
+    {"data_format": "zarr3"},  # using zarr3 to allow smaller chunking
     "uint16",
     1,
     (4096, 4096, 2),
 ),
 (
     "https://static.webknossos.org/data/wklibs-samples/dnasample1.zip",
     "dnasample1.dm3",
-    {"data_format": "zarr"},  # using zarr to allow z=1 chunking
+    {"data_format": "zarr3"},  # using zarr3 to allow z=1 chunking
     "int16",
     1,
     (4096, 4096, 1),
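The "z=1 chunking" comments in the tests above point at why a data format with flexible chunk shapes matters for single-slice images. A short sketch of the arithmetic (illustrative only; the function is not part of webknossos): cubic chunks on a one-slice volume allocate storage for chunk depths that hold no data.

```python
import math

# How many voxels of per-chunk padding a given chunk shape forces onto a
# volume (sketch): chunks are allocated whole, so any axis where the data
# does not fill the chunk is padded.
def padded_voxels(shape, chunk_shape):
    grid = [math.ceil(s / c) for s, c in zip(shape, chunk_shape)]
    allocated = 1
    for g, c in zip(grid, chunk_shape):
        allocated *= g * c
    used = shape[0] * shape[1] * shape[2]
    return allocated - used

# A single 8192 x 8192 slice with cubic (8, 8, 8) chunks pads 7 unused
# z-planes per chunk; flat (8, 8, 1) chunks fit exactly:
print(padded_voxels((8192, 8192, 1), (8, 8, 8)))  # 469762048
print(padded_voxels((8192, 8192, 1), (8, 8, 1)))  # 0
```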

webknossos/tests/dataset/test_buffered_slice_utils.py

Lines changed: 11 additions & 11 deletions
@@ -30,7 +30,7 @@ def test_buffered_slice_writer() -> None:
         dtype_per_channel=dtype,
         bounding_box=BoundingBox(origin, (24, 24, 35)),
     )
-    mag_view = layer.add_mag(mag, chunks_per_shard=(32, 32, 1))
+    mag_view = layer.add_mag(mag, shard_shape=(1024, 1024, 32))
 
     with mag_view.get_buffered_slice_writer(absolute_offset=origin) as writer:
         for i in range(13):
@@ -73,8 +73,8 @@ def test_buffered_slice_writer_along_different_axis(
     cube_size_without_channel = test_cube.shape[1:]
     offset = Vec3Int(64, 96, 32)
 
-    chunks_per_shard = [32, 32, 32]
-    chunks_per_shard[dim] = 1
+    shard_shape = [1024, 1024, 1024]
+    shard_shape[dim] = 32
 
     ds = Dataset(tmp_path / f"buffered_slice_writer_{dim}", voxel_size=(1, 1, 1))
     layer = ds.add_layer(
@@ -83,7 +83,7 @@
         num_channels=test_cube.shape[0],
         bounding_box=BoundingBox(offset, cube_size_without_channel),
     )
-    mag_view = layer.add_mag(1, chunks_per_shard=chunks_per_shard)
+    mag_view = layer.add_mag(1, shard_shape=shard_shape)
 
     with mag_view.get_buffered_slice_writer(
         absolute_offset=offset, buffer_size=5, dimension=dim, allow_unaligned=True
@@ -115,7 +115,7 @@ def test_buffered_slice_reader_along_different_axis(tmp_path: Path) -> None:
         num_channels=3,
         bounding_box=BoundingBox(offset, cube_size_without_channel),
     )
-    mag_view = layer.add_mag(1, chunks_per_shard=(32, 32, 1))
+    mag_view = layer.add_mag(1, shard_shape=(1024, 1024, 32))
     mag_view.write(test_cube, absolute_offset=offset)
 
     with (
@@ -156,7 +156,7 @@ def test_basic_buffered_slice_writer(tmp_path: Path) -> None:
         num_channels=1,
         bounding_box=BoundingBox((0, 0, 0), shape),
     )
-    mag1 = layer.add_mag("1", chunk_shape=(32, 32, 32), chunks_per_shard=(8, 8, 8))
+    mag1 = layer.add_mag("1", chunk_shape=(32, 32, 32), shard_shape=(256, 256, 256))
 
     with warnings.catch_warnings():
         warnings.filterwarnings("error")  # This escalates the warning to an error
@@ -184,7 +184,7 @@ def test_buffered_slice_writer_unaligned(
         num_channels=1,
         bounding_box=BoundingBox((0, 0, 0), (513, 513, 36)),
     )
-    mag1 = layer.add_mag("1", chunk_shape=(32, 32, 32), chunks_per_shard=(8, 8, 8))
+    mag1 = layer.add_mag("1", chunk_shape=(32, 32, 32), shard_shape=(256, 256, 256))
 
     # Write some data to z=32. We will check that this
     # data is left untouched by the buffered slice writer.
@@ -230,7 +230,7 @@ def test_buffered_slice_writer_should_raise_unaligned_usage(
         num_channels=1,
         bounding_box=BoundingBox((0, 0, 0), (513, 513, 33)),
    )
-    mag1 = layer.add_mag("1", chunk_shape=(32, 32, 32), chunks_per_shard=(8, 8, 8))
+    mag1 = layer.add_mag("1", chunk_shape=(32, 32, 32), shard_shape=(256, 256, 256))
 
     offset = (1, 1, 1)
 
@@ -261,7 +261,7 @@ def test_basic_buffered_slice_writer_multi_shard(tmp_path: Path) -> None:
         num_channels=1,
         bounding_box=BoundingBox((0, 0, 0), (160, 150, 140)),
     )
-    mag1 = layer.add_mag("1", chunk_shape=(32, 32, 32), chunks_per_shard=(4, 4, 4))
+    mag1 = layer.add_mag("1", chunk_shape=(32, 32, 32), shard_shape=(128, 128, 128))
     assert mag1.info.shard_shape[2] == 32 * 4
 
     # Allocate some data (~ 3 MB) that covers multiple shards (also in z)
@@ -292,7 +292,7 @@ def test_basic_buffered_slice_writer_multi_shard_multi_channel(tmp_path: Path) -
         num_channels=3,
         bounding_box=BoundingBox((0, 0, 0), (160, 150, 140)),
     )
-    mag1 = layer.add_mag("1", chunk_shape=(32, 32, 32), chunks_per_shard=(4, 4, 4))
+    mag1 = layer.add_mag("1", chunk_shape=(32, 32, 32), shard_shape=(128, 128, 128))
 
     # Allocate some data (~ 3 MB) that covers multiple shards (also in z)
     shape = (3, 160, 150, 140)
@@ -319,7 +319,7 @@ def test_buffered_slice_writer_reset_offset(tmp_path: Path) -> None:
         num_channels=1,
         bounding_box=BoundingBox((0, 0, 0), (512, 512, 40)),
     )
-    mag1 = layer.add_mag("1", chunk_shape=(32, 32, 8), chunks_per_shard=(8, 8, 1))
+    mag1 = layer.add_mag("1", chunk_shape=(32, 32, 8), shard_shape=(256, 256, 8))
 
     # Allocate some data (~ 8 MB)
     shape = (512, 512, 32)