Zarr 3.x has some performance regressions for certain write workloads
(writing large chunks with floating point dtype).
This change modifies the implementation of `NDBuffer.all_equal` to be
the same logic as Zarr 2.x's `zarr.util.all_equals`, which contains a
number of important optimizations. A few mechanical changes were made
to accommodate the fact that the subroutine is now a method of `NDBuffer`
rather than a function.
This change is most impactful when writing large floating point chunks
as the implementation of
```python
np.all(np.isnan(self._data))
```
is significantly more efficient than calling
```python
_data, other = np.broadcast_arrays(self._data, np.nan)
np.array_equal(_data, other, equal_nan=True)
```
since `np.broadcast_arrays` potentially requires a large allocation (the
size of `self._data`), and `np.array_equal` then needs to fetch double the
number of cache lines.
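To illustrate the idea, here is a minimal sketch of a scalar comparison with a NaN fast path. It is not the actual `NDBuffer.all_equal` code: the function name `all_equal_sketch` and its signature are hypothetical, and it covers only the NaN case and a plain elementwise fallback.

```python
import numpy as np


def all_equal_sketch(data: np.ndarray, value) -> bool:
    """Return True if every element of `data` equals the scalar `value`.

    Hypothetical helper for illustration; not the Zarr implementation.
    """
    if value is None:
        return False
    # Fast path: NaN never compares equal to itself, so for a NaN value on
    # floating-point data we test np.isnan directly instead of building a
    # second operand the size of `data`.
    if (
        np.issubdtype(data.dtype, np.floating)
        and isinstance(value, (float, np.floating))
        and np.isnan(value)
    ):
        return bool(np.all(np.isnan(data)))
    # General path: compare against the scalar; NumPy broadcasts the scalar
    # during the comparison itself.
    return bool(np.all(data == value))
```

Calling `all_equal_sketch(chunk, np.nan)` on a float chunk takes the `np.isnan` path; any other scalar falls through to the elementwise comparison.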
On EC2 r7i.2xlarge:
```
In [20]: data = np.random.rand(512, 512, 8)
In [21]: %timeit np.all(np.isnan(data))
596 μs ± 179 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
In [22]: %%timeit
...: data_, other = np.broadcast_arrays(data, np.nan)
...: np.array_equal(data_, other, equal_nan=True)
...:
...:
2.66 ms ± 953 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
(Both numbers are faster on an M3 Max, but the relative slowdown is similar.)
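For anyone who wants to repeat the comparison outside IPython, a standalone script along these lines should work. The function names and the `number=20` repeat count are arbitrary choices made for this sketch, and absolute timings will vary by machine.

```python
import timeit

import numpy as np


def isnan_check(data: np.ndarray) -> bool:
    # Single pass over `data`, no second operand.
    return bool(np.all(np.isnan(data)))


def broadcast_check(data: np.ndarray) -> bool:
    # Broadcast the NaN scalar to the shape of `data`, then compare.
    data_, other = np.broadcast_arrays(data, np.nan)
    return bool(np.array_equal(data_, other, equal_nan=True))


def benchmark(shape=(512, 512, 8), number=20) -> dict:
    """Return the mean seconds per call for each check on random data."""
    data = np.random.rand(*shape)
    return {
        "isnan": timeit.timeit(lambda: isnan_check(data), number=number) / number,
        "broadcast": timeit.timeit(lambda: broadcast_check(data), number=number) / number,
    }


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name}: {seconds * 1e3:.3f} ms per call")
```

Both checks return the same answer for a NaN comparison; only the cost differs.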
With low-latency stores (e.g. local SSD), this results in double-digit %
speed-ups for the workload referenced in the Zarr V3 blog post:
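```python
import numpy as np
import zarr

# Create a float64 array on local disk with large (512 x 512 x 8) chunks.
za = zarr.create_array(
    "/tmp/foo.zarr",
    shape=(512, 512, 512),
    chunks=(512, 512, 8),
    dtype=np.float64,
    overwrite=True,
)
# Write 1 GiB of random data across all chunks.
arr = np.random.rand(512, 512, 512)
za[:] = arr
```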
For higher-latency stores, the improvement is still dramatic (10%+) when
chunks have high compression ratios (e.g. `np.ones`).
For arrays larger than 1 GB, the improvement is even more pronounced.