Restore optimizations for NDBuffer.all_equal

y4n9squared · y4n9squared · commit c75d27d77501 · 2025-01-18T20:37:03.000Z
Zarr 3.x has some performance regressions for certain write workloads
(writing large chunks with floating point dtype).

This change modifies the implementation of `NDBuffer.all_equal` to use
the same logic as Zarr 2.x's `zarr.util.all_equal`, which contains a
number of important optimizations. A few mechanical changes were made
to accommodate the fact that the subroutine is now a method of `NDBuffer`
rather than a function.
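
As a rough illustration of the kind of fast paths involved, the sketch
below restates the logic for plain numeric arrays as a free function.
The name `all_equal_sketch` is hypothetical, and the sketch deliberately
omits the object-dtype and structured-dtype handling, so it should not
be read as the code in this change.

```python
import numpy as np


def all_equal_sketch(value, array):
    """Return True if every element of a numeric `array` equals scalar `value`.

    Hypothetical, simplified restatement for illustration only.
    """
    if value is None:
        return False
    if not value:
        # Falsy value (0, 0.0, False): a single truthy element is enough
        # to conclude that the array is not all equal to `value`.
        return not np.any(array)
    if np.issubdtype(array.dtype, np.floating) and np.isnan(value):
        # NaN never compares equal to itself, so test with isnan instead.
        return bool(np.all(np.isnan(array)))
    return bool(np.all(value == array))


zeros_result = all_equal_sketch(0.0, np.zeros((4, 4)))
nan_result = all_equal_sketch(np.nan, np.full(3, np.nan))
mixed_result = all_equal_sketch(1.0, np.array([1.0, 2.0]))
print(zeros_result, nan_result, mixed_result)
```

Note that NaN is truthy in Python, so a NaN fill value skips the
falsy-value branch and reaches the `np.isnan` path.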

This change is most impactful when writing large floating point chunks
as the implementation of

```python
np.all(np.isnan(self._data))
```

is significantly more efficient than calling

```python
_data, other = np.broadcast_arrays(self._data, np.nan)
np.array_equal(_data, other, equal_nan=True)
```

since `np.broadcast_arrays` potentially requires a large allocation (the
size of `self._data`), and `np.array_equal` then needs to fetch double the
number of cache lines.
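
A small sanity check, offered as a sketch, that the two approaches agree
on the answer for a NaN fill value. The helper names `is_all_nan_fast`
and `is_all_nan_broadcast` are made up for this example.

```python
import numpy as np


def is_all_nan_fast(data):
    # Single pass over the data via isnan.
    return bool(np.all(np.isnan(data)))


def is_all_nan_broadcast(data):
    # Broadcast the scalar NaN to the shape of `data`, then compare.
    data_, other = np.broadcast_arrays(data, np.nan)
    return bool(np.array_equal(data_, other, equal_nan=True))


all_nan = np.full((4, 4), np.nan)
mixed = all_nan.copy()
mixed[0, 0] = 1.0

# Both approaches should give the same answer on each input.
results = [(is_all_nan_fast(a), is_all_nan_broadcast(a)) for a in (all_nan, mixed)]
print(results)
```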

On EC2 r7i.2xlarge:

```
In [20]: data = np.random.rand(512, 512, 8)

In [21]: %timeit np.all(np.isnan(data))
596 μs ± 179 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [22]: %%timeit
    ...: data_, other = np.broadcast_arrays(data, np.nan)
    ...: np.array_equal(data_, other, equal_nan=True)
    ...:
    ...:
2.66 ms ± 953 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

(Both numbers are faster on an M3 Max, but the relative slowdown is similar.)
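
For reproducing the comparison outside IPython, a minimal sketch using
the standard-library `timeit` module could look like the following. The
function names and the loop count are arbitrary choices for this
example, and absolute timings will vary by machine.

```python
import timeit

import numpy as np

data = np.random.rand(512, 512, 8)


def with_isnan():
    return np.all(np.isnan(data))


def with_broadcast():
    data_, other = np.broadcast_arrays(data, np.nan)
    return np.array_equal(data_, other, equal_nan=True)


loops = 20  # arbitrary; increase for more stable numbers
t_isnan = timeit.timeit(with_isnan, number=loops) / loops
t_broadcast = timeit.timeit(with_broadcast, number=loops) / loops

print(f"np.all(np.isnan(data)):        {t_isnan * 1e3:.3f} ms per loop")
print(f"broadcast_arrays + array_equal: {t_broadcast * 1e3:.3f} ms per loop")
```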

With low-latency stores (e.g. local SSD), this results in double-digit %
speed-ups for the workload referenced in the Zarr V3 blog post:

```python
import numpy as np
import zarr

za = zarr.create_array(
    "/tmp/foo.zarr",
    shape=(512, 512, 512),
    chunks=(512, 512, 8),
    dtype=np.float64,
    overwrite=True,
)

arr = np.random.rand(512, 512, 512)

za[:] = arr
```

For higher-latency stores, the improvement is still dramatic (10%+) when
chunks have high compression ratios (e.g. `np.ones`).

For arrays larger than 1 GB, the improvement is even more pronounced.
diff --git a/src/zarr/core/buffer/core.py b/src/zarr/core/buffer/core.py
@@ -460,18 +460,33 @@ def __len__(self) -> int:
     def __repr__(self) -> str:
         return f"<NDBuffer shape={self.shape} dtype={self.dtype} {self._data!r}>"
 
-    def all_equal(self, other: Any, equal_nan: bool = True) -> bool:
-        """Compare to `other` using np.array_equal."""
-        if other is None:
+    def all_equal(self, value: Any, equal_nan: bool = True) -> bool:
+        if value is None:
             # Handle None fill_value for Zarr V2
             return False
-        # use array_equal to obtain equal_nan=True functionality
-        # Since fill-value is a scalar, isn't there a faster path than allocating a new array for fill value
-        # every single time we have to write data?
-        _data, other = np.broadcast_arrays(self._data, other)
-        return np.array_equal(
-            self._data, other, equal_nan=equal_nan if self._data.dtype.kind not in "USTO" else False
-        )
+
+        if not value:
+            # If `value` is falsey, then just 1 truthy value in `array`
+            # is sufficient to return False. We assume here that np.any is
+            # optimized to return on the first truthy value in `array`.
+            try:
+                return not np.any(self._data)
+            except (TypeError, ValueError):  # pragma: no cover
+                pass
+
+        if np.issubdtype(self._data.dtype, np.object_):
+            # We have to flatten the result of np.equal to handle outputs like
+            # [np.array([True,True]), True, True]
+            return all(np.equal(value, self._data, dtype=self._data.dtype).flatten())
+        else:
+            # Numpy errors if you call np.isnan on custom dtypes, so ensure
+            # we are working with floats before calling isnan
+            if np.issubdtype(self._data.dtype, np.floating) and np.isnan(value):
+                return np.all(np.isnan(self._data))
+            else:
+                # Using == raises warnings from numpy deprecated pattern, but
+                # using np.equal() raises type errors for structured dtypes...
+                return np.all(value == self._data)
 
     def fill(self, value: Any) -> None:
         self._data.fill(value)