writing to a sharded array is 10x slower

### Zarr version

3.1.3

### Numcodecs version

n/a

### Python Version

3.12

### Operating System

Linux

### Installation

uv

### Description

The attached [script](https://github.com/user-attachments/files/23225723/benchmark_zarr_sharding.py) shows that writing multiple chunks to an uncompressed sharded array is about 10x slower than writing to a non-sharded array. You can reproduce this benchmark with this command:
```
uv run benchmark_zarr_sharding.py --num-iterations 2 --shape 1000 100000 --chunks 1 100000 --shard-chunks 100 1000000
```

Profiling with cProfile shows that the cost is dominated by repeated calls to [`Buffer.__add__`](https://github.com/zarr-developers/zarr-python/blob/b3e9aed305092236c5db70deee0b26dad648d3b0/src/zarr/core/buffer/cpu.py#L116) from `_ShardWriter`. `Buffer.__add__` concatenates the arrays manually, which results in many repeated allocations and copies:
```
            np.concatenate((np.asanyarray(self._data), np.asanyarray(other_array)))
```
It would probably be much faster to coalesce these concatenations into a single operation. A similar pattern is used for [GPU buffers](https://github.com/zarr-developers/zarr-python/blob/b3e9aed305092236c5db70deee0b26dad648d3b0/src/zarr/core/buffer/gpu.py#L116), so it seems a new API is required to support this use case.
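
To illustrate the difference, here is a minimal NumPy-only sketch. It does not use zarr's `Buffer` class, and the helper names `concat_pairwise` and `concat_once` are hypothetical, chosen only for this example. The first helper mimics the repeated `a + b` pattern, where every step allocates a new array and copies everything accumulated so far; the second gathers all pieces and concatenates them in one call.

```python
import numpy as np


def concat_pairwise(chunks):
    """Mimic repeated `a + b` concatenation (hypothetical helper).

    Each iteration allocates a new array and copies all bytes accumulated
    so far, so the total bytes copied grow quadratically with the number
    of equally sized chunks.
    """
    out = np.empty(0, dtype=np.uint8)
    for chunk in chunks:
        out = np.concatenate((np.asanyarray(out), np.asanyarray(chunk)))
    return out


def concat_once(chunks):
    """Concatenate all pieces in a single call (hypothetical helper).

    One output allocation, and each input byte is copied once.
    """
    return np.concatenate([np.asanyarray(chunk) for chunk in chunks])


# 100 small chunks standing in for the encoded chunks of one shard.
chunks = [np.full(1_000, i % 256, dtype=np.uint8) for i in range(100)]

pairwise = concat_pairwise(chunks)
single = concat_once(chunks)

# Both approaches produce identical bytes; only the copying cost differs.
assert np.array_equal(pairwise, single)
```

Timing the two helpers (for example with `timeit`) on a realistic number of chunks per shard should show how much of the gap the single-call approach could recover, though the actual speedup inside zarr would depend on how the shard writer is restructured.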
