Skip to content
6 changes: 4 additions & 2 deletions src/zarr/storage/_obstore.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,8 @@
from collections.abc import Iterable
from typing import TYPE_CHECKING, Any, TypedDict

import numpy as np

from zarr.abc.store import (
ByteRequest,
OffsetByteRequest,
Expand Down Expand Up @@ -145,15 +147,15 @@ async def set(self, key: str, value: Buffer) -> None:

self._check_writable()

buf = value.to_bytes()
buf = value.as_numpy_array().view(np.uint8)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worthwhile to make this a standard method? This would make it easier to use throughout the codebase

Some possible names:

  • to_bytelike
  • to_binary
  • to_uint8s
  • ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess i'm also curious to know why we have a to_bytes() method that makes a copy, when .as_numpy_array().view(np.uint8) achieves the same effective result (an iterable of bytes) without a copy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was suggested in #2925, so I've added an as_bytes_like method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bytes objects always own their memory. So creating a new bytes object means making a copy

If we prefer to move away from this approach, we could add a deprecation cycle to move from to_bytes to as_bytes_like

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think FsspecStore works with BytesLike since it didn't accept a memoryview object when I tried it. So we probably need to keep to_bytes().

Copy link
Member

@jakirkham jakirkham Apr 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think this would be a good issue to raise with fsspec if we can reproduce with it directly

If not, a Zarr issue would be welcome

Ideally these copies should be avoidable in the FsspecStore case as well

Edit: Ofc this is non-blocking, just would like to improve the performance in all cases

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some possible names:

I'd also suggest as_buffer_like, because in my head "bytes" in Python is a type that always owns its own memory, while "buffer" doesn't necessarily own its memory, as in the buffer protocol or collections.abc.Buffer.

await obs.put_async(self.store, key, buf)

async def set_if_not_exists(self, key: str, value: Buffer) -> None:
# docstring inherited
import obstore as obs

self._check_writable()
buf = value.to_bytes()
buf = value.as_numpy_array().view(np.uint8)
with contextlib.suppress(obs.exceptions.AlreadyExistsError):
await obs.put_async(self.store, key, buf, mode="create")

Expand Down