Check out how many times we call get in this example (writing a single shard with 10 chunks):
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "zarr@git+https://github.com/zarr-developers/zarr-python.git@main",
# ]
# ///
#
# This script automatically imports the development branch of zarr to check for issues
import zarr
import numpy as np
from zarr.storage._logging import LoggingStore
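# Wrap an in-memory store so that every store call is logged
store = LoggingStore(store=zarr.storage.MemoryStore())

shape = (10,)
chunks = (1,)
shards = (10,)  # one shard holding all 10 chunks
data = np.ones(shape)

# Write the full array, which is exactly one full shard
zarr.create_array(
    store=store,
    data=data,
    chunks=chunks,
    shards=shards,
    fill_value=0,
    overwrite=True,
)

# Read the whole array back
array = zarr.open_array(store)[:]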
2025-09-01 15:50:45,109 - LoggingStore(memory://4309622656) - INFO - Calling MemoryStore.read_only
2025-09-01 15:50:45,109 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.read_only [0.00 s]
2025-09-01 15:50:45,109 - LoggingStore(memory://4309622656) - INFO - Calling MemoryStore._ensure_open
2025-09-01 15:50:45,109 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore._ensure_open [0.00 s]
2025-09-01 15:50:45,110 - LoggingStore(memory://4309622656) - INFO - Calling MemoryStore.supports_deletes
2025-09-01 15:50:45,110 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.supports_deletes [0.00 s]
2025-09-01 15:50:45,110 - LoggingStore(memory://4309622656) - INFO - Calling MemoryStore.delete_dir
2025-09-01 15:50:45,110 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.delete_dir [0.00 s]
2025-09-01 15:50:45,111 - LoggingStore(memory://4309622656) - INFO - Calling MemoryStore.set(zarr.json)
2025-09-01 15:50:45,111 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.set(zarr.json) [0.00 s]
2025-09-01 15:50:45,112 - LoggingStore(memory://4309622656) - INFO - Calling MemoryStore.get(c/0)
2025-09-01 15:50:45,112 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.get(c/0) [0.00 s]
2025-09-01 15:50:45,112 - LoggingStore(memory://4309622656) - INFO - Calling MemoryStore.get(c/0)
2025-09-01 15:50:45,112 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.get(c/0) [0.00 s]
2025-09-01 15:50:45,113 - LoggingStore(memory://4309622656) - INFO - Calling MemoryStore.get(c/0)
2025-09-01 15:50:45,113 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.get(c/0) [0.00 s]
2025-09-01 15:50:45,113 - LoggingStore(memory://4309622656) - INFO - Calling MemoryStore.get(c/0)
2025-09-01 15:50:45,113 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.get(c/0) [0.00 s]
2025-09-01 15:50:45,113 - LoggingStore(memory://4309622656) - INFO - Calling MemoryStore.get(c/0)
2025-09-01 15:50:45,113 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.get(c/0) [0.00 s]
2025-09-01 15:50:45,113 - LoggingStore(memory://4309622656) - INFO - Calling MemoryStore.get(c/0)
2025-09-01 15:50:45,113 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.get(c/0) [0.00 s]
2025-09-01 15:50:45,114 - LoggingStore(memory://4309622656) - INFO - Calling MemoryStore.get(c/0)
2025-09-01 15:50:45,114 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.get(c/0) [0.00 s]
2025-09-01 15:50:45,114 - LoggingStore(memory://4309622656) - INFO - Calling MemoryStore.get(c/0)
2025-09-01 15:50:45,114 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.get(c/0) [0.00 s]
2025-09-01 15:50:45,114 - LoggingStore(memory://4309622656) - INFO - Calling MemoryStore.get(c/0)
2025-09-01 15:50:45,114 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.get(c/0) [0.00 s]
2025-09-01 15:50:45,115 - LoggingStore(memory://4309622656) - INFO - Calling MemoryStore.get(c/0)
2025-09-01 15:50:45,115 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.get(c/0) [0.00 s]
2025-09-01 15:50:45,118 - LoggingStore(memory://4309622656) - INFO - Calling MemoryStore.set(c/0)
2025-09-01 15:50:45,118 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.set(c/0) [0.00 s]
2025-09-01 15:50:45,118 - LoggingStore(memory://4309622656) - INFO - Calling MemoryStore.set(c/0)
2025-09-01 15:50:45,118 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.set(c/0) [0.00 s]
2025-09-01 15:50:45,118 - LoggingStore(memory://4309622656) - INFO - Calling MemoryStore.set(c/0)
2025-09-01 15:50:45,119 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.set(c/0) [0.00 s]
2025-09-01 15:50:45,119 - LoggingStore(memory://4309622656) - INFO - Calling MemoryStore.set(c/0)
2025-09-01 15:50:45,119 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.set(c/0) [0.00 s]
2025-09-01 15:50:45,119 - LoggingStore(memory://4309622656) - INFO - Calling MemoryStore.set(c/0)
2025-09-01 15:50:45,119 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.set(c/0) [0.00 s]
2025-09-01 15:50:45,119 - LoggingStore(memory://4309622656) - INFO - Calling MemoryStore.set(c/0)
2025-09-01 15:50:45,119 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.set(c/0) [0.00 s]
2025-09-01 15:50:45,119 - LoggingStore(memory://4309622656) - INFO - Calling MemoryStore.set(c/0)
2025-09-01 15:50:45,119 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.set(c/0) [0.00 s]
2025-09-01 15:50:45,120 - LoggingStore(memory://4309622656) - INFO - Calling MemoryStore.set(c/0)
2025-09-01 15:50:45,120 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.set(c/0) [0.00 s]
2025-09-01 15:50:45,120 - LoggingStore(memory://4309622656) - INFO - Calling MemoryStore.set(c/0)
2025-09-01 15:50:45,120 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.set(c/0) [0.00 s]
2025-09-01 15:50:45,120 - LoggingStore(memory://4309622656) - INFO - Calling MemoryStore.set(c/0)
2025-09-01 15:50:45,120 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.set(c/0) [0.00 s]
2025-09-01 15:50:45,121 - LoggingStore(memory://4309622656) - INFO - Calling MemoryStore._ensure_open
2025-09-01 15:50:45,121 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore._ensure_open [0.00 s]
2025-09-01 15:50:45,121 - LoggingStore(memory://4309622656) - INFO - Calling MemoryStore.get(zarr.json)
2025-09-01 15:50:45,122 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.get(zarr.json) [0.00 s]
2025-09-01 15:50:45,122 - LoggingStore(memory://4309622656) - INFO - Calling MemoryStore.get(.zarray)
2025-09-01 15:50:45,122 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.get(.zarray) [0.00 s]
2025-09-01 15:50:45,122 - LoggingStore(memory://4309622656) - INFO - Calling MemoryStore.get(.zattrs)
2025-09-01 15:50:45,122 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.get(.zattrs) [0.00 s]
2025-09-01 15:50:45,122 - LoggingStore(memory://4309622656) - INFO - Calling MemoryStore.get(c/0)
2025-09-01 15:50:45,122 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.get(c/0) [0.00 s]
In principle we should only call get(c/0) once, at the very end, when we need to retrieve bytes from it. Instead, we call get(c/0) 10 times during the write in this example, once per chunk, before the final read. We should also only call set(c/0) once, because we are writing a full shard. Instead, we call set(c/0) once per chunk (10 times), which is extremely inefficient for sharded writes.
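To avoid counting by eye, the "Calling ..." lines in the log above can be tallied with a few lines of Python. This is only a sketch: `tally_calls` and `CALL_RE` are names made up for illustration, and the regex assumes the log line format shown above. It runs on a short excerpt here; pasting in the full log works the same way.

```python
import re
from collections import Counter

# Matches "Calling MemoryStore.<method>(<key>)" lines in the format shown in the log above.
# Lines without parentheses (e.g. "Calling MemoryStore.read_only") are intentionally skipped.
CALL_RE = re.compile(r"Calling MemoryStore\.(\w+)\(([^)]*)\)")


def tally_calls(log_text: str) -> Counter:
    """Count (method, key) pairs among the 'Calling ...' log lines."""
    return Counter(CALL_RE.findall(log_text))


# A short excerpt copied from the log above.
sample_log = """\
2025-09-01 15:50:45,112 - LoggingStore(memory://4309622656) - INFO - Calling MemoryStore.get(c/0)
2025-09-01 15:50:45,112 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.get(c/0) [0.00 s]
2025-09-01 15:50:45,118 - LoggingStore(memory://4309622656) - INFO - Calling MemoryStore.set(c/0)
2025-09-01 15:50:45,118 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.set(c/0) [0.00 s]
"""

counts = tally_calls(sample_log)
for (method, key), n in sorted(counts.items()):
    print(f"{method}({key}): {n}")
```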
I'm still trying to figure out how this is being controlled. I suspect it has to do with the batch size of the codec pipeline class, but I haven't confirmed this. I will update this issue when I get further.
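For reference, here is a toy model of the difference in store traffic between a per-chunk read-modify-write and a single full-shard write. It is not zarr's implementation and says nothing about where the behaviour comes from: `CountingStore`, `write_per_chunk` and `write_full_shard` are hypothetical names, the "shard" is just concatenated bytes with no index, and only the call counts (not the call ordering) are meant to mirror the log.

```python
from typing import List, Optional


class CountingStore:
    """Hypothetical dict-backed store that counts get/set calls (not a zarr class)."""

    def __init__(self) -> None:
        self.data = {}
        self.gets = 0
        self.sets = 0

    def get(self, key: str) -> Optional[bytes]:
        self.gets += 1
        return self.data.get(key)

    def set(self, key: str, value: bytes) -> None:
        self.sets += 1
        self.data[key] = value


CHUNK_SIZE = 8  # bytes per chunk in this toy model
N_CHUNKS = 10


def write_per_chunk(store: CountingStore, key: str, chunks: List[bytes]) -> None:
    """Fetch, patch and rewrite the whole shard once for every chunk."""
    for i, chunk in enumerate(chunks):
        # Start from a zero-filled shard if nothing has been written yet.
        existing = store.get(key) or bytes(CHUNK_SIZE * len(chunks))
        shard = bytearray(existing)
        shard[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE] = chunk
        store.set(key, bytes(shard))


def write_full_shard(store: CountingStore, key: str, chunks: List[bytes]) -> None:
    """Assemble the shard in memory and write it with a single set."""
    store.set(key, b"".join(chunks))


chunks = [bytes([i + 1]) * CHUNK_SIZE for i in range(N_CHUNKS)]

per_chunk = CountingStore()
write_per_chunk(per_chunk, "c/0", chunks)
print(f"per-chunk:  {per_chunk.gets} gets, {per_chunk.sets} sets")    # 10 gets, 10 sets

full_shard = CountingStore()
write_full_shard(full_shard, "c/0", chunks)
print(f"full-shard: {full_shard.gets} gets, {full_shard.sets} sets")  # 0 gets, 1 sets
```

Both strategies leave identical bytes under c/0; only the number of store round trips differs, which is what matters for high-latency stores.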