30 changes: 30 additions & 0 deletions .github/workflows/codspeed.yml
@@ -0,0 +1,30 @@
name: CodSpeed Benchmarks

on:
  push:
    branches:
      - "main" # or "master"
  pull_request:
  # `workflow_dispatch` allows CodSpeed to trigger backtest
  # performance analysis in order to generate initial data.
  workflow_dispatch:

jobs:
  benchmarks:
    name: Run benchmarks
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
        with:
          fetch-depth: 0 # grab all branches and tags
      - name: Set up Python
        uses: actions/setup-python@v6
      - name: Install Hatch
        run: |
          python -m pip install --upgrade pip
          pip install hatch
      - name: Run the benchmarks
        uses: CodSpeedHQ/action@v4
        with:
          mode: instrumentation
          run: hatch run test.py3.11-1.26-minimal:run-benchmark
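
For local iteration without the CodSpeed action, the same suite can be run directly; a minimal sketch, assuming the test dependencies (including pytest-benchmark and pytest-codspeed) are installed in the active environment:

# Local equivalent of the CI benchmark step above: run the benchmark tests
# with timing enabled. Benchmarking is off by default (see the
# --benchmark-disable flag added to addopts in pyproject.toml below).
import sys

import pytest

if __name__ == "__main__":
    sys.exit(pytest.main(["--benchmark-enable", "tests/benchmarks"]))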
Contributor

can we test the latest instead? seems more appropriate...

Contributor Author

The latest version of Python? What's the reasoning? I'd rather update this file when we drop a supported version than when a new version of Python comes out.

Contributor
@dcherian, Oct 31, 2025

Because we'd want to catch a perf regression from upstream changes too? I'm suggesting the latest versions of released libraries: py=3.13, np=2.2.

Contributor Author

We don't have an upper bound on numpy versions, so I don't think this particular workflow will help us catch regressions from upstream changes -- we would need to update it every time a new version of numpy is released. IMO that's something for a separate benchmark workflow. This workflow will run on every PR, and in that case the oldest version of numpy we support seems better.

We also don't have to use a pre-baked hatch environment here; we could define a dependency set specific to benchmarking. But my feeling is that benchmarking against older versions of our dependencies gives us a better measure of what users will actually experience.

1 change: 1 addition & 0 deletions changes/3562.misc.md
@@ -0,0 +1 @@
Add continuous performance benchmarking infrastructure.
10 changes: 9 additions & 1 deletion pyproject.toml
@@ -83,6 +83,8 @@ test = [
'numpydoc',
"hypothesis",
"pytest-xdist",
"pytest-benchmark",
"pytest-codspeed",
"packaging",
"tomlkit",
"uv",
@@ -181,6 +183,7 @@ run-pytest = "run"
run-verbose = "run-coverage --verbose"
run-mypy = "mypy src"
run-hypothesis = "run-coverage -nauto --run-slow-hypothesis tests/test_properties.py tests/test_store/test_stateful*"
run-benchmark = "pytest --benchmark-enable tests/benchmarks"
list-env = "pip list"

[tool.hatch.envs.gputest]
@@ -405,7 +408,12 @@ doctest_optionflags = [
"IGNORE_EXCEPTION_DETAIL",
]
addopts = [
"--durations=10", "-ra", "--strict-config", "--strict-markers",
"--benchmark-columns", "min,mean,stddev,outliers,rounds,iterations",
"--benchmark-group-by", "group",
"--benchmark-warmup", "on",
"--benchmark-disable", # run benchmark routines but don't do benchmarking
"--durations", "10",
"-ra", "--strict-config", "--strict-markers",
]
filterwarnings = [
"error",
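With these settings, the benchmark tests are collected in every pytest run, but because of --benchmark-disable they execute only once as ordinary tests; timings are recorded only when --benchmark-enable is passed, which is what the run-benchmark script (and therefore the CodSpeed workflow) does. A hypothetical example of a benchmark this setup would pick up; the file name, function, and workload are invented for illustration:

# Would live in a new file such as tests/benchmarks/test_example.py (hypothetical).
import numpy as np


def test_sum_speed(benchmark) -> None:
    data = np.ones(1_000_000, dtype="uint8")
    # The benchmark fixture calls np.sum(data) repeatedly when benchmarking is
    # enabled, and returns the result of one call either way.
    result = benchmark(np.sum, data)
    assert result == 1_000_000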
Empty file added tests/benchmarks/__init__.py
90 changes: 90 additions & 0 deletions tests/benchmarks/test_e2e.py
@@ -0,0 +1,90 @@
"""
Test the basic end-to-end read/write performance of Zarr
"""

from __future__ import annotations

from dataclasses import dataclass
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from pytest_benchmark.fixture import BenchmarkFixture

    from zarr.abc.store import Store
    from zarr.core.common import NamedConfig
from operator import getitem, setitem
from typing import Any, Literal

import pytest

from zarr import create_array

CompressorName = Literal["gzip"] | None

compressors: dict[CompressorName, NamedConfig[Any, Any] | None] = {
    None: None,
    "gzip": {"name": "gzip", "configuration": {"level": 1}},
}


@dataclass(kw_only=True, frozen=True)
class Layout:
    shape: tuple[int, ...]
    chunks: tuple[int, ...]
    shards: tuple[int, ...] | None


layouts: tuple[Layout, ...] = (
    Layout(shape=(16,), chunks=(1,), shards=None),
    Layout(shape=(16,), chunks=(16,), shards=None),
    Layout(shape=(16,), chunks=(1,), shards=(1,)),
    Layout(shape=(16,), chunks=(1,), shards=(16,)),
    Layout(shape=(16,) * 2, chunks=(1,) * 2, shards=None),
    Layout(shape=(16,) * 2, chunks=(16,) * 2, shards=None),
    Layout(shape=(16,) * 2, chunks=(1,) * 2, shards=(1,) * 2),
    Layout(shape=(16,) * 2, chunks=(1,) * 2, shards=(16,) * 2),
)


@pytest.mark.parametrize("compression_name", [None, "gzip"])
@pytest.mark.parametrize("layout", layouts)
@pytest.mark.parametrize("store", ["memory", "local", "zip"], indirect=["store"])
def test_write_array(
    store: Store, layout: Layout, compression_name: CompressorName, benchmark: BenchmarkFixture
) -> None:
    """
    Test the time required to fill an array with a single value
    """
    arr = create_array(
        store,
        dtype="uint8",
        shape=layout.shape,
        chunks=layout.chunks,
        shards=layout.shards,
        compressors=compressors[compression_name],  # type: ignore[arg-type]
        fill_value=0,
    )

    benchmark(setitem, arr, Ellipsis, 1)
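
The operator.setitem call above is simply how the indexed assignment is handed to the benchmark fixture as a callable plus its arguments. A standalone illustration, not part of this diff; the in-memory store is an arbitrary choice for brevity:

from operator import setitem

import zarr
from zarr.storage import MemoryStore

# Mirrors the smallest Layout above, purely for illustration.
arr = zarr.create_array(MemoryStore(), dtype="uint8", shape=(16,), chunks=(1,), fill_value=0)

# benchmark(setitem, arr, Ellipsis, 1) measures exactly this operation:
setitem(arr, Ellipsis, 1)  # equivalent to arr[...] = 1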


@pytest.mark.parametrize("compression_name", [None, "gzip"])
@pytest.mark.parametrize("layout", layouts)
@pytest.mark.parametrize("store", ["memory", "local"], indirect=["store"])
def test_read_array(
    store: Store, layout: Layout, compression_name: CompressorName, benchmark: BenchmarkFixture
) -> None:
    """
    Test the time required to read an entire array that was filled with a single value
    """
    arr = create_array(
        store,
        dtype="uint8",
        shape=layout.shape,
        chunks=layout.chunks,
        shards=layout.shards,
        compressors=compressors[compression_name],  # type: ignore[arg-type]
        fill_value=0,
    )
    arr[:] = 1
    benchmark(getitem, arr, Ellipsis)
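
Both tests receive store through indirect parametrization; the fixture that maps "memory", "local", and "zip" to Store instances is defined elsewhere in the test suite and is not part of this diff. A rough sketch of what such a conftest fixture could look like, illustrative only and not the project's actual implementation:

from pathlib import Path

import pytest

from zarr.abc.store import Store
from zarr.storage import LocalStore, MemoryStore, ZipStore


@pytest.fixture
def store(request: pytest.FixtureRequest, tmp_path: Path) -> Store:
    # request.param carries the string supplied via indirect=["store"].
    kind: str = request.param
    if kind == "memory":
        return MemoryStore()
    if kind == "local":
        root = tmp_path / "local"
        root.mkdir()
        return LocalStore(root)
    if kind == "zip":
        return ZipStore(tmp_path / "store.zip", mode="w")
    raise ValueError(f"unknown store kind: {kind!r}")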