Commit 094b869

Faster spatialdata import time (#1075)
* add profimp dev dep
* made imports lazy
* more repeats for import benchmarks
* add profimp to optional requirements
* relax pre-commit ignore; cleanup optional deps
* tests: force string annotations

---------

Co-authored-by: Philipp A. <flying-sheep@web.de>
1 parent ee34079 commit 094b869

89 files changed

+553
-98
lines changed

Some content of this large commit is hidden by default, so only part of the diff is shown below.

benchmarks/README.md

Lines changed: 49 additions & 11 deletions
@@ -14,25 +14,63 @@ pip install -e '.[docs,test,benchmark]'

 ## Usage

-Running all the benchmarks is usually not needed. You run the benchmark using `asv run`. See the [asv documentation](https://asv.readthedocs.io/en/stable/commands.html#asv-run) for interesting arguments, like selecting the benchmarks you're interested in by providing a regex pattern `-b` or `--bench` that links to a function or class method e.g. the option `-b timeraw_import_inspect` selects the function `timeraw_import_inspect` in `benchmarks/spatialdata_benchmark.py`. You can run the benchmark in your current environment with `--python=same`. Some example benchmarks:
+Running all the benchmarks is usually not needed. You run the benchmark using `asv run`. See the [asv documentation](https://asv.readthedocs.io/en/stable/commands.html#asv-run) for interesting arguments, like selecting the benchmarks you're interested in by providing a regex pattern `-b` or `--bench` that links to a function or class method. You can run the benchmark in your current environment with `--python=same`. Some example benchmarks:

-Importing the SpatialData library can take around 4 seconds:
+### Import time benchmarks
+
+Import benchmarks live in `benchmarks/benchmark_imports.py`. Each `timeraw_*` function returns a Python code snippet that asv runs in a fresh interpreter (cold import, empty module cache):
+
+Run all import benchmarks in your current environment:

 ```
-PYTHONWARNINGS="ignore" asv run --python=same --show-stderr -b timeraw_import_inspect
-Couldn't load asv.plugins._mamba_helpers because
-No module named 'conda'
-· Discovering benchmarks
-· Running 1 total benchmarks (1 commits * 1 environments * 1 benchmarks)
-[ 0.00%] ·· Benchmarking existing-py_opt_homebrew_Caskroom_mambaforge_base_envs_spatialdata2_bin_python3.12
-[50.00%] ··· Running (spatialdata_benchmark.timeraw_import_inspect--).
-[100.00%] ··· spatialdata_benchmark.timeraw_import_inspect 3.65±0.2s
+asv run --python=same --show-stderr -b timeraw
+```
+
+Or a single one:
+
+```
+asv run --python=same --show-stderr -b timeraw_import_spatialdata
+```
+
+### Comparing the current branch against `main`
+
+The simplest way is `asv continuous`, which builds both commits, runs the benchmarks, and prints the comparison in one shot:
+
+```bash
+asv continuous --show-stderr -v -b timeraw main faster-import
 ```

+Replace `faster-import` with any branch name or commit hash. The `-v` flag prints per-sample timings; drop it for a shorter summary.
+
+Alternatively, collect results separately and compare afterwards:
+
+```bash
+# 1. Collect results for the tip of main and the tip of your branch
+asv run --show-stderr -b timeraw main
+asv run --show-stderr -b timeraw HEAD
+
+# 2. Print a side-by-side comparison
+asv compare main HEAD
+```
+
+Both approaches build isolated environments from scratch. If you prefer to skip the rebuild and reuse your current environment (faster, less accurate):
+
+```bash
+asv run --python=same --show-stderr -b timeraw HEAD
+
+git stash && git checkout main
+asv run --python=same --show-stderr -b timeraw HEAD
+git checkout - && git stash pop
+
+asv compare main HEAD
+```
+
+### Querying benchmarks
+
 Querying using a bounding box without a spatial index is highly impacted by large amounts of points (transcripts), more than table rows (cells).

 ```
-$ PYTHONWARNINGS="ignore" asv run --python=same --show-stderr -b time_query_bounding_box
+$ asv run --python=same --show-stderr -b time_query_bounding_box

 [100.00%] ··· ======== ============ ============= ============= ==============
 -- filter_table / n_transcripts_per_cell
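
The import benchmarks described in this README report a single time per snippet. To see which modules contribute to that time, a per-module breakdown can help. The sketch below is not part of the commit and does not use `profimp`; it assumes only CPython's built-in `-X importtime` option, and the helper name `import_breakdown` is hypothetical. It uses `import json` as a stand-in so that it runs without spatialdata installed.

```python
import subprocess
import sys


def import_breakdown(statement: str, top: int = 5) -> list[tuple[str, int]]:
    """Return the `top` modules with the largest cumulative import time (microseconds)."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", statement],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = []
    # CPython writes one "import time: self | cumulative | name" line per module to stderr.
    for line in proc.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        parts = line[len("import time:"):].split("|")
        if len(parts) != 3:
            continue
        _self_us, cumulative_us, name = (part.strip() for part in parts)
        if not cumulative_us.isdigit():
            continue  # skip the column header row
        rows.append((name, int(cumulative_us)))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]


if __name__ == "__main__":
    # Replace the statement with "import spatialdata" to inspect the real package.
    for name, micros in import_breakdown("import json"):
        print(f"{micros:>8} us  {name}")
```

Cumulative times include everything a module imports in turn, so the top entries are usually the outermost packages.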

benchmarks/benchmark_imports.py

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
+"""Benchmarks for import times of the spatialdata package and its submodules.
+
+Each ``timeraw_*`` function returns a snippet of Python code that asv runs in
+a fresh interpreter, so the measured time reflects a cold import with an empty
+module cache.
+"""
+
+from collections.abc import Callable
+from typing import Any
+
+
+def _timeraw(func: Any) -> Any:
+    """Set asv benchmark attributes for a cold-import timeraw function."""
+    func.repeat = 5  # number of independent subprocess measurements
+    func.number = 1  # must be 1: second import in same process hits module cache
+    return func
+
+
+@_timeraw
+def timeraw_import_spatialdata() -> str:
+    """Time a bare ``import spatialdata``."""
+    return """
+    import spatialdata
+    """
+
+
+@_timeraw
+def timeraw_import_SpatialData() -> str:
+    """Time importing the top-level ``SpatialData`` class."""
+    return """
+    from spatialdata import SpatialData
+    """
+
+
+@_timeraw
+def timeraw_import_read_zarr() -> str:
+    """Time importing ``read_zarr`` from the top-level namespace."""
+    return """
+    from spatialdata import read_zarr
+    """
+
+
+@_timeraw
+def timeraw_import_models_elements() -> str:
+    """Time importing the main element model classes."""
+    return """
+    from spatialdata.models import Image2DModel, Labels2DModel, PointsModel, ShapesModel, TableModel
+    """
+
+
+@_timeraw
+def timeraw_import_transformations() -> str:
+    """Time importing the ``spatialdata.transformations`` submodule."""
+    return """
+    from spatialdata.transformations import Affine, Scale, Translation, Sequence
+    """

benchmarks/spatialdata_benchmark.py

Lines changed: 0 additions & 7 deletions
@@ -20,13 +20,6 @@ def peakmem_list2(self):
         return sdata


-def timeraw_import_inspect():
-    """Time the import of the spatialdata module."""
-    return """
-    import spatialdata
-    """
-
-
 class TimeMapRaster:
     """Time the."""

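
The `timeraw_*` benchmarks in the two files above rely on each measurement running in a fresh interpreter, which is why `number` is fixed at 1 and `repeat` controls the sample count. As a rough illustration of that idea outside asv, the sketch below times a snippet in several new processes. It is an assumption-laden stand-in, not asv's implementation: the helper name `cold_import_times` is hypothetical, the timings include interpreter startup, and asv's own measurement may account for startup differently.

```python
import statistics
import subprocess
import sys
import time


def cold_import_times(snippet: str, repeat: int = 5) -> list[float]:
    """Run `snippet` once in each of `repeat` fresh interpreters.

    Returns wall-clock seconds per run, including interpreter startup.
    """
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        # A new process has not imported the target yet, so every run is a cold import.
        subprocess.run([sys.executable, "-c", snippet], check=True)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # `json` is a stand-in so the sketch runs anywhere; swap in "import spatialdata".
    samples = cold_import_times("import json")
    print(f"median {statistics.median(samples):.3f}s over {len(samples)} runs")
```

Running the snippet twice inside one process would measure a cache hit on the second pass, which is the situation the `func.number = 1` comment warns about.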

docs/conf.py

Lines changed: 2 additions & 0 deletions
@@ -4,6 +4,8 @@
 # list see the documentation:
 # https://www.sphinx-doc.org/en/master/usage/configuration.html

+from __future__ import annotations
+
 # -- Path setup --------------------------------------------------------------
 import sys
 from datetime import datetime

docs/extensions/typed_returns.py

Lines changed: 2 additions & 0 deletions
@@ -1,5 +1,7 @@
 # code from https://github.com/theislab/scanpy/blob/master/docs/extensions/typed_returns.py
 # with some minor adjustment
+from __future__ import annotations
+
 import re

 from sphinx.application import Sphinx

pyproject.toml

Lines changed: 4 additions & 1 deletion
@@ -64,7 +64,6 @@ extra = [
 [dependency-groups]
 dev = [
     "bump2version",
-    "sentry-prevent-cli",
 ]
 test = [
     "pytest",
@@ -88,6 +87,7 @@ docs = [
 benchmark = [
     "asv",
     "memray",
+    "profimp",
 ]

 [tool.coverage.run]
@@ -185,6 +185,9 @@ select = [
 ]
 unfixable = ["B", "C4", "UP", "BLE", "T20", "RET"]

+[tool.ruff.lint.isort]
+required-imports = ["from __future__ import annotations"]
+
 [tool.ruff.lint.pydocstyle]
 convention = "numpy"
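
The `required-imports` setting above makes Ruff insist on `from __future__ import annotations` in every module. One plausible reason this helps import time, offered here as an assumption because the relevant part of the diff is hidden, is that annotations are then stored as strings and never evaluated at import, so imports needed only for type hints can move under `typing.TYPE_CHECKING`. The sketch below shows the mechanism with the standard-library `decimal` module as a stand-in; the function `double` is hypothetical.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by type checkers only; this import is never executed at runtime.
    from decimal import Decimal


def double(value: Decimal) -> Decimal:
    """Annotations mention Decimal, yet the name is never looked up at runtime."""
    return value + value


if __name__ == "__main__":
    # The annotations are kept as plain strings rather than evaluated objects.
    print(double.__annotations__)
```

Without the `__future__` import, defining `double` would raise `NameError` on interpreters that evaluate annotations eagerly, because `Decimal` is not bound at runtime.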
