
Commit 8609ebe

Merge branch 'main' into nd-transpose
2 parents 30e1712 + 51cba77 · commit 8609ebe

14 files changed, +401 -33 lines


.github/workflows/build.yml

Lines changed: 9 additions & 1 deletion
@@ -1,6 +1,14 @@
 name: Tests
 
-on: [push]
+on:
+  # Trigger the workflow on push or pull request,
+  # but only for the main branch
+  push:
+    branches:
+      - '**'  # this matches all branches
+  pull_request:
+    branches:
+      - main
 
 jobs:
   build_wheels:

.github/workflows/cibuildwheels.yml

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 name: Python wheels
+
 on:
-  # Trigger the workflow on push or pull request,
-  # but only for the main branch
+  # Trigger the workflow only for tags and PRs to the main branch
   push:
     tags:
       - '*'

ANNOUNCE.rst

Lines changed: 10 additions & 6 deletions
@@ -1,12 +1,16 @@
-Announcing Python-Blosc2 3.3.0
+Announcing Python-Blosc2 3.3.1
 ==============================
 
-We are introducing a new blosc2.transpose() function for natively transposing
-2D NDArray instances, and a fast path for NDArray.slice() that delivers up to
-40x speedup when slices align with underlying chunks. Documentation has also
-been improved with several edits throughout.
+In our effort to better adapt to the array API
+(https://data-apis.org/array-api/latest/), we have introduced
+permute_dims() and matrix_transpose() functions, and the .T property.
+This replaces the previous transpose() function, which is now deprecated.
+See PR #384. Thanks to Ricardo Sales Piquer (@ricardosp4).
 
-See benchmarks at: https://github.com/Blosc/python-blosc2/blob/main/bench/ndarray/aligned_chunks.py
+We have also reduced the memory footprint of constructors like ``arange()``,
+``linspace()`` and ``fromiter()`` by a large factor. As an example, a 5 TB
+array of 8-byte floats now uses less than 200 MB of memory instead of
+170 GB previously.
 
 You can think of Python-Blosc2 3.x as an extension of NumPy/numexpr that:
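
As a quick illustration of the array-API-style calls described above, here is a
minimal sketch (editor's example, not part of the commit). It assumes that
permute_dims() accepts an axes argument as in the array API spec and that
matrix_transpose() and .T act on 2D NDArray instances; the linspace() signature
is the one used by the benchmarks added in this commit.

    import numpy as np
    import blosc2

    a = blosc2.linspace(0, 1, dtype=np.float64, shape=(2, 3, 4))
    b = blosc2.permute_dims(a, axes=(2, 0, 1))  # N-dimensional generalization of transpose
    m = blosc2.linspace(0, 1, dtype=np.float64, shape=(3, 4))
    mt = blosc2.matrix_transpose(m)             # transpose of the last two axes
    mt2 = m.T                                   # .T property, same result for 2D
    print(b.shape, mt.shape, mt2.shape)         # -> (4, 2, 3) (4, 3) (4, 3)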

RELEASE_NOTES.md

Lines changed: 23 additions & 1 deletion
@@ -1,9 +1,31 @@
 # Release notes
 
-## Changes from 3.3.0 to 3.3.1
+## Changes from 3.3.1 to 3.3.2
 
 XXX version-specific blurb XXX
 
+
+## Changes from 3.3.0 to 3.3.1
+
+* In our effort to better adapt to the array API
+  (https://data-apis.org/array-api/latest/), we have introduced
+  permute_dims() and matrix_transpose() functions, and the .T property.
+  This replaces the previous transpose() function, which is now deprecated.
+  See PR #384. Thanks to Ricardo Sales Piquer (@ricardosp4).
+
+* Constructors like `arange()`, `linspace()` and `fromiter()` now
+  use far less memory when creating large arrays. As an example, a 5 TB
+  array of 8-byte floats now uses less than 200 MB of memory instead of
+  170 GB previously. See PR #387.
+
+* Now, when opening a lazy expression with `blosc2.open()` that has a
+  missing operand, the open still works, but the dtype and shape
+  attributes are None. This is useful for lazy expressions that have
+  lost some operands, but you still want to open them for inspection.
+  See PR #385.
+
+* Added an example of getting a slice out of a C2Array.
+
 ## Changes from 3.2.1 to 3.3.0
 
 * New `blosc2.transpose()` function for transposing 2D NDArray instances
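
The reduced-memory constructors described in the notes above can be pictured
with a minimal sketch (editor's example, not part of the commit; the arange()
signature is the one used by the benchmarks added here). The footprint claim
(a 5 TB array built with less than 200 MB of RAM) is taken from the notes,
presumably because the result is now built and compressed chunk by chunk.

    import numpy as np
    import blosc2

    # ~3.2 GB of float64 logically, built without a full in-memory NumPy buffer
    shape = (20_000, 20_000)
    a = blosc2.arange(0, shape[0] * shape[1], dtype=np.float64, shape=shape)
    print(a.shape, a.dtype, f"cratio={a.schunk.cratio:.1f}x")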

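The lazy-expression behaviour from PR #385 can also be sketched (editor's
example, not part of the commit; "expr.b2nd" is a hypothetical lazy expression
saved earlier whose operand file has since been removed).

    import blosc2

    lexpr = blosc2.open("expr.b2nd")
    # With a missing operand the open still succeeds, but shape and dtype are
    # None, so the expression can at least be inspected before repairing it.
    print(lexpr.shape, lexpr.dtype)
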
bench/ndarray/compute_dists.py

Lines changed: 136 additions & 0 deletions
@@ -0,0 +1,136 @@
+#######################################################################
+# Copyright (c) 2019-present, Blosc Development Team <[email protected]>
+# All rights reserved.
+#
+# This source code is licensed under a BSD-style license (found in the
+# LICENSE file in the root directory of this source tree)
+#######################################################################
+
+# Benchmark for comparing compute speeds of Blosc2 and Numexpr.
+# One can use different distributions of data:
+# constant, arange, linspace, or random
+# The expression can be any valid Numexpr expression.
+
+import blosc2
+from time import time
+import numpy as np
+import numexpr as ne
+
+# Bench params
+N = 30_000
+step = 3000
+dtype = np.dtype(np.float64)
+persistent = False
+dist = "constant"  # "arange" or "linspace" or "constant" or "random"
+expr = "(a - b)"
+#expr = "sum(a - b)"
+#expr = "cos(a)**2 + sin(b)**2 - 1"
+#expr = "sum(cos(a)**2 + sin(b)**2 - 1)"
+
+# Set default compression params
+cparams = blosc2.CParams(clevel=1, codec=blosc2.Codec.BLOSCLZ)
+blosc2.cparams_dflts["codec"] = cparams.codec
+blosc2.cparams_dflts["clevel"] = cparams.clevel
+# Set default storage params
+storage = blosc2.Storage(contiguous=True, mode="w")
+blosc2.storage_dflts["contiguous"] = storage.contiguous
+blosc2.storage_dflts["mode"] = storage.mode
+
+urlpath = dict((aname, None) for aname in ("a", "b", "c"))
+if persistent:
+    urlpath = dict((aname, f"{aname}.b2nd") for aname in ("a", "b", "c"))
+
+btimes = []
+bspeeds = []
+ws_sizes = []
+rng = np.random.default_rng()
+for i in range(step, N + step, step):
+    shape = (i, i)
+    # shape = (i * i,)
+    if dist == "constant":
+        a = blosc2.ones(shape, dtype=dtype, urlpath=urlpath['a'])
+        b = blosc2.full(shape, 2, dtype=dtype, urlpath=urlpath['b'])
+    elif dist == "arange":
+        a = blosc2.arange(0, i * i, dtype=dtype, shape=shape, urlpath=urlpath['a'])
+        b = blosc2.arange(i * i, 2 * i * i, dtype=dtype, shape=shape, urlpath=urlpath['b'])
+    elif dist == "linspace":
+        a = blosc2.linspace(0, 1, dtype=dtype, shape=shape, urlpath=urlpath['a'])
+        b = blosc2.linspace(1, 2, dtype=dtype, shape=shape, urlpath=urlpath['b'])
+    elif dist == "random":
+        t0 = time()
+        _ = np.random.random(shape)
+        a = blosc2.fromiter(np.nditer(_), dtype=dtype, shape=shape, urlpath=urlpath['a'])
+        b = a.copy(urlpath=urlpath['b'])
+        # This uses less memory, but it is 2x-3x slower
+        # iter_ = (rng.random() for _ in range(i**2 * 2))
+        # a = blosc2.fromiter(iter_, dtype=dtype, shape=shape, urlpath=urlpath['a'])
+        # b = blosc2.fromiter(iter_, dtype=dtype, shape=shape, urlpath=urlpath['b'])
+        t = time() - t0
+        #print(f"Time to create data: {t:.5f} s - {a.schunk.nbytes/t / 1e9:.2f} GB/s")
+    else:
+        raise ValueError("Invalid distribution type")
+
+    t0 = time()
+    c = blosc2.lazyexpr(expr).compute(urlpath=urlpath['c'])
+    t = time() - t0
+    ws_sizes.append((a.schunk.nbytes + b.schunk.nbytes + c.schunk.nbytes) / 2**30)
+    speed = ws_sizes[-1] / t
+    print(f"Time to compute a - b: {t:.5f} s -- {speed:.2f} GB/s -- cratio: {c.schunk.cratio:.1f}x")
+    #print(f"result: {c[()]}")
+    btimes.append(t)
+    bspeeds.append(speed)
+
+# Evaluate using Numexpr compute engine
+ntimes = []
+nspeeds = []
+for i in range(step, N + step, step):
+    shape = (i, i)
+    # shape = (i * i,)
+    if dist == "constant":
+        a = np.ones(shape, dtype=dtype)
+        b = np.full(shape, 2, dtype=dtype)
+    elif dist == "arange":
+        a = np.arange(0, i * i, dtype=dtype).reshape(shape)
+        b = np.arange(i * i, 2 * i * i, dtype=dtype).reshape(shape)
+    elif dist == "linspace":
+        a = np.linspace(0, 1, num=i * i, dtype=dtype).reshape(shape)
+        b = np.linspace(1, 2, num=i * i, dtype=dtype).reshape(shape)
+    elif dist == "random":
+        a = np.random.random(shape)
+        b = np.random.random(shape)
+    else:
+        raise ValueError("Invalid distribution type")
+
+    t0 = time()
+    c = ne.evaluate(expr)
+    t = time() - t0
+    ws_size = (a.nbytes + b.nbytes + c.nbytes) / 2**30
+    speed = ws_size / t
+    print(f"Time to compute with Numexpr: {t:.5f} s - {speed:.2f} GB/s")
+    #print(f"result: {c}")
+    ntimes.append(t)
+    nspeeds.append(speed)
+
+# Plot
+import matplotlib.pyplot as plt
+import matplotlib.ticker as ticker
+import seaborn as sns
+
+sns.set_theme(style="whitegrid")
+plt.figure(figsize=(10, 6))
+plt.plot(ws_sizes, bspeeds, label="Blosc2", marker='o')
+plt.plot(ws_sizes, nspeeds, label="Numexpr", marker='o')
+# Set y-axis to start from 0
+plt.ylim(bottom=0)
+plt.xlabel("Working set (GB)")
+#plt.ylabel("Time (s)")
+plt.ylabel("Speed (GB/s)")
+plt.title(f"Blosc2 vs Numexpr performance -- {dist} distribution")
+plt.legend()
+#plt.gca().xaxis.set_major_locator(ticker.MaxNLocator(integer=True))
+#plt.gca().yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, _: f'{x:.2f}'))
+plt.grid()
+# Save the figure before showing it, so the saved image is not blank
+plt.savefig("blosc2_vs_numexpr.png", dpi=300, bbox_inches='tight')
+plt.show()
+plt.close()

bench/ndarray/compute_dists2.py

Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
+#######################################################################
+# Copyright (c) 2019-present, Blosc Development Team <[email protected]>
+# All rights reserved.
+#
+# This source code is licensed under a BSD-style license (found in the
+# LICENSE file in the root directory of this source tree)
+#######################################################################
+
+# Benchmark for comparing compute speeds of Blosc2 and Numexpr.
+# This version compares across different distributions of data:
+# constant, arange, linspace, or random
+# The expression can be any valid Numexpr expression.
+
+import blosc2
+from time import time
+import numpy as np
+import numexpr as ne
+import matplotlib.pyplot as plt
+import seaborn as sns
+
+# Bench params
+N = 10_000
+step = 3000
+dtype = np.dtype(np.float64)
+persistent = False
+distributions = ["constant", "arange", "linspace", "random"]
+expr = "(a - b)"
+#expr = "sum(a - b)"
+#expr = "cos(a)**2 + sin(b)**2 - 1"
+#expr = "sum(cos(a)**2 + sin(b)**2 - 1)"
+
+# Set default compression params
+cparams = blosc2.CParams(clevel=1, codec=blosc2.Codec.BLOSCLZ)
+blosc2.cparams_dflts["codec"] = cparams.codec
+blosc2.cparams_dflts["clevel"] = cparams.clevel
+# Set default storage params
+storage = blosc2.Storage(contiguous=True, mode="w")
+blosc2.storage_dflts["contiguous"] = storage.contiguous
+blosc2.storage_dflts["mode"] = storage.mode
+
+# Create dictionaries to store results for each distribution
+blosc2_speeds = {dist: [] for dist in distributions}
+numexpr_speeds = {dist: [] for dist in distributions}
+ws_sizes = []
+
+# Generate working set sizes once
+sizes = list(range(step, N + step, step))
+for i in sizes:
+    ws_sizes.append((i * i * 3 * np.dtype(dtype).itemsize) / 2**30)  # Approximate size in GB
+
+# Loop through different distributions for benchmarking
+for dist in distributions:
+    print(f"\nBenchmarking {dist} distribution...")
+
+    # Evaluate using Blosc2
+    for i in sizes:
+        shape = (i, i)
+        urlpath = {name: None for name in ("a", "b", "c")}
+
+        if dist == "constant":
+            a = blosc2.ones(shape, dtype=dtype, urlpath=urlpath['a'])
+            b = blosc2.full(shape, 2, dtype=dtype, urlpath=urlpath['b'])
+        elif dist == "arange":
+            a = blosc2.arange(0, i * i, dtype=dtype, shape=shape, urlpath=urlpath['a'])
+            b = blosc2.arange(i * i, 2 * i * i, dtype=dtype, shape=shape, urlpath=urlpath['b'])
+        elif dist == "linspace":
+            a = blosc2.linspace(0, 1, dtype=dtype, shape=shape, urlpath=urlpath['a'])
+            b = blosc2.linspace(1, 2, dtype=dtype, shape=shape, urlpath=urlpath['b'])
+        elif dist == "random":
+            _ = np.random.random(shape)
+            a = blosc2.fromiter(np.nditer(_), dtype=dtype, shape=shape, urlpath=urlpath['a'])
+            # b = a.copy(urlpath=urlpath['b'])  # faster, but output is not random
+            _ = np.random.random(shape)
+            b = blosc2.fromiter(np.nditer(_), dtype=dtype, shape=shape, urlpath=urlpath['b'])
+
+        t0 = time()
+        c = blosc2.lazyexpr(expr).compute(urlpath=urlpath['c'])
+        t = time() - t0
+        speed = (a.schunk.nbytes + b.schunk.nbytes + c.schunk.nbytes) / 2**30 / t
+        print(f"Blosc2 - {dist} - Size {i}x{i}: {speed:.2f} GB/s - cratio: {c.schunk.cratio:.1f}x")
+        blosc2_speeds[dist].append(speed)
+
+    # Evaluate using Numexpr
+    for i in sizes:
+        shape = (i, i)
+
+        if dist == "constant":
+            a = np.ones(shape, dtype=dtype)
+            b = np.full(shape, 2, dtype=dtype)
+        elif dist == "arange":
+            a = np.arange(0, i * i, dtype=dtype).reshape(shape)
+            b = np.arange(i * i, 2 * i * i, dtype=dtype).reshape(shape)
+        elif dist == "linspace":
+            a = np.linspace(0, 1, num=i * i, dtype=dtype).reshape(shape)
+            b = np.linspace(1, 2, num=i * i, dtype=dtype).reshape(shape)
+        elif dist == "random":
+            a = np.random.random(shape)
+            b = np.random.random(shape)
+
+        t0 = time()
+        c = ne.evaluate(expr)
+        t = time() - t0
+        speed = (a.nbytes + b.nbytes + c.nbytes) / 2**30 / t
+        print(f"Numexpr - {dist} - Size {i}x{i}: {speed:.2f} GB/s")
+        numexpr_speeds[dist].append(speed)
+
+# Create a figure with four subplots (2x2 grid)
+sns.set_theme(style="whitegrid")
+fig, axes = plt.subplots(2, 2, figsize=(14, 10), sharex=True)
+
+# Flatten axes for easier iteration
+axes = axes.flatten()
+
+# Plot each distribution in its own subplot
+for i, dist in enumerate(distributions):
+    axes[i].plot(ws_sizes, blosc2_speeds[dist], marker='o', linestyle='-', label="Blosc2")
+    axes[i].plot(ws_sizes, numexpr_speeds[dist], marker='s', linestyle='--', label="Numexpr")
+    axes[i].set_title(f"{dist.capitalize()} Distribution")
+    axes[i].set_ylabel("Speed (GB/s)")
+    axes[i].grid(True)
+    axes[i].legend()
+    if i >= 2:  # Add x-label only to bottom subplots
+        axes[i].set_xlabel("Working set size (GB)")
+
+# Add a shared title
+fig.suptitle(f"Blosc2 vs Numexpr Performance Across Different Data Distributions ({expr=})", fontsize=16)
+plt.tight_layout(rect=[0, 0, 1, 0.96])  # Adjust the rect parameter to make room for the suptitle
+
+# Save the unified plot with subplots
+plt.savefig("blosc2_vs_numexpr_subplots.png", dpi=300, bbox_inches='tight')
+plt.show()

bench/ndarray/matmul.ipynb

Lines changed: 4 additions & 4 deletions
@@ -189,7 +189,7 @@
 "source": [
 "**Key observations:**\n",
 "- Automatic chunking can optimize performance for smaller matrix sizes.\n",
-"- Choosing square chunks of 1000x1000 can achive the best performance for matrices of sizes greater than 2000x2000.\n",
+"- Choosing square chunks of 1000x1000 can achieve the best performance for matrices of sizes greater than 2000x2000.\n",
 "\n",
 "**Next experiment:**\n",
 "We will increment the chunks' size, as we have seen that better performance can be achieved with bigger chunks."
@@ -294,7 +294,7 @@
 "**Key observations:**\n",
 "- The best performance is achieved for the biggest chunk size.\n",
 "- The larger the chunk size, the higher the bandwidth.\n",
-"- If the chunk size is choosen automatically, the performance is better than choosing any other chunk size. This is weird, because if choosen automatically, chunks of size 1000x1000 are choosen, which is the same size as the fixed chunks.\n",
+"- If the chunk size is chosen automatically, the performance is better than choosing any other chunk size. This is weird, because if chosen automatically, chunks of size 1000x1000 are chosen, which is the same size as the fixed chunks.\n",
 "\n",
 "**Next experiment:**\n",
 "We will increment the chunks' size again, as we have seen that better performance can be achieved with bigger chunks."
@@ -304,7 +304,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Presicion simple"
+"Precision simple"
 ]
 },
 {
@@ -517,7 +517,7 @@
 "\n",
 "**Next experiment:**\n",
 "We are going to try with the same sizes for matrices and a square chunk size of 6000 to see if it improves the performance for that last matrix size.\n",
-"We will also remove chunk sizes of 1000 and 2000, and add a chunk size wich will be the same size as the matrix."
+"We will also remove chunk sizes of 1000 and 2000, and add a chunk size which will be the same size as the matrix."
 ]
 },
 {
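
The notebook above studies how the chunk shape affects matmul performance. As a
minimal sketch of that kind of experiment (editor's example, not from the
commit; it assumes blosc2.matmul() and the chunks= keyword of the constructors
are available, as elsewhere in the 3.x API):

    import numpy as np
    import blosc2

    n = 4_000
    # Fix the chunk shape by hand to compare against automatic chunking
    a = blosc2.linspace(0, 1, dtype=np.float64, shape=(n, n), chunks=(1_000, 1_000))
    b = blosc2.linspace(0, 1, dtype=np.float64, shape=(n, n), chunks=(1_000, 1_000))
    c = blosc2.matmul(a, b)   # chunked matrix multiplication (assumed API)
    print(c.shape, c.chunks)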

doc/python-blosc2.rst

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@
 
 <p style="text-align: center; color: black; background-color: rgba(230, 169, 9, 0.65);">
 <a href="https://github.com/Blosc/python-blosc2/blob/main/RELEASE_NOTES.md"
-   style="font-size: 1.5em;">Version 3.3.0 released on 2025-04-08!</a>
+   style="font-size: 1.5em;">Version 3.3.1 released on 2025-04-20!</a>
 <span style="display: inline-block; width: 20px;"></span>
 <span style="font-family: monospace;">pip install blosc2 -U</span>
 </p>
