
Commit 8609ebe

Merge branch 'main' into nd-transpose
2 parents 30e1712 + 51cba77 · commit 8609ebe

14 files changed, +401 -33 lines


.github/workflows/build.yml

Lines changed: 9 additions & 1 deletion
@@ -1,6 +1,14 @@
 name: Tests
 
-on: [push]
+on:
+  # Trigger the workflow on push or pull request,
+  # but only for the main branch
+  push:
+    branches:
+      - '**'  # this matches all branches
+  pull_request:
+    branches:
+      - main
 
 jobs:
   build_wheels:

.github/workflows/cibuildwheels.yml

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 name: Python wheels
+
 on:
-  # Trigger the workflow on push or pull request,
-  # but only for the main branch
+  # Trigger the workflow only for tags and PRs to the main branch
   push:
     tags:
       - '*'

ANNOUNCE.rst

Lines changed: 10 additions & 6 deletions
@@ -1,12 +1,16 @@
-Announcing Python-Blosc2 3.3.0
+Announcing Python-Blosc2 3.3.1
 ==============================
 
-We are introducing a new blosc2.transpose() function for natively transposing
-2D NDArray instances, and a fast path for NDArray.slice() that delivers up to
-40x speedup when slices align with underlying chunks. Documentation has also
-been improved with several edits throughout.
+In our effort to better adapt to the array API
+(https://data-apis.org/array-api/latest/), we have introduced
+permute_dims() and matrix_transpose() functions, and the .T property.
+This replaces the previous transpose() function, which is now deprecated.
+See PR #384. Thanks to Ricardo Sales Piquer (@ricardosp4).
 
-See benchmarks at: https://github.com/Blosc/python-blosc2/blob/main/bench/ndarray/aligned_chunks.py
+We have also reduced the memory footprint of constructors like ``arange()``,
+``linspace()`` and ``fromiter()`` by a large factor. As an example, a 5 TB
+array of 8-byte floats now uses less than 200 MB of memory instead of
+170 GB previously.
 
 You can think of Python-Blosc2 3.x as an extension of NumPy/numexpr that:
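
As a quick illustration of the array-API-style calls described above, here is a
minimal sketch (editor's example, not part of the commit). It assumes that
permute_dims() accepts an axes argument as in the array API spec and that
matrix_transpose() and .T act on 2D NDArray instances; the linspace() signature
is the one used by the benchmarks added in this commit.

    import numpy as np
    import blosc2

    a = blosc2.linspace(0, 1, dtype=np.float64, shape=(2, 3, 4))
    b = blosc2.permute_dims(a, axes=(2, 0, 1))  # N-dimensional generalization of transpose
    m = blosc2.linspace(0, 1, dtype=np.float64, shape=(3, 4))
    mt = blosc2.matrix_transpose(m)             # transpose of the last two axes
    mt2 = m.T                                   # .T property, same result for 2D
    print(b.shape, mt.shape, mt2.shape)         # -> (4, 2, 3) (4, 3) (4, 3)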

RELEASE_NOTES.md

Lines changed: 23 additions & 1 deletion
@@ -1,9 +1,31 @@
 # Release notes
 
-## Changes from 3.3.0 to 3.3.1
+## Changes from 3.3.1 to 3.3.2
 
 XXX version-specific blurb XXX
 
+
+## Changes from 3.3.0 to 3.3.1
+
+* In our effort to better adapt to the array API
+  (https://data-apis.org/array-api/latest/), we have introduced
+  permute_dims() and matrix_transpose() functions, and the .T property.
+  This replaces the previous transpose() function, which is now deprecated.
+  See PR #384. Thanks to Ricardo Sales Piquer (@ricardosp4).
+
+* Constructors like `arange()`, `linspace()` and `fromiter()` now
+  use far less memory when creating large arrays. As an example, a 5 TB
+  array of 8-byte floats now uses less than 200 MB of memory instead of
+  170 GB previously. See PR #387.
+
+* Now, when opening a lazy expression with `blosc2.open()` that has a
+  missing operand, the open still works, but the dtype and shape
+  attributes are None. This is useful for lazy expressions that have
+  lost some operands, but you still want to open them for inspection.
+  See PR #385.
+
+* Added an example of getting a slice out of a C2Array.
+
 ## Changes from 3.2.1 to 3.3.0
 
 * New `blosc2.transpose()` function for transposing 2D NDArray instances
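
The reduced-memory constructors described in the notes above can be pictured
with a minimal sketch (editor's example, not part of the commit; the arange()
signature is the one used by the benchmarks added here). The footprint claim
(a 5 TB array built with less than 200 MB of RAM) is taken from the notes,
presumably because the result is now built and compressed chunk by chunk.

    import numpy as np
    import blosc2

    # ~3.2 GB of float64 logically, built without a full in-memory NumPy buffer
    shape = (20_000, 20_000)
    a = blosc2.arange(0, shape[0] * shape[1], dtype=np.float64, shape=shape)
    print(a.shape, a.dtype, f"cratio={a.schunk.cratio:.1f}x")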

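The lazy-expression behaviour from PR #385 can also be sketched (editor's
example, not part of the commit; "expr.b2nd" is a hypothetical lazy expression
saved earlier whose operand file has since been removed).

    import blosc2

    lexpr = blosc2.open("expr.b2nd")
    # With a missing operand the open still succeeds, but shape and dtype are
    # None, so the expression can at least be inspected before repairing it.
    print(lexpr.shape, lexpr.dtype)
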
bench/ndarray/compute_dists.py

Lines changed: 136 additions & 0 deletions
@@ -0,0 +1,136 @@
+#######################################################################
+# Copyright (c) 2019-present, Blosc Development Team <[email protected]>
+# All rights reserved.
+#
+# This source code is licensed under a BSD-style license (found in the
+# LICENSE file in the root directory of this source tree)
+#######################################################################
+
+# Benchmark for comparing compute speeds of Blosc2 and Numexpr.
+# One can use different distributions of data:
+# constant, arange, linspace, or random
+# The expression can be any valid Numexpr expression.
+
+import blosc2
+from time import time
+import numpy as np
+import numexpr as ne
+
+# Bench params
+N = 30_000
+step = 3000
+dtype = np.dtype(np.float64)
+persistent = False
+dist = "constant"  # "arange" or "linspace" or "constant" or "random"
+expr = "(a - b)"
+#expr = "sum(a - b)"
+#expr = "cos(a)**2 + sin(b)**2 - 1"
+#expr = "sum(cos(a)**2 + sin(b)**2 - 1)"
+
+# Set default compression params
+cparams = blosc2.CParams(clevel=1, codec=blosc2.Codec.BLOSCLZ)
+blosc2.cparams_dflts["codec"] = cparams.codec
+blosc2.cparams_dflts["clevel"] = cparams.clevel
+# Set default storage params
+storage = blosc2.Storage(contiguous=True, mode="w")
+blosc2.storage_dflts["contiguous"] = storage.contiguous
+blosc2.storage_dflts["mode"] = storage.mode
+
+urlpath = dict((aname, None) for aname in ("a", "b", "c"))
+if persistent:
+    urlpath = dict((aname, f"{aname}.b2nd") for aname in ("a", "b", "c"))
+
+btimes = []
+bspeeds = []
+ws_sizes = []
+rng = np.random.default_rng()
+for i in range(step, N + step, step):
+    shape = (i, i)
+    # shape = (i * i,)
+    if dist == "constant":
+        a = blosc2.ones(shape, dtype=dtype, urlpath=urlpath['a'])
+        b = blosc2.full(shape, 2, dtype=dtype, urlpath=urlpath['b'])
+    elif dist == "arange":
+        a = blosc2.arange(0, i * i, dtype=dtype, shape=shape, urlpath=urlpath['a'])
+        b = blosc2.arange(i * i, 2 * i * i, dtype=dtype, shape=shape, urlpath=urlpath['b'])
+    elif dist == "linspace":
+        a = blosc2.linspace(0, 1, dtype=dtype, shape=shape, urlpath=urlpath['a'])
+        b = blosc2.linspace(1, 2, dtype=dtype, shape=shape, urlpath=urlpath['b'])
+    elif dist == "random":
+        t0 = time()
+        _ = np.random.random(shape)
+        a = blosc2.fromiter(np.nditer(_), dtype=dtype, shape=shape, urlpath=urlpath['a'])
+        b = a.copy(urlpath=urlpath['b'])
+        # This uses less memory, but it is 2x-3x slower
+        # iter_ = (rng.random() for _ in range(i**2 * 2))
+        # a = blosc2.fromiter(iter_, dtype=dtype, shape=shape, urlpath=urlpath['a'])
+        # b = blosc2.fromiter(iter_, dtype=dtype, shape=shape, urlpath=urlpath['b'])
+        t = time() - t0
+        #print(f"Time to create data: {t:.5f} s - {a.schunk.nbytes/t / 1e9:.2f} GB/s")
+    else:
+        raise ValueError("Invalid distribution type")
+
+    t0 = time()
+    c = blosc2.lazyexpr(expr).compute(urlpath=urlpath['c'])
+    t = time() - t0
+    ws_sizes.append((a.schunk.nbytes + b.schunk.nbytes + c.schunk.nbytes) / 2**30)
+    speed = ws_sizes[-1] / t
+    print(f"Time to compute a - b: {t:.5f} s -- {speed:.2f} GB/s -- cratio: {c.schunk.cratio:.1f}x")
+    #print(f"result: {c[()]}")
+    btimes.append(t)
+    bspeeds.append(speed)
+
+# Evaluate using Numexpr compute engine
+ntimes = []
+nspeeds = []
+for i in range(step, N + step, step):
+    shape = (i, i)
+    # shape = (i * i,)
+    if dist == "constant":
+        a = np.ones(shape, dtype=dtype)
+        b = np.full(shape, 2, dtype=dtype)
+    elif dist == "arange":
+        a = np.arange(0, i * i, dtype=dtype).reshape(shape)
+        b = np.arange(i * i, 2 * i * i, dtype=dtype).reshape(shape)
+    elif dist == "linspace":
+        a = np.linspace(0, 1, num=i * i, dtype=dtype).reshape(shape)
+        b = np.linspace(1, 2, num=i * i, dtype=dtype).reshape(shape)
+    elif dist == "random":
+        a = np.random.random(shape)
+        b = np.random.random(shape)
+    else:
+        raise ValueError("Invalid distribution type")
+
+    t0 = time()
+    c = ne.evaluate(expr)
+    t = time() - t0
+    ws_size = (a.nbytes + b.nbytes + c.nbytes) / 2**30
+    speed = ws_size / t
+    print(f"Time to compute with Numexpr: {t:.5f} s - {speed:.2f} GB/s")
+    #print(f"result: {c}")
+    ntimes.append(t)
+    nspeeds.append(speed)
+
+# Plot
+import matplotlib.pyplot as plt
+import matplotlib.ticker as ticker
+import seaborn as sns
+
+sns.set_theme(style="whitegrid")
+plt.figure(figsize=(10, 6))
+plt.plot(ws_sizes, bspeeds, label="Blosc2", marker='o')
+plt.plot(ws_sizes, nspeeds, label="Numexpr", marker='o')
+# Set y-axis to start from 0
+plt.ylim(bottom=0)
+plt.xlabel("Working set (GB)")
+#plt.ylabel("Time (s)")
+plt.ylabel("Speed (GB/s)")
+plt.title(f"Blosc2 vs Numexpr performance -- {dist} distribution")
+plt.legend()
+#plt.gca().xaxis.set_major_locator(ticker.MaxNLocator(integer=True))
+#plt.gca().yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, _: f'{x:.2f}'))
+plt.grid()
+# Save the figure before showing it, so the saved image is not blank
+plt.savefig("blosc2_vs_numexpr.png", dpi=300, bbox_inches='tight')
+plt.show()
+plt.close()

bench/ndarray/compute_dists2.py

Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
+#######################################################################
+# Copyright (c) 2019-present, Blosc Development Team <[email protected]>
+# All rights reserved.
+#
+# This source code is licensed under a BSD-style license (found in the
+# LICENSE file in the root directory of this source tree)
+#######################################################################
+
+# Benchmark for comparing compute speeds of Blosc2 and Numexpr.
+# This version compares across different distributions of data:
+# constant, arange, linspace, or random
+# The expression can be any valid Numexpr expression.
+
+import blosc2
+from time import time
+import numpy as np
+import numexpr as ne
+import matplotlib.pyplot as plt
+import seaborn as sns
+
+# Bench params
+N = 10_000
+step = 3000
+dtype = np.dtype(np.float64)
+persistent = False
+distributions = ["constant", "arange", "linspace", "random"]
+expr = "(a - b)"
+#expr = "sum(a - b)"
+#expr = "cos(a)**2 + sin(b)**2 - 1"
+#expr = "sum(cos(a)**2 + sin(b)**2 - 1)"
+
+# Set default compression params
+cparams = blosc2.CParams(clevel=1, codec=blosc2.Codec.BLOSCLZ)
+blosc2.cparams_dflts["codec"] = cparams.codec
+blosc2.cparams_dflts["clevel"] = cparams.clevel
+# Set default storage params
+storage = blosc2.Storage(contiguous=True, mode="w")
+blosc2.storage_dflts["contiguous"] = storage.contiguous
+blosc2.storage_dflts["mode"] = storage.mode
+
+# Create dictionaries to store results for each distribution
+blosc2_speeds = {dist: [] for dist in distributions}
+numexpr_speeds = {dist: [] for dist in distributions}
+ws_sizes = []
+
+# Generate working set sizes once
+sizes = list(range(step, N + step, step))
+for i in sizes:
+    ws_sizes.append((i * i * 3 * np.dtype(dtype).itemsize) / 2**30)  # Approximate size in GB
+
+# Loop through different distributions for benchmarking
+for dist in distributions:
+    print(f"\nBenchmarking {dist} distribution...")
+
+    # Evaluate using Blosc2
+    for i in sizes:
+        shape = (i, i)
+        urlpath = {name: None for name in ("a", "b", "c")}
+
+        if dist == "constant":
+            a = blosc2.ones(shape, dtype=dtype, urlpath=urlpath['a'])
+            b = blosc2.full(shape, 2, dtype=dtype, urlpath=urlpath['b'])
+        elif dist == "arange":
+            a = blosc2.arange(0, i * i, dtype=dtype, shape=shape, urlpath=urlpath['a'])
+            b = blosc2.arange(i * i, 2 * i * i, dtype=dtype, shape=shape, urlpath=urlpath['b'])
+        elif dist == "linspace":
+            a = blosc2.linspace(0, 1, dtype=dtype, shape=shape, urlpath=urlpath['a'])
+            b = blosc2.linspace(1, 2, dtype=dtype, shape=shape, urlpath=urlpath['b'])
+        elif dist == "random":
+            _ = np.random.random(shape)
+            a = blosc2.fromiter(np.nditer(_), dtype=dtype, shape=shape, urlpath=urlpath['a'])
+            # b = a.copy(urlpath=urlpath['b'])  # faster, but output is not random
+            _ = np.random.random(shape)
+            b = blosc2.fromiter(np.nditer(_), dtype=dtype, shape=shape, urlpath=urlpath['b'])
+
+        t0 = time()
+        c = blosc2.lazyexpr(expr).compute(urlpath=urlpath['c'])
+        t = time() - t0
+        speed = (a.schunk.nbytes + b.schunk.nbytes + c.schunk.nbytes) / 2**30 / t
+        print(f"Blosc2 - {dist} - Size {i}x{i}: {speed:.2f} GB/s - cratio: {c.schunk.cratio:.1f}x")
+        blosc2_speeds[dist].append(speed)
+
+    # Evaluate using Numexpr
+    for i in sizes:
+        shape = (i, i)
+
+        if dist == "constant":
+            a = np.ones(shape, dtype=dtype)
+            b = np.full(shape, 2, dtype=dtype)
+        elif dist == "arange":
+            a = np.arange(0, i * i, dtype=dtype).reshape(shape)
+            b = np.arange(i * i, 2 * i * i, dtype=dtype).reshape(shape)
+        elif dist == "linspace":
+            a = np.linspace(0, 1, num=i * i, dtype=dtype).reshape(shape)
+            b = np.linspace(1, 2, num=i * i, dtype=dtype).reshape(shape)
+        elif dist == "random":
+            a = np.random.random(shape)
+            b = np.random.random(shape)
+
+        t0 = time()
+        c = ne.evaluate(expr)
+        t = time() - t0
+        speed = (a.nbytes + b.nbytes + c.nbytes) / 2**30 / t
+        print(f"Numexpr - {dist} - Size {i}x{i}: {speed:.2f} GB/s")
+        numexpr_speeds[dist].append(speed)
+
+# Create a figure with four subplots (2x2 grid)
+sns.set_theme(style="whitegrid")
+fig, axes = plt.subplots(2, 2, figsize=(14, 10), sharex=True)
+
+# Flatten axes for easier iteration
+axes = axes.flatten()
+
+# Plot each distribution in its own subplot
+for i, dist in enumerate(distributions):
+    axes[i].plot(ws_sizes, blosc2_speeds[dist], marker='o', linestyle='-', label="Blosc2")
+    axes[i].plot(ws_sizes, numexpr_speeds[dist], marker='s', linestyle='--', label="Numexpr")
+    axes[i].set_title(f"{dist.capitalize()} Distribution")
+    axes[i].set_ylabel("Speed (GB/s)")
+    axes[i].grid(True)
+    axes[i].legend()
+    if i >= 2:  # Add x-label only to bottom subplots
+        axes[i].set_xlabel("Working set size (GB)")
+
+# Add a shared title
+fig.suptitle(f"Blosc2 vs Numexpr Performance Across Different Data Distributions ({expr=})", fontsize=16)
+plt.tight_layout(rect=[0, 0, 1, 0.96])  # Adjust the rect parameter to make room for the suptitle
+
+# Save the unified plot with subplots
+plt.savefig("blosc2_vs_numexpr_subplots.png", dpi=300, bbox_inches='tight')
+plt.show()

bench/ndarray/matmul.ipynb

Lines changed: 4 additions & 4 deletions
@@ -189,7 +189,7 @@
 "source": [
 "**Key observations:**\n",
 "- Automatic chunking can optimize performance for smaller matrix sizes.\n",
-"- Choosing square chunks of 1000x1000 can achive the best performance for matrices of sizes greater than 2000x2000.\n",
+"- Choosing square chunks of 1000x1000 can achieve the best performance for matrices of sizes greater than 2000x2000.\n",
 "\n",
 "**Next experiment:**\n",
 "We will increment the chunks' size, as we have seen that better performance can be achieved with bigger chunks."
@@ -294,7 +294,7 @@
 "**Key observations:**\n",
 "- The best performance is achieved for the biggest chunk size.\n",
 "- The larger the chunk size, the higher the bandwidth.\n",
-"- If the chunk size is choosen automatically, the performance is better than choosing any other chunk size. This is weird, because if choosen automatically, chunks of size 1000x1000 are choosen, which is the same size as the fixed chunks.\n",
+"- If the chunk size is chosen automatically, the performance is better than choosing any other chunk size. This is weird, because if chosen automatically, chunks of size 1000x1000 are chosen, which is the same size as the fixed chunks.\n",
 "\n",
 "**Next experiment:**\n",
 "We will increment the chunks' size again, as we have seen that better performance can be achieved with bigger chunks."
@@ -304,7 +304,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Presicion simple"
+"Precision simple"
 ]
 },
 {
@@ -517,7 +517,7 @@
 "\n",
 "**Next experiment:**\n",
 "We are going to try with the same sizes for matrices and a square chunk size of 6000 to see if it improves the performance for that last matrix size.\n",
-"We will also remove chunk sizes of 1000 and 2000, and add a chunk size wich will be the same size as the matrix."
+"We will also remove chunk sizes of 1000 and 2000, and add a chunk size which will be the same size as the matrix."
 ]
 },
 {
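
The notebook above studies how the chunk shape affects matmul performance. As a
minimal sketch of that kind of experiment (editor's example, not from the
commit; it assumes blosc2.matmul() and the chunks= keyword of the constructors
are available, as elsewhere in the 3.x API):

    import numpy as np
    import blosc2

    n = 4_000
    # Fix the chunk shape by hand to compare against automatic chunking
    a = blosc2.linspace(0, 1, dtype=np.float64, shape=(n, n), chunks=(1_000, 1_000))
    b = blosc2.linspace(0, 1, dtype=np.float64, shape=(n, n), chunks=(1_000, 1_000))
    c = blosc2.matmul(a, b)   # chunked matrix multiplication (assumed API)
    print(c.shape, c.chunks)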

doc/python-blosc2.rst

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@
 
 <p style="text-align: center; color: black; background-color: rgba(230, 169, 9, 0.65);">
 <a href="https://github.com/Blosc/python-blosc2/blob/main/RELEASE_NOTES.md"
-   style="font-size: 1.5em;">Version 3.3.0 released on 2025-04-08!</a>
+   style="font-size: 1.5em;">Version 3.3.1 released on 2025-04-20!</a>
 <span style="display: inline-block; width: 20px;"></span>
 <span style="font-family: monospace;">pip install blosc2 -U</span>
 </p>
