
Commit 3e4ea86

Author: Luke Shaw (committed)
Merge branch 'main' of github.com:Blosc/python-blosc2 into fancyIndex
2 parents 6c2371b + 1a176c5 commit 3e4ea86

25 files changed, 1348 insertions(+), 454 deletions(-)

.github/workflows/build.yml

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ jobs:
 python-version: ["3.12"]

 steps:
-- uses: actions/checkout@v4
+- uses: actions/checkout@v5

 - name: Set up Python ${{ matrix.python-version }}
 uses: actions/setup-python@v5

.github/workflows/cibuildwheels.yml

Lines changed: 2 additions & 2 deletions
@@ -60,7 +60,7 @@ jobs:
 artifact_name: "macos-universal2"
 steps:
 - name: Checkout repo
-uses: actions/checkout@v4
+uses: actions/checkout@v5

 - name: Set up Python
 uses: actions/setup-python@v5
@@ -124,7 +124,7 @@ jobs:
 # Only upload wheels when tagging (typically a release)
 if: startsWith(github.event.ref, 'refs/tags')
 steps:
-- uses: actions/download-artifact@v4
+- uses: actions/download-artifact@v5
 with:
 path: ./wheelhouse
 merge-multiple: true # Merge all the wheels artifacts into one directory

.github/workflows/wasm.yml

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ jobs:

 steps:
 - name: Checkout repo
-uses: actions/checkout@v4
+uses: actions/checkout@v5

 - name: Set up Python
 uses: actions/setup-python@v5

ANNOUNCE.rst

Lines changed: 8 additions & 12 deletions
@@ -1,19 +1,15 @@
-Announcing Python-Blosc2 3.7.0
+Announcing Python-Blosc2 3.7.2
 ==============================

-In this release:
+This is a maintenance release where:

-✅ Overhaul of documentation (API reference and Tutorials)
-✅ Improvements to lazy expression indexing and in particular much more efficient
-memory usage when applying non-unit steps
-✅ Extended functionality of ``expand_dims`` to match that of NumPy
-✅ 3(!) new data storage classes (``EmbedStore``, ``DictStore`` and ``TreeStore``)
-which allow for the efficient storage of heterogeneous array data
+✅ We have updated the Blosc2 C library to 2.21.1, which fixes a regression
+in the build system detected in Fedora and Gentoo.
+✅ We reverted the signature of ``TreeStore.__init__()`` so that benchmarks
+get back to normal performance.

-See [here](https://github.com/Blosc/python-blosc2/pull/451#issuecomment-3178828765)
-for plots for the new data storage classes. And
-[here](https://github.com/Blosc/python-blosc2/pull/446#issuecomment-3167060686) for the improved performance
-of lazy expression slicing.
+Check out our new blog post about ``TreeStore`` usage and performance at:
+https://www.blosc.org/posts/new-treestore-blosc2

 You can think of Python-Blosc2 3.x as an extension of NumPy/numexpr that:

CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@ else()
 include(FetchContent)
 FetchContent_Declare(blosc2
 GIT_REPOSITORY https://github.com/Blosc/c-blosc2
-GIT_TAG d75993535461aaf2ded996f0a625cbec8df9655c # v2.20.0
+GIT_TAG 96bce728dcdbf41fd86f142ebef9e513f87a7afb # v2.21.1
 )
 FetchContent_MakeAvailable(blosc2)
 include_directories("${blosc2_SOURCE_DIR}/include")

RELEASE_NOTES.md

Lines changed: 21 additions & 1 deletion
@@ -1,8 +1,28 @@
 # Release notes
-## Changes from 3.7.0 to 3.7.1
+
+## Changes from 3.7.2 to 3.7.3

 XXX version-specific blurb XXX

+## Changes from 3.7.1 to 3.7.2
+
+* C-Blosc2 internal library updated to the latest 2.21.1.
+
+* Revert the signature of `TreeStore.__init__` so that benchmarks get back
+  to normal performance.
+
+## Changes from 3.7.0 to 3.7.1
+
+* Added `C2Array.slice()` method and `C2Array.nbytes`, `C2Array.cbytes`, `C2Array.cratio`, `C2Array.vlmeta` and `C2Array.info` properties (PR #455).
+
+* Many usability improvements to the `TreeStore` class and friends.
+
+* New section about `TreeStore` in the basics NDArray tutorial.
+
+* New blog post about `TreeStore` usage and performance at: https://www.blosc.org/posts/new-treestore-blosc2
+
+* C-Blosc2 internal library updated to the latest 2.21.0.
+
 ## Changes from 3.6.1 to 3.7.0

 * Overhaul of documentation (API reference and Tutorials)

ROADMAP-TO-4.0.md

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+List of desired features for a 4.0 release
+------------------------------------------
+
+* First and foremost, we would like to have at least a basic implementation of the [array API](https://data-apis.org/array-api). This will require a lot of low-level work on the basic NDArray container to make indexing work as closely as possible to the standard.
+
+* Have a completely specified format for the `TreeStore` and `DictStore`. The format should allow containers to live either in memory or on disk, and should allow either sparse or contiguous storage. Users will be able to specify these properties by following the same conventions as for NDArray objects (i.e. the `urlpath` and `contiguous` params).
+
+* New `.save()` and `.to_cframe()` methods should be implemented to convert from in-memory to on-disk representations and vice versa.
+* The format for `TreeStore` and `DictStore` will initially be defined at the Python level, and documented only in the Python-Blosc2 repository. An implementation in the C library is desirable, but not mandatory at this time.
+
+* A new `Table` object should be implemented based on the `TreeStore` class (a subclass?), with a label ('table'?) in metalayers indicating that the contents of the tree can be interpreted as a regular table. As `TreeStore` is hierarchical, a subtree can also be interpreted as a `Table` if there is a label in the metalayer of the subtree (or group, in HDF5 parlance); that can lead to tables that have different subtables embedded. It is not clear yet whether we should impose the same number of rows for all the columns.
+
+  The constructor for the `Table` object should take some parameters to specify properties:
+
+  * `columnar`: True or False. If True, every column will be stored in a different NDArray object. If False, the columns will be stored in the same NDArray object, with a compound dtype. In principle, one should be able to create tables that are a hybrid of column-wise and row-wise storage, but at this point it is not clear what the best way to do that is.
+
+  `Table` should support at least these methods:
+
+  * `.__getitem__()` and `.__setitem__()` for getting and setting values.
+  * `.append()` for appending one or more rows of data to all columns in one go.
+  * `.__iter__()` for easy and fast iteration over rows.
+  * `.where()`: an iterator for querying with conditions that are evaluated with the internal compute engine.
+  * `.index()` for indexing a column and getting better performance in queries (desirable, but optional for 4.0).
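To make the `columnar` option and the method list above more concrete, here is a toy, in-memory sketch using NumPy. Everything in it is hypothetical: the `Table` name, its constructor arguments and method behavior are assumptions for illustration, not the planned Blosc2 4.0 API. NumPy boolean masks stand in for Blosc2's internal compute engine in `.where()`, and `.index()` is left out. With `columnar=True` each column lives in its own array; with `columnar=False` all columns share one array with a compound (structured) dtype.

```python
import numpy as np


class Table:
    """Toy in-memory table (hypothetical; not the Blosc2 4.0 API)."""

    def __init__(self, dtype, columnar=True):
        self.dtype = np.dtype(dtype)
        self.columnar = columnar
        if columnar:
            # One separate array per column
            self._cols = {name: np.empty(0, dtype=self.dtype[name])
                          for name in self.dtype.names}
        else:
            # A single array whose compound dtype holds every column
            self._rows = np.empty(0, dtype=self.dtype)

    def __len__(self):
        if self.columnar:
            return len(self._cols[self.dtype.names[0]])
        return len(self._rows)

    def _as_rows(self):
        """Return all rows as one structured array (a copy when columnar)."""
        if not self.columnar:
            return self._rows
        out = np.empty(len(self), dtype=self.dtype)
        for name in self.dtype.names:
            out[name] = self._cols[name]
        return out

    def append(self, rows):
        """Append one row (a tuple) or several rows (a list of tuples) to all columns."""
        new = np.atleast_1d(np.array(rows, dtype=self.dtype))
        if self.columnar:
            for name in self.dtype.names:
                self._cols[name] = np.concatenate([self._cols[name], new[name]])
        else:
            self._rows = np.concatenate([self._rows, new])

    def __getitem__(self, key):
        """A column name returns that column; an int or slice returns rows."""
        if isinstance(key, str):
            return self._cols[key] if self.columnar else self._rows[key]
        return self._as_rows()[key]

    def __setitem__(self, key, value):
        """Set a whole column (by name) or one or more rows (by int or slice)."""
        if isinstance(key, str):
            if self.columnar:
                self._cols[key][:] = value
            else:
                self._rows[key] = value
            return
        rows = np.array(value, dtype=self.dtype)
        if self.columnar:
            for name in self.dtype.names:
                self._cols[name][key] = rows[name]
        else:
            self._rows[key] = rows

    def __iter__(self):
        """Iterate over rows as NumPy records."""
        return iter(self._as_rows())

    def where(self, condition):
        """Yield rows where condition(self) is True (NumPy stands in for the compute engine)."""
        mask = condition(self)
        yield from self._as_rows()[mask]


t = Table([("id", "i4"), ("temp", "f8")], columnar=True)
t.append([(1, 20.5), (2, 25.0), (3, 18.0)])
t.append((4, 30.0))
print([row.item() for row in t.where(lambda tbl: tbl["temp"] > 22)])
# [(2, 25.0), (4, 30.0)]
```

The same calls work unchanged with `columnar=False`; only the internal layout differs, which is the property the roadmap's `columnar` flag is meant to expose.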

bench/b2zip-linspace.py

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
+#######################################################################
+# Copyright (c) 2019-present, Blosc Development Team <[email protected]>
+# All rights reserved.
+#
+# This source code is licensed under a BSD-style license (found in the
+# LICENSE file in the root directory of this source tree)
+#######################################################################
+
+# This compares the performance of creating and reading a Blosc2 array in different ways:
+# 1) memory
+# 2) disk
+# 3) disk with b2zip format
+
+import blosc2
+
+from time import time
+
+# Number of elements in array
+N = 2**27
+
+def b2_native(urlpath=None):
+    t0 = time()
+    a = blosc2.linspace(0., 1., N, urlpath=urlpath, mode="w")
+    # a = blosc2.linspace(0., 1., 2**27, cparams=blosc2.CParams(codec=blosc2.Codec.LZ4))
+    # a = blosc2.linspace(0., 1., 2**27, dparams=blosc2.DParams(nthreads=1))
+    t1 = time()
+    print(f"Time to create a linspace array: {t1 - t0:.2f}s, bandwidth: {a.nbytes / (t1 - t0) / 1e9:.2f} GB/s")
+    # print(a.info)
+
+    t0 = time()
+    b = a[:]
+    t1 = time()
+    print(f"Time to read the array: {t1 - t0:.2f}s, bandwidth: {b.nbytes / (t1 - t0) / 1e9:.2f} GB/s")
+
+def b2_b2zip(urlpath):
+    t0 = time()
+    with blosc2.TreeStore(localpath=urlpath, mode="w") as tstore:
+        a = blosc2.linspace(0., 1., N)
+        # a = blosc2.linspace(0., 1., 2**27, cparams=blosc2.CParams(codec=blosc2.Codec.LZ4))
+        tstore["/b"] = a
+    t1 = time()
+    print(f"Time to store a linspace array: {t1 - t0:.2f}s, bandwidth: {a.nbytes / (t1 - t0) / 1e9:.2f} GB/s")
+
+    t0 = time()
+    with blosc2.TreeStore(localpath=urlpath, mode="r") as tstore_read:
+        b = tstore_read["/b"][:]
+    t1 = time()
+    print(f"Time to read the array: {t1 - t0:.2f}s, bandwidth: {b.nbytes / (t1 - t0) / 1e9:.2f} GB/s")
+
+
+if __name__ == "__main__":
+    print("Blosc2 in-memory")
+    b2_native()
+    print("Blosc2 on disk")
+    b2_native("linspace.b2nd")
+    print("Blosc2 on disk with b2zip format")
+    b2_b2zip("my_tstore.b2z")
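The benchmark repeats the same `t0 = time()` / `t1 = time()` / bandwidth pattern for every measurement. A hypothetical helper like the one below (not part of this commit) could factor that out. It uses `time.perf_counter()`, which the Python documentation recommends for measuring short durations because it is monotonic, whereas `time.time()` follows the system wall clock. It also spells out the data size, assuming `blosc2.linspace` produces its default float64 dtype: N = 2**27 elements then occupy 2**30 bytes, about 1.07 GB, so the printed GB/s figures are roughly 1.07 divided by the elapsed seconds. The names `Timer`, `bandwidth_gbs` and `ITEMSIZE` are illustrative assumptions.

```python
import time

N = 2**27                 # number of elements, as in the benchmark
ITEMSIZE = 8              # assumption: float64 elements (8 bytes each)
NBYTES = N * ITEMSIZE     # 2**30 = 1_073_741_824 bytes, about 1.07 GB


class Timer:
    """Hypothetical helper: time a `with` block using time.perf_counter()."""

    def __enter__(self):
        self.t0 = time.perf_counter()
        return self

    def __exit__(self, *exc_info):
        self.elapsed = time.perf_counter() - self.t0
        return False  # do not swallow exceptions raised inside the block


def bandwidth_gbs(nbytes, seconds):
    """Bandwidth in GB/s, with 1 GB = 1e9 bytes as in the benchmark's prints."""
    return nbytes / seconds / 1e9


if __name__ == "__main__":
    with Timer() as t:
        sum(range(10**6))  # stand-in workload; the benchmark would call blosc2 here
    print(f"Elapsed: {t.elapsed:.4f}s")
    # Reading about 1.07 GB in 0.5 s corresponds to about 2.15 GB/s
    print(f"{bandwidth_gbs(NBYTES, 0.5):.2f} GB/s")
```

Inside `b2_native`, for example, the read step could become `with Timer() as t: b = a[:]` followed by `bandwidth_gbs(b.nbytes, t.elapsed)`.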
