@@ -31,78 +31,30 @@ What it is
both the C-Blosc1 API and its in-memory format. Python-Blosc2 is a Python package
that wraps C-Blosc2, the newest version of the Blosc compressor.

- Currently Python-Blosc2 already reproduces the API of
- `Python-Blosc <https://github.com/Blosc/python-blosc>`_, so it can be
- used as a drop-in replacement. However, there are a `few exceptions
- for a full compatibility
- <https://github.com/Blosc/python-blosc2/blob/main/RELEASE_NOTES.md#changes-from-python-blosc-to-python-blosc2>`_.
+ Starting with version 3.0.0, Python-Blosc2 includes a powerful computing engine
+ that can operate on compressed data stored in memory, on disk, or on the network.
+ This engine also supports advanced features like reductions, filters, user-defined
+ functions and broadcasting (still in beta). You can read our tutorials on how to use
+ this new feature at:
+ https://github.com/Blosc/python-blosc2/blob/main/doc/getting_started/tutorials/03.lazyarray-expressions.ipynb and
+ https://github.com/Blosc/python-blosc2/blob/main/doc/getting_started/tutorials/03.lazyarray-udf.ipynb

In addition, Python-Blosc2 aims to leverage the full C-Blosc2 functionality to support
super-chunks (`SChunk <https://www.blosc.org/python-blosc2/reference/schunk_api.html>`_),
multi-dimensional arrays
(`NDArray <https://www.blosc.org/python-blosc2/reference/ndarray_api.html>`_),
metadata, serialization and other bells and whistles introduced in C-Blosc2.

- **Note:** Python-Blosc2 is meant to be backward compatible with Python-Blosc data.
- That means that it can read data generated with Python-Blosc, but the opposite
+ **Note:** Blosc2 is meant to be backward compatible with Blosc(1) data.
+ That means that it can read data generated with Blosc, but the opposite
is not true (i.e. there is no *forward* compatibility).

- SChunk: a 64-bit compressed store
- =================================
- 
- A `SChunk <https://www.blosc.org/python-blosc2/reference/schunk_api.html>`_ is a simple data
- container that handles setting, expanding and getting
- data and metadata. Contrarily to chunks, a super-chunk can update and resize the data
- that it contains, supports user metadata, and it does not have the 2 GB storage limitation.
- 
- Additionally, you can convert a SChunk into a contiguous, serialized buffer (aka
- `cframe <https://github.com/Blosc/c-blosc2/blob/main/README_CFRAME_FORMAT.rst>`_)
- and vice-versa; as a bonus, the serialization/deserialization process also works with NumPy
- arrays and PyTorch/TensorFlow tensors at a blazing speed:
- 
- .. |compress| image:: https://github.com/Blosc/python-blosc2/blob/main/images/linspace-compress.png?raw=true
-    :width: 100%
-    :alt: Compression speed for different codecs
- 
- .. |decompress| image:: https://github.com/Blosc/python-blosc2/blob/main/images/linspace-decompress.png?raw=true
-    :width: 100%
-    :alt: Decompression speed for different codecs
- 
- +----------------+---------------+
- | |compress|     | |decompress|  |
- +----------------+---------------+
- 
- while reaching excellent compression ratios:
- 
- .. image:: https://github.com/Blosc/python-blosc2/blob/main/images/pack-array-cratios.png?raw=true
-    :width: 75%
-    :align: center
-    :alt: Compression ratio for different codecs
- 
- Also, if you are a Mac M1/M2 owner, do yourself a favor and use its native arm64 arch (yes, we are
- distributing Mac arm64 wheels too; you are welcome ;-):
- 
- .. |pack_arm| image:: https://github.com/Blosc/python-blosc2/blob/main/images/M1-i386-vs-arm64-pack.png?raw=true
-    :width: 100%
-    :alt: Compression speed for different codecs on Apple M1
- 
- .. |unpack_arm| image:: https://github.com/Blosc/python-blosc2/blob/main/images/M1-i386-vs-arm64-unpack.png?raw=true
-    :width: 100%
-    :alt: Decompression speed for different codecs on Apple M1
- 
- +------------+--------------+
- | |pack_arm| | |unpack_arm| |
- +------------+--------------+
- 
- Read more about `SChunk` features in our blog entry at: https://www.blosc.org/posts/python-blosc2-improvements
- 
NDArray: an N-Dimensional store
===============================

- One of the latest and more exciting additions in Python-Blosc2 is the
+ One of the more useful abstractions in Python-Blosc2 is the
`NDArray <https://www.blosc.org/python-blosc2/reference/ndarray_api.html>`_ object.
It can write and read n-dimensional datasets in an extremely efficient way thanks
- to a n-dim 2-level partitioning, allowing to slice and dice arbitrary large and
+ to an n-dimensional 2-level partitioning, allowing one to slice and dice arbitrarily large and
compressed data in a more fine-grained way:

.. image:: https://github.com/Blosc/python-blosc2/blob/main/images/b2nd-2level-parts.png?raw=true
@@ -124,6 +76,68 @@ is useful <https://www.youtube.com/watch?v=LvP9zxMGBng>`_:
   :alt: Slicing a dataset in pineapple-style
   :target: https://www.youtube.com/watch?v=LvP9zxMGBng

+ Operating with NDArrays
+ =======================
+ 
+ `NDArray` objects can be operated on very easily inside Python-Blosc2.
+ Here is a simple example:
+ 
+ .. code-block:: python
+ 
+     import numpy as np
+     import blosc2
+ 
+     N = 10_000
+     na = np.linspace(0, 1, N * N, dtype=np.float32).reshape(N, N)
+     nb = np.linspace(1, 2, N * N).reshape(N, N)
+     nc = np.linspace(-10, 10, N * N).reshape(N, N)
+ 
+     # Convert to blosc2
+     a = blosc2.asarray(na)
+     b = blosc2.asarray(nb)
+     c = blosc2.asarray(nc)
+ 
+     # Expression
+     expr = ((a ** 3 + blosc2.sin(c * 2)) < b) & (c > 0)
+ 
+     # Evaluate and get an NDArray as result
+     out = expr.eval()
+     print(out.info)
+ As you can see, `NDArray` instances are very similar to NumPy arrays, but behind the scenes
+ they hold compressed data that can be operated on very efficiently with the new computing
+ engine included in Python-Blosc2.
+ 
+ To whet your appetite, here is the performance (on a MacBook Air M2 with 24 GB of RAM)
+ that you can reach when the operands fit comfortably in memory:
+ 
+ .. image:: https://github.com/Blosc/python-blosc2/blob/main/images/eval-expr-full-mem-M2.png?raw=true
+    :width: 100%
+    :alt: Performance when operands fit in-memory
+ 
+ In this case, performance is a bit far from top-level libraries like Numexpr or Numba, but
+ it is still pretty nice (and CPUs with more cores than the M2 would probably close the
+ performance gap even further).
+ 
+ It is important to note that the `NDArray` object can use memory-mapped files as well, and the
+ benchmark above actually uses a memory-mapped file as the storage for the operands.
+ Memory-mapped files are very useful when the operands do not fit in memory, while still
+ delivering very good performance. Thanks to Jan Sellner for his implementation in Blosc2.
+ 
+ And here is the performance when the operands do not fit well in memory:
+ 
+ .. image:: https://github.com/Blosc/python-blosc2/blob/main/images/eval-expr-scarce-mem-M2.png?raw=true
+    :width: 100%
+    :alt: Performance when operands do not fit in-memory
+ 
+ In the latter case, the memory consumption lines look a bit wild, but this is because what
+ is displayed is the real memory consumption, not the virtual one (so, during the evaluation,
+ the OS has to swap some memory out to disk). Here, the performance compared with top-level
+ libraries like Numexpr or Numba is very competitive.
+ 
+ You can find the benchmark for the above examples at:
+ https://github.com/Blosc/python-blosc2/blob/main/bench/ndarray/lazyarray-expr.ipynb
+ 
Installing
==========


https://groups.google.es/group/blosc

- Twitter
- =======
+ Mastodon
+ ========

- Please follow `@Blosc2 <https://twitter.com/Blosc2>`_ to get informed about the latest developments.
+ Please follow `@Blosc2 <https://fosstodon.org/@Blosc2>`_ to get informed about the latest
+ developments. We have recently moved from Twitter to Mastodon.

Citing Blosc
============
@@ -213,11 +228,11 @@ You can cite our work on the different libraries under the Blosc umbrella as:
  @ONLINE{blosc,
    author = {{Blosc Development Team}},
    title = "{A fast, compressed and persistent data store library}",
-   year = {2009-2023},
+   year = {2009-2024},
    note = {https://blosc.org}
  }


----

- **Enjoy!**
+ **Make compression better!**