Skip to content

Commit ddcd4c2

Browse files
committed
New section on performance for on-disk reductions
1 parent 832df8d commit ddcd4c2

File tree

3 files changed

+70
-0
lines changed

3 files changed

+70
-0
lines changed

doc/getting_started/overview.rst

Lines changed: 70 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -223,3 +223,73 @@ You can find the notebooks for these benchmarks at:
223223
https://github.com/Blosc/python-blosc2/blob/main/bench/ndarray/lazyarray-dask-small.ipynb
224224

225225
https://github.com/Blosc/python-blosc2/blob/main/bench/ndarray/lazyarray-dask-large.ipynb
226+
227+
Reductions and disk-based computations
228+
======================================
229+
230+
One of the most interesting features of the new computing engine in Python-Blosc2 is
231+
the ability to perform reductions on compressed data that can optionally be stored
232+
on disk. This allows to perform calculations on data that is too large to fit in
233+
memory.
234+
235+
Here is a simple example:
236+
237+
.. code-block:: python
238+
239+
import blosc2
240+
241+
N = 20_000 # for small scenario
242+
# N = 100_000 # for large scenario
243+
a = blosc2.linspace(0, 1, N * N, shape=(N, N), urlpath="a.b2nd", mode="w")
244+
b = blosc2.linspace(1, 2, N * N, shape=(N, N), urlpath="b.b2nd", mode="w")
245+
c = blosc2.linspace(-10, 10, N * N, shape=(N, N)) # small and in-memory
246+
# Expression
247+
expr = np.sum(((a**3 + np.sin(a * 2)) < c) & (b > 0), axis=1)
248+
249+
# Evaluate and get a NDArray as result
250+
out = expr.compute()
251+
print(out.info)
252+
253+
In this case, we are performing a reduction that computes the sum of the boolean
254+
array that results from a expression. The result is a 1D array that will be
255+
stored in memory, but that can be optionally stored on disk (via the ``out=``
256+
parameter in the ``compute()`` or ``sum()`` methods).
257+
258+
Here you can see a plot on how this performs for a series of operands that
259+
vary in size (run on a modern desktop machine, using Linux and a 8-core
260+
AMD CPU, with a large 96 MB L3 cache, and 64 GB of RAM):
261+
262+
.. image:: https://github.com/Blosc/python-blosc2/blob/main/images/reduc-float64-amd.png?raw=true
263+
:width: 100%
264+
:alt: Performance vs operand sizes for reductions
265+
266+
As you can see, we are expressing the performance in GB/s, which is a very
267+
useful metric when dealing with large datasets. The performance is quite
268+
good and, when compression is used, it is kept constant for all operand sizes,
269+
which is a sign that Blosc2 is using the CPU caches (and memory) efficiently.
270+
271+
On the other hand, when compression is not used the performance degrades as
272+
the operand size increases, which is a sign that the CPU caches are not being
273+
used efficiently. This is a because data needs more time to be fetched from
274+
(disk) storage, and the CPU is not able to keep up with the data flow.
275+
276+
Finally, here is a plot for a much larger set of datasets (up to 400,000 x 400,000),
277+
where the operands do not fit in memory even when compressed:
278+
279+
.. image:: https://github.com/Blosc/python-blosc2/blob/main/images/reduc-float64-log-amd.png?raw=true
280+
:width: 100%
281+
:alt: Performance vs large operand sizes for reductions
282+
283+
In this case, we see that for operand sizes exceeding 2 TB, the performance
284+
degrades significantly as well, but it is still quite good, specially when using
285+
disk-based operands. This demonstrates that Blosc2 is able to load data from disk
286+
more efficiently than the swap subsystem of the operating system.
287+
288+
You can find the script for these benchmarks at:
289+
290+
https://github.com/Blosc/python-blosc2/blob/main/bench/ndarray/jit-reduc-sizes.py
291+
292+
All in all, thanks to compression and a fine-tuned partitioning for leveraging modern
293+
CPU caches and efficient I/O that overlaps computation, Blosc2 allows to perform
294+
calculations on data that is too large to fit in memory, and that can be stored in
295+
memory, on disk or `on the network <https://github.com/ironArray/Caterva2>`_.

images/reduc-float64-amd.png

137 KB
Loading

images/reduc-float64-log-amd.png

156 KB
Loading

0 commit comments

Comments
 (0)