Commit a18f81b

Small fixes in docs
1 parent fe26206 commit a18f81b

1 file changed: +16 -12 lines changed

doc/getting_started/overview.rst

Lines changed: 16 additions & 12 deletions
@@ -207,7 +207,7 @@ Here, the performance compared to Dask is pretty competitive. Note that, when th
 is compressed (lower plot), the memory consumption is much lower than Dask, and kept constant
 during the computation, which is testimonial of the smart use of CPU caches and memory by the
 Blosc2 engine --for example, the CPU used in the experiment has 128 MB of L3, which is very
-close to the amount of memory used by Blosc2. This is a very important point, because
+close to the amount of memory used by Blosc2. This is an important point, because
 fitting the working set in memory is not enough; you also need to
 `use caches and memory efficiently <https://purplesyringa.moe/blog/the-ram-myth>`_
 to get the best performance.
@@ -268,28 +268,32 @@ useful metric when dealing with large datasets. The performance is quite
 good and, when compression is used, it is kept constant for all operand sizes,
 which is a sign that Blosc2 is using the CPU caches (and memory) efficiently.
 
-On the other hand, when compression is not used the performance degrades as
+On the other hand, when compression is not used, the performance degrades as
 the operand size increases, which is a sign that the CPU caches are not being
 used efficiently. This is a because data needs more time to be fetched from
-(disk) storage, and the CPU is not able to keep up with the data flow.
+(slower disk) storage, and the CPU is not able to keep up with the data flow.
 
-Finally, here is a plot for a much larger set of datasets (up to 400,000 x 400,000),
-where the operands do not fit in memory even when compressed:
+Finally, here is a plot for a much larger set of datasets (up to
+400,000 x 400,000, or 2.3 TB), where the operands do not fit in memory, even
+when compressed:
 
 .. image:: https://github.com/Blosc/python-blosc2/blob/main/images/reduc-float64-log-amd.png?raw=true
    :width: 100%
    :alt: Performance vs large operand sizes for reductions
 
-In this case, we see that for operand sizes exceeding 2 TB, the performance
+In this case, we see that for operand sizes exceeding ~1 TB, the performance
 degrades significantly as well, but it is still quite good, specially when using
-disk-based operands. This demonstrates that Blosc2 is able to load data from disk
-more efficiently than the swap subsystem of the operating system.
+disk-based operands. This demonstrates how Blosc2 is able to load data from disk
+more efficiently than the swap subsystem of the operating system; it can do so
+because it is able to grab data from disk while it is computing, so it can
+overlap I/O with computation.
 
 You can find the script for these benchmarks at:
 
 https://github.com/Blosc/python-blosc2/blob/main/bench/ndarray/jit-reduc-sizes.py
 
-All in all, thanks to compression and a fine-tuned partitioning for leveraging modern
-CPU caches and efficient I/O that overlaps computation, Blosc2 allows to perform
-calculations on data that is too large to fit in memory, and that can be stored in
-memory, on disk or `on the network <https://github.com/ironArray/Caterva2>`_.
+All in all, thanks to compression, a fine-tuned partitioning for leveraging modern
+CPU caches, and an efficient I/O that overlaps with computation, the Blosc2 compute
+engine allows to perform calculations on data that is too large to fit in memory,
+and that can be stored in memory, on disk or
+`on the network <https://github.com/ironArray/Caterva2>`_.
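For context on what the documentation change above describes, here is a minimal sketch (not part of this commit) of the kind of out-of-core, compressed reduction the overview discusses. It assumes python-blosc2 3.x (blosc2.asarray, lazy expressions, .compute()); the file names and the small shape are illustrative only:

    # Minimal sketch (not from this commit): a reduction over compressed,
    # disk-backed operands, assuming python-blosc2 3.x.
    # File names and sizes are illustrative only.
    import numpy as np
    import blosc2

    N = 2_000  # tiny compared to the 400,000 x 400,000 benchmark operands

    # Persist two compressed operands on disk as .b2nd containers.
    a = blosc2.asarray(np.linspace(0, 1, N * N).reshape(N, N),
                       urlpath="a.b2nd", mode="w")
    b = blosc2.asarray(np.linspace(1, 2, N * N).reshape(N, N),
                       urlpath="b.b2nd", mode="w")

    # Building an expression is lazy: no data is decompressed yet.
    expr = (a - b) ** 2

    # Evaluation streams chunks through the CPU caches, overlapping
    # decompression and I/O with computation, so the working set stays small.
    total = expr.sum()       # scalar reduction
    out = expr.compute()     # or materialize the full result as an NDArray
    print(total, out.shape)

The benchmark script linked in the diff (jit-reduc-sizes.py) exercises this same idea at much larger, out-of-memory scales.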
