Commit c87ec02

committed
Some amendments
1 parent 96b4aed commit c87ec02

1 file changed: +22 -62 lines changed

examples/slicing_and_beyond.ipynb

Lines changed: 22 additions & 62 deletions
@@ -6,7 +6,7 @@
 "source": [
 "# Slicing chunks and beyond\n",
 "\n",
-"The newest and coolest way to store data in python-blosc2 is through a SChunk (super-chunk) object. Here the data is split into chunks of the same size. So in the past, the only way of working with it was chunk by chunk (see tutorials-basics.ipynb). But now, python-blosc2 can retrieve, update or append data all at once (i.e. avoiding doing it chunk by chunk). To see how this works, let's first create our SChunk."
+"The newest and coolest way to store data in python-blosc2 is through a `SChunk` (super-chunk) object. Here the data is split into chunks of the same size. In the past, the only way of working with it was chunk by chunk (see tutorials-basics.ipynb), but now, python-blosc2 can retrieve, update or append data at item level (i.e. avoiding doing it chunk by chunk). To see how this works, let's first create our SChunk."
 ]
 },
 {
@@ -56,11 +56,7 @@
 {
 "cell_type": "code",
 "execution_count": 3,
-"metadata": {
-"pycharm": {
-"name": "#%%\n"
-}
-},
+"metadata": {},
 "outputs": [
 {
 "name": "stdout",
@@ -85,11 +81,7 @@
 {
 "cell_type": "code",
 "execution_count": 4,
-"metadata": {
-"pycharm": {
-"name": "#%%\n"
-}
-},
+"metadata": {},
 "outputs": [
 {
 "name": "stdout",
@@ -120,11 +112,7 @@
 {
 "cell_type": "code",
 "execution_count": 5,
-"metadata": {
-"pycharm": {
-"name": "#%%\n"
-}
-},
+"metadata": {},
 "outputs": [],
 "source": [
 "start = 34\n",
@@ -143,11 +131,7 @@
 {
 "cell_type": "code",
 "execution_count": 6,
-"metadata": {
-"pycharm": {
-"name": "#%%\n"
-}
-},
+"metadata": {},
 "outputs": [],
 "source": [
 "schunk_nelems = 1000 * 200 * nchunks\n",
@@ -162,9 +146,9 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Getting a SChunk from/as a contiguous buffer\n",
+"## Building a SChunk from/as a contiguous buffer\n",
 "\n",
-"Furthermore, you can pass from a SChunk to a contiguous buffer and vice versa. Let's get that buffer:"
+"Furthermore, you can convert a SChunk to a contiguous, serialized buffer and vice-versa. Let's get that buffer (aka `cframe`) first:"
 ]
 },
 {
@@ -196,95 +180,71 @@
 {
 "cell_type": "code",
 "execution_count": 8,
-"metadata": {
-"pycharm": {
-"name": "#%%\n"
-}
-},
+"metadata": {},
 "outputs": [],
 "source": [
 "schunk2 = blosc2.schunk_from_cframe(cframe=buf, copy=True)"
 ]
 },
 {
 "cell_type": "markdown",
-"metadata": {
-"pycharm": {
-"name": "#%% md\n"
-}
-},
+"metadata": {},
 "source": [
 "In this case we set the `copy` param to `True`. If you do not want to copy the buffer,\n",
-"be mindful that you will have to keep its reference until you do not\n",
+"be mindful that you will have to keep a reference to it until you do not\n",
 "want the SChunk anymore.\n",
 "\n",
-"## Compressing NumPy arrays\n",
+"## Serializing NumPy arrays\n",
 "\n",
-"If the object you want to get as a compressed buffer is a NumPy array, you can use the newer and faster functions to store it in-memory or on-disk.\n",
+"If what you want is to create a serialized, compressed version of a NumPy array, you can use the newer (and faster) functions to store it either in-memory or on-disk. The specification of such a contiguous compressed representation, aka **cframe** can be seen at: https://github.com/Blosc/c-blosc2/blob/main/README_CFRAME_FORMAT.rst.\n",
 "\n",
 "### In-memory\n",
 "\n",
-"To store it in-memory you can use `pack_array2`. In comparison with its former version, it is faster (see `pack_compress.py` bench) and does not have the 2 GB size limitation."
+"For obtaining an in-memory representation, you can use `pack_array2`. In comparison with its former version (`pack_array`), it is way faster and does not have the 2 GB size limitation:"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": 9,
-"metadata": {
-"pycharm": {
-"name": "#%%\n"
-}
-},
+"metadata": {},
 "outputs": [],
 "source": [
-"np_array = np.arange(2**30, dtype=np.int32)\n",
+"np_array = np.arange(2**30 + 1, dtype=np.int32) # 2 GB (+4) array\n",
 "\n",
 "packed_arr2 = blosc2.pack_array2(np_array)\n",
 "unpacked_arr2 = blosc2.unpack_array2(packed_arr2)"
 ]
 },
 {
 "cell_type": "markdown",
-"metadata": {
-"pycharm": {
-"name": "#%% md\n"
-}
-},
+"metadata": {},
 "source": [
 "### On-disk\n",
 "\n",
-"To perform the same but store the buffer on-disk you would use `save_array` and `load_array` like so:"
+"To store the serialized buffer on-disk you want to use `save_array` and `load_array`:"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": 10,
-"metadata": {
-"pycharm": {
-"name": "#%%\n"
-}
-},
+"metadata": {},
 "outputs": [],
 "source": [
 "blosc2.save_array(np_array, urlpath=\"ondisk_array.b2frame\", mode=\"w\")\n",
 "np_array2 = blosc2.load_array(\"ondisk_array.b2frame\")\n",
 "np.array_equal(np_array, np_array2)\n",
 "\n",
-"# Remove it\n",
+"# Remove it from disk\n",
 "blosc2.remove_urlpath(\"ondisk_array.b2frame\")"
 ]
 },
 {
 "cell_type": "markdown",
-"metadata": {
-"pycharm": {
-"name": "#%% md\n"
-}
-},
+"metadata": {},
 "source": [
 "# Conclusions\n",
 "\n",
-"Now python-blosc2 has an easy way of creating, getting, setting, deleting and expanding data in a SChunk. Moreover, you can get a contiguous compressed representation (aka [cframe](https://github.com/Blosc/c-blosc2/blob/main/README_CFRAME_FORMAT.rst)) of it and create it again latter. And you can do the same with NumPy arrays faster than with the former functions.\n"
+"Now python-blosc2 offers an easy, yet fast way of creating, getting, setting and expanding data via the `SChunk` class. Moreover, you can get a contiguous compressed representation (aka [cframe](https://github.com/Blosc/c-blosc2/blob/main/README_CFRAME_FORMAT.rst)) of it and re-create it again later with no sweat.\n"
 ]
 }
 ],
@@ -309,4 +269,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 1
-}
+}
