Commit be0a763

jakirkham authored and alimanfoo committed
Use buffer StringIO in GZip.decode on Python 2 (#152)
* Use buffer `StringIO` with `GzipFile` on Python 2

  In `GZip`'s `decode` method, we knew that `BytesIO` was always copying the data. So we merely ensured the data was in a form that made it easy to copy (i.e. a contiguous buffer). However we have since learned we can do better as the Python 2 implementation of `StringIO` merely takes the buffer without copying it. So use `StringIO` on Python 2 to avoid a copy here. As this support doesn't extend to Python 3, continue using `BytesIO` in that case.

  Should avoid copies on Python 2 in the general case when using `GZip`'s `decode`; thus, improving performance there.

  ref: http://www.hydrogen18.com/blog/unpickling-buffers.html
  ref: https://github.com/python/cpython/blob/2.7/Modules/cStringIO.c#L716
  ref: https://github.com/python/cpython/blob/2.7/Modules/cStringIO.c#L681
  ref: https://github.com/python/cpython/blob/2.7/Modules/cStringIO.c#L160

* Note GZip.decode avoids copying data on Python 2
1 parent 57b9478 commit be0a763

2 files changed: +10 -5 lines changed

docs/release.rst

Lines changed: 3 additions & 0 deletions
@@ -15,6 +15,9 @@ Release notes
 * Use (new) buffer protocol in ``MsgPack`` codec `decode()` method.
   By :user:`John Kirkham <jakirkham>`, :issue:`148`.
 
+* Avoid copying into data in ``GZip``'s `decode()` method on Python 2.
+  By :user:`John Kirkham <jakirkham>`, :issue:`152`.
+
 
 .. _release_0.6.1:
 

numcodecs/gzip.py

Lines changed: 7 additions & 5 deletions
@@ -8,6 +8,10 @@
 from .compat import ensure_bytes, ensure_ndarray, ensure_contiguous_ndarray, PY2
 
 
+if PY2: # pragma: py3 no cover
+    from cStringIO import StringIO
+
+
 class GZip(Codec):
     """Codec providing gzip compression using zlib via the Python standard library.
 
@@ -51,16 +55,14 @@ def decode(self, buf, out=None):
 
         # normalise inputs
         if PY2: # pragma: py3 no cover
-            # On Python 2, BytesIO always copies.
-            # Merely ensure the data supports the (new) buffer protocol.
-            buf = ensure_contiguous_ndarray(buf)
+            # On Python 2, StringIO always uses the buffer protocol.
+            buf = StringIO(ensure_contiguous_ndarray(buf))
         else: # pragma: py2 no cover
             # BytesIO only copies if the data is not of `bytes` type.
             # This allows `bytes` objects to pass through without copying.
-            buf = ensure_bytes(buf)
+            buf = io.BytesIO(ensure_bytes(buf))
 
         # do decompression
-        buf = io.BytesIO(buf)
         with _gzip.GzipFile(fileobj=buf, mode='rb') as decompressor:
             if out is not None:
                 out_view = ensure_contiguous_ndarray(out)
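
The Python 3 branch of the change above can be approximated with a standalone sketch that uses only the standard library. The function name `gzip_decode` is hypothetical, and `bytes()` and `memoryview` stand in for numcodecs' `ensure_bytes` and `ensure_contiguous_ndarray` helpers; this is an illustration under those assumptions, not the library's implementation. The Python 2 `cStringIO` branch is not reproduced here.

```python
import gzip
import io


def gzip_decode(buf, out=None):
    """Decompress gzip data, optionally into a preallocated writable buffer.

    Hypothetical helper mirroring the Python 3 branch of the diff.
    """
    # Stand-in for `ensure_bytes`: `bytes` objects pass through untouched,
    # anything else (bytearray, memoryview, ...) is converted first.
    if not isinstance(buf, bytes):
        buf = bytes(buf)

    # Wrap the compressed data in a file-like object for GzipFile.
    fileobj = io.BytesIO(buf)

    with gzip.GzipFile(fileobj=fileobj, mode='rb') as decompressor:
        if out is not None:
            # Stand-in for `ensure_contiguous_ndarray`: view the
            # destination as a flat run of unsigned bytes.
            out_view = memoryview(out).cast('B')
            decompressor.readinto(out_view)
            return out
        return decompressor.read()


if __name__ == '__main__':
    data = b'example payload ' * 64
    encoded = gzip.compress(data)

    # Decode into a fresh bytes object.
    assert gzip_decode(encoded) == data

    # Decode into a caller-supplied buffer.
    target = bytearray(len(data))
    gzip_decode(encoded, out=target)
    assert bytes(target) == data
```

Passing `out` lets the caller reuse an existing buffer for the decompressed bytes, which is the same motivation the commit gives for avoiding intermediate copies on the input side.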
