Ensure buf is bytes in GZip's decode on Python 3

jakirkham · jakirkham · commit 67d22ee27f9c · 2018-11-28T11:49:18.000-05:00
On Python 3, `BytesIO` avoids copying `buf` if it is already of `bytes`
type. So convert `buf` to `bytes` in the `decode` method. When `buf` is
already `bytes`, the conversion is a no-op, so no copy is made on
Python 3.
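A minimal sketch of the Python 3 decode path, for illustration only. `to_bytes` is a hypothetical stand-in for `numcodecs.compat.ensure_bytes` built on `memoryview` rather than NumPy, and `gzip_decode` is a hypothetical simplification of `GZip.decode` that ignores the `out` argument. The point is that a `bytes` input is passed through untouched, so the only remaining question is whether `BytesIO` copies it.

```python
import gzip
import io


def to_bytes(buf):
    """Hypothetical stand-in for numcodecs' ``ensure_bytes``.

    Returns ``buf`` itself if it is already ``bytes`` (no copy);
    otherwise copies the raw contents of any buffer-protocol object.
    """
    if isinstance(buf, bytes):
        return buf
    return memoryview(buf).tobytes()


def gzip_decode(buf):
    """Sketch of the Python 3 path: normalise to bytes, wrap, decompress."""
    buf = to_bytes(buf)
    # Per the commit message, BytesIO on Python 3 can avoid copying a
    # bytes object, so wrapping it here need not duplicate the data.
    with gzip.GzipFile(fileobj=io.BytesIO(buf), mode="rb") as f:
        return f.read()
```

For example, `to_bytes(b"abc")` returns the very same object, while a `bytearray` or `memoryview` input is copied once into a new `bytes`.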

Unfortunately, `BytesIO` on Python 2 copies its input regardless of type.
So on Python 2, simply take an `ndarray` view onto the data. This ensures
that `buf` satisfies the buffer protocol without introducing any copies
beyond the one `BytesIO` makes on Python 2 anyway.
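The reasoning for the Python 2 branch can be illustrated on Python 3 with a stdlib analogue: a `memoryview` plays the role of the `ndarray` view (an assumption made here so the sketch needs no NumPy), and `as_buffer_view` is a hypothetical helper. Taking the view shares memory with the original, so it adds no copy; the single copy happens when `BytesIO` is constructed from a non-`bytes` buffer.

```python
import io


def as_buffer_view(buf):
    """Hypothetical analogue of taking an ndarray view: zero-copy,
    and the result supports the buffer protocol."""
    return memoryview(buf)


raw = bytearray(b"abc")
view = as_buffer_view(raw)

# The view shares memory with ``raw``: a later write is visible through it.
raw[0] = ord("x")
view_contents = view.tobytes()

# BytesIO copies from a non-bytes buffer, so writes made after
# construction do not show up in its contents.
stream = io.BytesIO(view)
raw[1] = ord("y")
stream_contents = stream.getvalue()
```

After running this, `view_contents` is `b"xbc"` and `stream_contents` is also `b"xbc"`, even though `raw` has since become `bytearray(b"xyc")`.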
diff --git a/numcodecs/gzip.py b/numcodecs/gzip.py
@@ -5,7 +5,7 @@
 
 
 from .abc import Codec
-from .compat import ensure_ndarray, ensure_contiguous_ndarray, PY2
+from .compat import ensure_bytes, ensure_ndarray, ensure_contiguous_ndarray, PY2
 
 
 class GZip(Codec):
@@ -50,7 +50,10 @@ def encode(self, buf):
     def decode(self, buf, out=None):
 
         # normalise inputs
-        buf = ensure_contiguous_ndarray(buf)
+        if PY2:  # pragma: py3 no cover
+            buf = ensure_contiguous_ndarray(buf)
+        else:  # pragma: py2 no cover
+            buf = ensure_bytes(buf)
 
         # do decompression
         buf = io.BytesIO(buf)