Merged
numcodecs/tests/test_vlen_bytes.py (4 changes: 0 additions & 4 deletions)
@@ -1,4 +1,3 @@
-import sys
 import unittest
 
 import numpy as np
@@ -85,9 +84,6 @@ def test_decode_errors():
         codec.decode(enc, out=np.zeros(10, dtype='i4'))


-# TODO: fix this test on GitHub actions somehow...
-# See https://github.com/zarr-developers/numcodecs/issues/683
-@pytest.mark.skipif(sys.platform == "darwin", reason="Test is failing on macOS on GitHub actions.")
 def test_encode_none():
     a = np.array([b'foo', None, b'bar'], dtype=object)
     codec = VLenBytes()
numcodecs/vlen.pyx (5 changes: 4 additions & 1 deletion)
@@ -227,6 +227,7 @@ class VLenBytes(Codec):
         cdef:
             Py_ssize_t i, l, n_items, data_length, total_length
             object[:] values
+            object[:] normed_values
             int[:] lengths
             char* encv
             object b
@@ -240,6 +241,7 @@ class VLenBytes(Codec):
         n_items = values.shape[0]

         # setup intermediates
+        normed_values = np.empty(n_items, dtype=object)
Member Author:

To fix the error mentioned above, we need a place to store the normalized values.

So we allocate an empty array up front, with object dtype, since we are storing Python objects (byte strings) in it.
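For illustration, here is a rough pure-NumPy sketch of why an object-dtype array fits here (not the Cython code itself): each slot just holds a reference to a Python object, so variable-length byte strings fit, and np.empty() initializes object arrays to None.

```python
import numpy as np

# np.empty() with dtype=object gives one Python-object slot per item,
# initialized to None rather than left as raw memory.
normed_values = np.empty(3, dtype=object)
print(normed_values)        # [None None None]

normed_values[0] = b'foo'   # slots just reference Python objects,
normed_values[1] = b''      # so byte strings of any length fit
print(normed_values[:2])    # [b'foo' b'']
```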

         lengths = np.empty(n_items, dtype=np.intc)

         # first iteration to find lengths
@@ -250,6 +252,7 @@ class VLenBytes(Codec):
                 b = b''
             elif not PyBytes_Check(b):
                 raise TypeError('expected byte string, found %r' % b)
+            normed_values[i] = b
Member Author:

Now, as we loop through and normalize each input value, we store the normalized value for processing later.
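Roughly, the first pass now behaves like this pure-Python sketch (illustrative only; names mirror the Cython code, and the 4-byte per-item header matches the 32-bit length written by store_le32):

```python
def first_pass(values, normed_values, lengths, header_length=4):
    """Normalize inputs, remember them, and total up the encoded size."""
    data_length = 0
    for i, b in enumerate(values):
        if b is None:
            b = b''                         # normalize None to an empty byte string
        elif not isinstance(b, bytes):
            raise TypeError('expected byte string, found %r' % b)
        normed_values[i] = b                # keep the normalized value for the copy pass
        l = len(b)
        data_length += l + header_length    # 4-byte length header per item
        lengths[i] = l
    return data_length
```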

             l = PyBytes_GET_SIZE(b)
             data_length += l + HEADER_LENGTH
             lengths[i] = l
@@ -268,7 +271,7 @@ class VLenBytes(Codec):
             l = lengths[i]
             store_le32(<uint8_t*>data, l)
             data += HEADER_LENGTH
-            encv = PyBytes_AS_STRING(values[i])
Member Author:

It looks like this line is the one that actually failed (a bit further in than the aforementioned error would suggest: #683 (comment)).

The issue is that values[i] does not contain the normalized value, so if None shows up here, it errors. Worse, the failure is a nasty segfault (from attempting to access a field that does not exist on a non-bytes object), or an assertion error if a debug build of Python is used.

+            encv = PyBytes_AS_STRING(normed_values[i])
Member Author:

Thus we now access the normalized value here and know it is of the expected type.

That said, this code could be improved further by introducing Cython typed variables and using them consistently, which would give better and more consistent type checking throughout.
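As a sanity check, the (now un-skipped) test_encode_none exercises exactly this path; a sketch of the expected behavior after the fix (assuming, as the normalization implies, that None comes back as b'' after a round trip):

```python
import numpy as np
from numcodecs import VLenBytes

a = np.array([b'foo', None, b'bar'], dtype=object)
codec = VLenBytes()

enc = codec.encode(a)    # previously this could segfault once it reached the None entry
dec = codec.decode(enc)
print(dec)               # expected: [b'foo' b'' b'bar'] -- the None was normalized to b''
```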

             memcpy(data, encv, l)
             data += l
