Error from astype() on StringArray and inconsistencies with zeros_like() #199
Description
My use case: I need to be able to make a mask for a JaggedArray containing strings, starting with something like this:
jagged_array_of_strings.zeros_like().astype(bool)
but this fails on a couple different levels. The first is that StringArray seems to have a problem with astype():
>>> j = awkward.fromiter(['True'])
>>> j
<StringArray ['True'] at 0x7f6e88799400>
>>> j.astype(bool)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/mproffit/anaconda3/lib/python3.7/site-packages/awkward/array/base.py", line 111, in __repr__
return "<{0} {1} at 0x{2:012x}>".format(self.__class__.__name__, str(self), id(self))
File "/home/mproffit/anaconda3/lib/python3.7/site-packages/awkward/array/base.py", line 98, in __str__
return "[{0}]".format(" ".join(self._util_arraystr(x) for x in self.__iter__(checkiter=False)))
File "/home/mproffit/anaconda3/lib/python3.7/site-packages/awkward/array/base.py", line 98, in <genexpr>
return "[{0}]".format(" ".join(self._util_arraystr(x) for x in self.__iter__(checkiter=False)))
File "/home/mproffit/anaconda3/lib/python3.7/site-packages/awkward/array/objects.py", line 177, in __iter__
for x in self._content:
File "/home/mproffit/anaconda3/lib/python3.7/site-packages/awkward/array/jagged.py", line 496, in __iter__
self._valid()
File "/home/mproffit/anaconda3/lib/python3.7/site-packages/awkward/array/jagged.py", line 466, in _valid
raise ValueError("maximum offset {0} is beyond the length of the content ({1})".format(self._offsets.max(), len(self._content)))
ValueError: maximum offset 4 is beyond the length of the content (1)
Independently, zeros_like() has some problematic behavior on StringArray as well:
>>> j.zeros_like()
<StringArray ['\x00\x00\x00\x00'] at 0x7f6e887990f0>
My issue with this is that a string of null bytes actually evaluates to True and can't even be directly converted to a number:
>>> bool('\x00')
True
>>> int('\x00')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '\x00'
For comparison, numpy's zeros_like() converts strings to empty strings:
>>> import numpy as np
>>> a = np.array('True')
>>> a
array('True', dtype='<U4')
>>> np.zeros_like(a)
array('', dtype='<U4')
Empty strings do convert to False (i.e., bool('') is False).
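To make the truthiness difference concrete, here is a small plain-Python sketch that needs neither awkward nor numpy. The helper is_zero_string is a hypothetical name used only for illustration; it is not part of either library, and treating NUL-only strings as "zero" is an assumption about what a caller might want.

```python
# Plain-Python truthiness check; no awkward or numpy needed.
null_filled = "\x00" * 4   # the value zeros_like() produced for 'True' above
empty = ""                 # the value np.zeros_like() produces for a string

# Any non-empty str is truthy, regardless of the characters it holds.
print(bool(null_filled))   # True
print(bool(empty))         # False


def is_zero_string(s):
    """Hypothetical helper: True if s is empty or holds only NUL characters."""
    return s.strip("\x00") == ""


print(is_zero_string(null_filled))  # True
print(is_zero_string(empty))        # True
print(is_zero_string("True"))       # False
```

A check like this would let both conventions (NUL-filled and empty) map to False, though it only sidesteps the inconsistency rather than resolving it.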
As an aside, astype(bool) oddly doesn't actually work on this ndarray:
>>> np.zeros_like(a).astype(bool)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: ''
But the following does work (and unfortunately doesn't have an equivalent in awkward as far as I'm aware):
>>> np.zeros_like(a, dtype=bool)
array(False)
Edit: Turns out this known problem in numpy has been sitting around for a couple years: numpy/numpy#9875
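For reference, the result I'm after can be sketched in plain Python with nested lists standing in for a JaggedArray of strings. This is only an illustration of the desired shape of the mask, not awkward's API: false_mask_like is a hypothetical helper, and how to express the same thing with awkward's own constructors is left open here.

```python
def false_mask_like(jagged):
    """Return nested lists of False with the same row lengths as `jagged`.

    `jagged` is assumed to be a list of lists of strings, a stand-in for
    a JaggedArray of strings; no awkward objects are involved.
    """
    return [[False for _ in row] for row in jagged]


rows = [["a", "bc"], [], ["def"]]
mask = false_mask_like(rows)
print(mask)  # [[False, False], [], [False]]
```

The mask depends only on the row lengths, never on the string contents, which is why going through a string-typed zeros_like() and then astype(bool) is more fragile than building the boolean values directly.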