Preserve ArrayOfEqualSizedArrays through awkward conversion#209
Preserve ArrayOfEqualSizedArrays through awkward conversion#209Yuan-Ru-Lin wants to merge 5 commits intolegend-exp:mainfrom
Conversation
38b8132 to
06e7310
Compare
Detect RegularType in _ak_to_lgdo_or_col_dict and return ArrayOfEqualSizedArrays instead of falling back to VectorOfVectors.
fcb91fd to
bab275e
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #209 +/- ##
=======================================
Coverage 80.81% 80.81%
=======================================
Files 47 47
Lines 3978 3978
=======================================
Hits 3215 3215
Misses 763 763 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Can you take a look, @gipert? |
src/lgdo/types/table.py
Outdated
| if isinstance(array.type.content, ak.types.RegularType): | ||
| return ArrayOfEqualSizedArrays(nda=ak.to_numpy(array)) |
There was a problem hiding this comment.
I'm not a big expert of the awkward types, but shouldn't this be inside the NumpyType if? Also, i guess this should be also restricted to 2D
There was a problem hiding this comment.
shouldn't this be inside the NumpyType if?
Not if we want to wrap an N-D array, where N > 1, as a AOESA and return it early. This type check correctly identifies such array.
import numpy as np, awkward as ak
# 1D
assert isinstance(ak.Array(np.array([1,2,3,4])).type.content, ak.types.NumpyType)
# 2D
assert isinstance(ak.Array(np.array([[1,2],[3,4]])).type.content, ak.types.RegularType)
# 3D
assert isinstance(ak.Array(np.array([[[1,2],[3,4]]])).type.content, ak.types.RegularType)this should be also restricted to 2D
Can you elaborate?
There was a problem hiding this comment.
this should be also restricted to 2D
Can you elaborate?
Ah, I see why. The following raises a TypeError because Array cannot be the type of a field.
import lgdo, awkward as ak, numpy as np
lgdo.Table({'a': ak.Array(np.array([[[1,2],[3,4]]]))})EDIT: it should be lgdo.Table(ak.Array({'a': [[[1,2],[3,4]]]})) instead. Let me paste the new result below
Error log
----> 1 lgdo.Table({'a': ak.Array(np.array([[[1,2],[3,4]]]))})File ~/Desktop/pygama-dev/legend-pydataobj/src/lgdo/types/table.py:93, in Table.init(self, col_dict, size, attrs)
90 col_dict = _ak_to_lgdo_or_col_dict(col_dict)
92 # call Struct constructor
---> 93 Struct.init(self, obj_dict=col_dict, attrs=attrs)
94 # no need to call the LGDOCollection constructor, as we are calling the
95 # Struct constructor already
96
97 # if col_dict is not empty, set size according to it
98 # if size is also supplied, resize all fields to match it
99 # otherwise, warn if the supplied fields have varying size
100 if col_dict is not None and len(col_dict) > 0:
File ~/Desktop/pygama-dev/legend-pydataobj/src/lgdo/types/struct.py:51, in Struct.init(self, obj_dict, attrs)
41 """
42 Parameters
43 ----------
(...) 48 a set of user attributes to be carried along with this LGDO.
49 """
50 if obj_dict is not None:
---> 51 self.update(obj_dict)
53 # check the datatype attribute passed by the user and sort the fields
54 # to ensure consistent behavior
55 if attrs is not None and "datatype" in attrs:
File ~/Desktop/pygama-dev/legend-pydataobj/src/lgdo/types/struct.py:156, in Struct.update(self, other, **kwargs)
154 self[k].update(v)
155 else:
--> 156 self[k] = v
File ~/Desktop/pygama-dev/legend-pydataobj/src/lgdo/types/struct.py:125, in Struct.setitem(self, name, obj)
124 def setitem(self, name: str, obj: LGDO) -> None:
--> 125 return self.add_field(name, obj)
File ~/Desktop/pygama-dev/legend-pydataobj/src/lgdo/types/table.py:219, in Table.add_field(self, name, obj, use_obj_size)
217 else:
218 msg = "cannot add field of type"
--> 219 raise TypeError(msg, type(obj).name)
221 super().add_field(name, obj)
223 if self.size is None:
TypeError: ('cannot add field of type', 'Array')
There was a problem hiding this comment.
this should be also restricted to 2D
Can you elaborate?
It also works for 3-D arrays.
import lgdo, awkward as ak, numpy as np
lgdo.Table(ak.Array({
'a': np.array([[[1,2],[3,4]],[[5, 6], [7, 8]]]),
'b': np.array([[1,2],[3,4]]),
'c': np.array([5,6])
}))
# Result: Table(dict={'a': ArrayOfEqualSizedArrays([[[1 2] [3 4]] [[5 6] [7 8]]], attrs={'datatype': 'array_of_equalsized_arrays<1,2>{real}'}), 'b': ArrayOfEqualSizedArrays([[1 2] [3 4]], attrs={'datatype': 'array_of_equalsized_arrays<1,1>{real}'}), 'c': Array([5 6], attrs={'datatype': 'array<1>{real}'})}, attrs={'datatype': 'table{a,b,c}'})
tests/types/test_table.py
Outdated
| "b": [[1, 2], [3], [4], [5, 6, 7]], | ||
| "c": {"f1": [[], [5], [3, 7, 6], []], "f2": [5, 6, 7, 8]}, | ||
| "d": ["boh", "hello", "there", "!"], | ||
| "e": np.array([[1, 2], [3, 4], [5, 6], [7, 8]]), # RegularType |
There was a problem hiding this comment.
as test, i would check what happens with a 3D regular array. i guess that should be just an Array? not sure.
In general, i think it's not obvious whether [[1, 2], [3, 4], [5, 6], [7, 8]] is an Array or an ArrayofEqualSizedArrays...
There was a problem hiding this comment.
as test, i would check what happens with a 3D regular array. i guess that should be just an Array? not sure.
The result of the test is in the above comment.
it's not obvious whether [[1, 2], [3, 4], [5, 6], [7, 8]] is an Array or an ArrayofEqualSizedArrays...
I say we should treat it as an ArrayOfEqualSizedArrays as NumPy treats it as a 2D array of shape (4, 2). If users really want to treat such array as an Array, they can and should reshape it before wrapping it; it's a zero-copy operation anyway.
There was a problem hiding this comment.
I say we should treat it as an ArrayOfEqualSizedArrays as NumPy treats it as a 2D array of shape (4, 2)
I don't understand why, this could also well be array<2>{real}? In the end both types can represent [[1, 2], [3, 4], [5, 6], [7, 8]], it's just a choice. I don't know about the Julia implementation, but on our end the low level object for Array and ArrayOfEqualSizedArrays is the same.
There was a problem hiding this comment.
I see your point now. It is indeed up to users to determine whether a 2D data should be treated as an AOESA or an Array (of 2D data). However, we should at least convert a RegularArray to Array, instead of VectorOfVectors, which makes less sense but is the current situation.
|
I think unfortunately it's a bit tricky to decide whether a regular awkward array is an Array or an ArrayOfEqualSizedArrays. In the end it's up to the user... |
|
Just to summarize the points raised in the above comments:
|
|
I actually wonder if we should not be relying on the ak types, which seem to have too many permutations to easily handle and aren't always easy to understand, and instead rely on some of the methods in ak.Array? I think we could use ndims, fields, and typestr: |
The problem @gipert has been stressing is that when |
Add RegularType to the condition that returns Array in _ak_to_lgdo_or_col_dict. High-dimensional regular arrays are now converted to Array, letting downstream code (e.g. WaveformTable) decide the appropriate LGDO type.
|
@ggmarshall maybe let's discuss also this at the next call, I'm not entirely sure about what we should do. |
|
@Yuan-Ru-Lin could you rebase? |
|
can we do something like: always return |
Detect RegularType in _ak_to_lgdo_or_col_dict and return
ArrayOfEqualSizedArrays instead of falling back to VectorOfVectors.
Closes #208