Skip to content

Preserve ArrayOfEqualSizedArrays through awkward conversion#209

Draft
Yuan-Ru-Lin wants to merge 5 commits intolegend-exp:mainfrom
Yuan-Ru-Lin:preserve-aoesa
Draft

Preserve ArrayOfEqualSizedArrays through awkward conversion#209
Yuan-Ru-Lin wants to merge 5 commits intolegend-exp:mainfrom
Yuan-Ru-Lin:preserve-aoesa

Conversation

@Yuan-Ru-Lin
Copy link
Copy Markdown
Contributor

Detect RegularType in _ak_to_lgdo_or_col_dict and return
ArrayOfEqualSizedArrays instead of falling back to VectorOfVectors.

Closes #208

Detect RegularType in _ak_to_lgdo_or_col_dict and return
ArrayOfEqualSizedArrays instead of falling back to VectorOfVectors.
@codecov
Copy link
Copy Markdown

codecov bot commented Jan 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.81%. Comparing base (f6fb8ef) to head (9f0b7a6).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #209   +/-   ##
=======================================
  Coverage   80.81%   80.81%           
=======================================
  Files          47       47           
  Lines        3978     3978           
=======================================
  Hits         3215     3215           
  Misses        763      763           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Yuan-Ru-Lin
Copy link
Copy Markdown
Contributor Author

Can you take a look, @gipert?

Comment on lines +642 to +643
if isinstance(array.type.content, ak.types.RegularType):
return ArrayOfEqualSizedArrays(nda=ak.to_numpy(array))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big expert of the awkward types, but shouldn't this be inside the NumpyType if? Also, i guess this should be also restricted to 2D

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be inside the NumpyType if?

Not if we want to wrap an N-D array, where N > 1, as a AOESA and return it early. This type check correctly identifies such array.

import numpy as np, awkward as ak
# 1D
assert isinstance(ak.Array(np.array([1,2,3,4])).type.content, ak.types.NumpyType)
# 2D
assert isinstance(ak.Array(np.array([[1,2],[3,4]])).type.content, ak.types.RegularType)
# 3D
assert isinstance(ak.Array(np.array([[[1,2],[3,4]]])).type.content, ak.types.RegularType)

this should be also restricted to 2D

Can you elaborate?

Copy link
Copy Markdown
Contributor Author

@Yuan-Ru-Lin Yuan-Ru-Lin Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be also restricted to 2D

Can you elaborate?

Ah, I see why. The following raises a TypeError because Array cannot be the type of a field.

import lgdo, awkward as ak, numpy as np

lgdo.Table({'a': ak.Array(np.array([[[1,2],[3,4]]]))})

EDIT: it should be lgdo.Table(ak.Array({'a': [[[1,2],[3,4]]]})) instead. Let me paste the new result below

Error log ----> 1 lgdo.Table({'a': ak.Array(np.array([[[1,2],[3,4]]]))})

File ~/Desktop/pygama-dev/legend-pydataobj/src/lgdo/types/table.py:93, in Table.init(self, col_dict, size, attrs)
90 col_dict = _ak_to_lgdo_or_col_dict(col_dict)
92 # call Struct constructor
---> 93 Struct.init(self, obj_dict=col_dict, attrs=attrs)
94 # no need to call the LGDOCollection constructor, as we are calling the
95 # Struct constructor already
96
97 # if col_dict is not empty, set size according to it
98 # if size is also supplied, resize all fields to match it
99 # otherwise, warn if the supplied fields have varying size
100 if col_dict is not None and len(col_dict) > 0:

File ~/Desktop/pygama-dev/legend-pydataobj/src/lgdo/types/struct.py:51, in Struct.init(self, obj_dict, attrs)
41 """
42 Parameters
43 ----------
(...) 48 a set of user attributes to be carried along with this LGDO.
49 """
50 if obj_dict is not None:
---> 51 self.update(obj_dict)
53 # check the datatype attribute passed by the user and sort the fields
54 # to ensure consistent behavior
55 if attrs is not None and "datatype" in attrs:

File ~/Desktop/pygama-dev/legend-pydataobj/src/lgdo/types/struct.py:156, in Struct.update(self, other, **kwargs)
154 self[k].update(v)
155 else:
--> 156 self[k] = v

File ~/Desktop/pygama-dev/legend-pydataobj/src/lgdo/types/struct.py:125, in Struct.setitem(self, name, obj)
124 def setitem(self, name: str, obj: LGDO) -> None:
--> 125 return self.add_field(name, obj)

File ~/Desktop/pygama-dev/legend-pydataobj/src/lgdo/types/table.py:219, in Table.add_field(self, name, obj, use_obj_size)
217 else:
218 msg = "cannot add field of type"
--> 219 raise TypeError(msg, type(obj).name)
221 super().add_field(name, obj)
223 if self.size is None:

TypeError: ('cannot add field of type', 'Array')

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be also restricted to 2D

Can you elaborate?

It also works for 3-D arrays.

import lgdo, awkward as ak, numpy as np

lgdo.Table(ak.Array({
    'a': np.array([[[1,2],[3,4]],[[5, 6], [7, 8]]]),
    'b': np.array([[1,2],[3,4]]),
    'c': np.array([5,6])
}))
# Result: Table(dict={'a': ArrayOfEqualSizedArrays([[[1 2] [3 4]] [[5 6] [7 8]]], attrs={'datatype': 'array_of_equalsized_arrays<1,2>{real}'}), 'b': ArrayOfEqualSizedArrays([[1 2] [3 4]], attrs={'datatype': 'array_of_equalsized_arrays<1,1>{real}'}), 'c': Array([5 6], attrs={'datatype': 'array<1>{real}'})}, attrs={'datatype': 'table{a,b,c}'})

"b": [[1, 2], [3], [4], [5, 6, 7]],
"c": {"f1": [[], [5], [3, 7, 6], []], "f2": [5, 6, 7, 8]},
"d": ["boh", "hello", "there", "!"],
"e": np.array([[1, 2], [3, 4], [5, 6], [7, 8]]), # RegularType
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as test, i would check what happens with a 3D regular array. i guess that should be just an Array? not sure.

In general, i think it's not obvious whether [[1, 2], [3, 4], [5, 6], [7, 8]] is an Array or an ArrayofEqualSizedArrays...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as test, i would check what happens with a 3D regular array. i guess that should be just an Array? not sure.

The result of the test is in the above comment.

it's not obvious whether [[1, 2], [3, 4], [5, 6], [7, 8]] is an Array or an ArrayofEqualSizedArrays...

I say we should treat it as an ArrayOfEqualSizedArrays as NumPy treats it as a 2D array of shape (4, 2). If users really want to treat such array as an Array, they can and should reshape it before wrapping it; it's a zero-copy operation anyway.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I say we should treat it as an ArrayOfEqualSizedArrays as NumPy treats it as a 2D array of shape (4, 2)

I don't understand why, this could also well be array<2>{real}? In the end both types can represent [[1, 2], [3, 4], [5, 6], [7, 8]], it's just a choice. I don't know about the Julia implementation, but on our end the low level object for Array and ArrayOfEqualSizedArrays is the same.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point now. It is indeed up to users to determine whether a 2D data should be treated as an AOESA or an Array (of 2D data). However, we should at least convert a RegularArray to Array, instead of VectorOfVectors, which makes less sense but is the current situation.

@gipert
Copy link
Copy Markdown
Member

gipert commented Jan 22, 2026

I think unfortunately it's a bit tricky to decide whether a regular awkward array is an Array or an ArrayOfEqualSizedArrays. In the end it's up to the user...

@Yuan-Ru-Lin
Copy link
Copy Markdown
Contributor Author

Just to summarize the points raised in the above comments:

  1. N-D arrays where N > 1 have RegularType while 1-D arrays have NumpyType. We can leverage the difference to reconstruct ArrayOfEqualSizedArrays and Array respectively.
  2. 3-D arrays are handled as well; a test has been added.
  3. Users should reshape high-dimensional arrays into 1-D arrays if they want to treat the data as Array

@iguinn
Copy link
Copy Markdown
Contributor

iguinn commented Jan 23, 2026

I actually wonder if we should not be relying on the ak types, which seem to have too many permutations to easily handle and aren't always easy to understand, and instead rely on some of the methods in ak.Array? I think we could use ndims, fields, and typestr:

if len(a.fields)>0: Table
elif a.ndim==1: Array
elif "var" in a.typestr: VoV
else: AoEA

@Yuan-Ru-Lin
Copy link
Copy Markdown
Contributor Author

I actually wonder if we should not be relying on the ak types, which seem to have too many permutations to easily handle and aren't always easy to understand, and instead rely on some of the methods in ak.Array? I think we could use ndims, fields, and typestr:

if len(a.fields)>0: Table
elif a.ndim==1: Array
elif "var" in a.typestr: VoV
else: AoEA

The problem @gipert has been stressing is that when ndim == 2, it's not clear whether it should an array<2> or array_of_equalsized_arrays<1,1>, both of which have the same data layout; it's not a problem of API but inherent ambiguity.

Add RegularType to the condition that returns Array in
_ak_to_lgdo_or_col_dict. High-dimensional regular arrays are now
converted to Array, letting downstream code (e.g. WaveformTable) decide
the appropriate LGDO type.
@gipert
Copy link
Copy Markdown
Member

gipert commented Feb 4, 2026

@ggmarshall maybe let's discuss also this at the next call, I'm not entirely sure about what we should do.

@gipert
Copy link
Copy Markdown
Member

gipert commented Feb 19, 2026

@Yuan-Ru-Lin could you rebase?

@gipert
Copy link
Copy Markdown
Member

gipert commented Mar 11, 2026

can we do something like: always return array<>, except when it's clearly a waveform table?

@gipert gipert marked this pull request as draft April 7, 2026 14:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Table(ak.Array) doesn't preserve AOESA

3 participants