-
When I filter a complex awkward array, e.g. one containing Records, this will typically result in an IndexedArray:

>>> import numpy as np
>>> import awkward as ak
>>> array = ak.zip({"a" : np.random.rand(100), "b" : np.random.rand(100)})
>>> skim = array[array.a > 0.5]
>>> skim.layout
<IndexedArray64>
<index><Index64 i="[6 7 9 12 20 ... 93 94 95 96 99]" offset="0" length="49" at="0x55a7a345b4a0"/></index>
<content><RecordArray>
<field index="0" key="a">
<NumpyArray format="d" shape="100" data="0.191062 0.280678 0.32714 0.459306 0.169022 ... 0.613331 0.88532 0.240763 0.469416 0.857204" at="0x55a7a391e070"/>
</field>
<field index="1" key="b">
<NumpyArray format="d" shape="100" data="0.191364 0.641833 0.749943 0.076706 0.0817349 ... 0.170089 0.28631 0.594311 0.637529 0.488793" at="0x55a7a38bb0e0"/>
</field>
</RecordArray></content>
</IndexedArray64>

This is actually a great feature, since it allows me to do fast things with complex arrays without creating huge and expensive copies. However, sometimes it would be useful to create a "compressed" copy where the indices or masks are applied all the way down, for example when I want to loop over several files, filter them, and concatenate the filtered results. It seems …
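For concreteness, here is a minimal sketch of that loop-filter-concatenate use case; the list of chunks is a hypothetical stand-in for arrays read from several files:

>>> import numpy as np
>>> import awkward as ak
>>> chunks = [ak.zip({"a": np.random.rand(100), "b": np.random.rand(100)})
...           for _ in range(3)]                          # stand-in for per-file arrays
>>> skims = [chunk[chunk.a > 0.5] for chunk in chunks]    # each skim is an IndexedArray view
>>> combined = ak.concatenate(skims)

Each skim still carries its chunk's full 100-element buffers underneath (as the layout above shows), which is why a copy with the index applied all the way down would help before combining or saving the results.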
-
I've been thinking about this, too, since it's an operation that you'd want to have happen right before pickling an array, for instance. It wouldn't be too different in spirit from ak.materialized or ak.repartition, which give you logically the same array, but changed internally for performance reasons.

I think the right name for such an operation would be "to pack," as in Pascal packed arrays, packing booleans into bits in NumPy, or Parquet bit-packing of small integers. (The alliteration of "packing for pickle" suggests itself.) It's not "compression" because it's not an entirely different encoding, just trimming off the unused parts.

Such a feature could be done entirely at the Python level (using the internal …).

At the moment, ak.to_arrow mixes the operations of packing and conversion to Arrow. An operation that only packs (…) …

Bottom line: I agree with you that …
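One possible stopgap implied by the ak.to_arrow remark (not something the thread spells out) is to round-trip through Arrow, since the conversion applies the index as a side effect. A rough sketch, assuming pyarrow is installed, and noting that the result may come back wrapped in an option-type node because Arrow data is nullable:

>>> import numpy as np
>>> import awkward as ak
>>> array = ak.zip({"a": np.random.rand(100), "b": np.random.rand(100)})
>>> skim = array[array.a > 0.5]
>>> packed = ak.from_arrow(ak.to_arrow(skim))   # to_arrow packs while converting to Arrow
>>> packed.layout                               # holds only the selected entries, not a view over all 100

The downside is exactly the mixing of concerns described above: the data takes a detour through pyarrow just to be trimmed.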
-
I agree …