Is there a way to use dask to help reduce memory usage? #2341

cmoore24-24 · 2023-03-27T20:15:44Z

cmoore24-24
Mar 27, 2023

Hello!
I have created a function that uses awkward arrays to calculate an energy correlation function:

import awkward as ak
import numpy as np

# distance(coords, i, j) is a helper defined elsewhere (not shown in this post)

def e23(fatjetscands, pfcands, fatjets):
    # Look up each fat jet's constituents by index, then regroup them per jet
    jets_pt = ak.unflatten(ak.flatten(pfcands.pt[fatjetscands.pFCandsIdx]), ak.flatten(fatjets.nConstituents))
    # Momentum fraction of each constituent within its jet
    sums = ak.sum(jets_pt, axis=1)
    z = jets_pt / sums

    eta = ak.unflatten(ak.flatten(pfcands.eta[fatjetscands.pFCandsIdx]), ak.flatten(fatjets.nConstituents))

    phi = ak.unflatten(ak.flatten(pfcands.phi[fatjetscands.pFCandsIdx]), ak.flatten(fatjets.nConstituents))

    # All 3-way combinations of constituents in each jet
    z_comb = ak.combinations(z, 3, axis=1)
    z_ijk = z_comb['0'] * z_comb['1'] * z_comb['2']

    eta_comb = ak.combinations(eta, 3, axis=1)
    phi_comb = ak.combinations(phi, 3, axis=1)

    coords = ak.zip({'phi': phi_comb, 'eta': eta_comb})

    # Pairwise distances within each triple
    a = distance(coords, '0', '1')
    b = distance(coords, '0', '2')
    c = distance(coords, '1', '2')

    # Products of each pair of distances
    ij_ik = a * b
    ij_jk = a * c
    ik_jk = b * c

    # Smallest of the three products for each triple, regrouped per jet
    del_comb = ak.flatten(ak.zip({'ijik': ij_ik, 'ijjk': ij_jk, 'ikjk': ik_jk}))
    temp_numpy = np.vstack((del_comb.ijik.to_numpy(),
        del_comb.ijjk.to_numpy(),
        del_comb.ikjk.to_numpy())).T
    delta_r = ak.unflatten(np.amin(temp_numpy, axis=1), ak.count(z_ijk, axis=1))

    # Sum over the triples in each jet, then regroup jets into events
    e23_jetwise = ak.sum(np.multiply(z_ijk, delta_r), axis=1)
    e23_eventwise = ak.unflatten(e23_jetwise, ak.count(fatjets.nConstituents, axis=1))
    return e23_eventwise

But because some of these arrays are very large, I run into memory problems pretty quickly. I assumed my only option was to rewrite in Numba, but it was suggested to me that I might be able to use Dask to keep my code array-oriented rather than loopy. Is there currently an easy way to incorporate Dask into Awkward-centric code, like mine above?

Any advice is greatly appreciated!

agoose77 · 2023-03-28T15:44:13Z

agoose77
Mar 28, 2023
Collaborator

Rewriting in numba will likely give you the benefit of fewer allocations; Awkward Array natively requires a new array to be allocated for each operation (we do have non-public support for numexpr, but this can only help for ufunc operations at the moment).

If you are just worried about total memory usage, you can delete the intermediate arrays once you're finished with them. This should lead to a reduction in total usage. Note that tracking and reporting memory usage can be a bit complex, so don't be surprised if you don't necessarily see what you're expecting.
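A minimal sketch of deleting intermediates as soon as they are no longer needed. It uses plain NumPy arrays as hypothetical stand-ins for the three pairwise distances (the names `a`, `b`, `c` mirror the original function, but the data here is random), and it also reuses one buffer with `out=` to avoid a further allocation:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the three pairwise distances of each triple
a = rng.random(1_000)
b = rng.random(1_000)
c = rng.random(1_000)

ij_ik = a * b
ij_jk = a * c
smallest = np.minimum(ij_ik, ij_jk)
del ij_ik, ij_jk  # last references dropped, so these buffers can be freed

ik_jk = b * c
del a, b, c  # the inputs are not needed after the last product
np.minimum(smallest, ik_jk, out=smallest)  # write into the existing buffer
del ik_jk
```

Whether the process-level memory reported by the operating system drops right away depends on the allocator, so the effect is easier to see in peak usage than in a snapshot.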

Dask can help here because it can do the lifetime tracking itself (i.e., delete arrays once they're no longer needed). It can also optimise the case where you are reading more columns from your data source than are actually needed, although this can also be done by hand.

1 reply

cmoore24-24 Mar 28, 2023
Author

I didn't realize I could manually delete the arrays I didn't need anymore mid-calculation. I'll give that a shot, I think it'll help a lot! But it does sound like numba might still be the way to go, or at least worth checking. Thank you!

jpivarski · 2023-03-28T15:57:39Z

jpivarski
Mar 28, 2023
Maintainer

This is a good question. The fact that you're doing 3-way combinations of collections with "jet" in their name (there are often a large number of jets per event) is suggestive that this is going to use a lot of memory. "$n$ choose 3" can be a big number when $n$ is large.
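To get a feel for the scale, here is a rough back-of-the-envelope sketch. The helper `triple_bytes` and its defaults (three fields, 8-byte values) are assumptions for illustration; it counts only the materialized combination values for a single jet and ignores offsets and later intermediates, so it is a lower bound:

```python
from math import comb

def triple_bytes(n_constituents, n_fields=3, itemsize=8):
    """Rough bytes to materialize all 3-combinations of one jet's constituents."""
    # Each triple stores 3 values (slots "0", "1", "2") for every field
    return comb(n_constituents, 3) * 3 * n_fields * itemsize

for n in (10, 50, 100, 200):
    print(n, comb(n, 3), triple_bytes(n))
```

The count grows roughly as $n^3/6$, so doubling the number of constituents multiplies the memory by about eight.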

The simplest thing you can do is ignore the memory use and have Dask break it into smaller chunks. That is, run e23 more times on smaller array slices of fatjetscands, pfcands, and fatjets.

As I realized further down in this response, pFCandsIdx is probably a global index, so if you slice pfcands into smaller arrays, pFCandsIdx has to be adjusted for the new starting index. If I'm interpreting that right, it makes this method messier: if you pass in

pfcands[start:stop]

as an argument to this function, you'd have to index with the shifted

fatjetscands.pFCandsIdx - start

when slicing pfcands.pt, pfcands.eta, and pfcands.phi. That's unpleasant, but it's due to the way that pFCandsIdx was defined (relative to the start of a flattened pfcands, instead of indexed within each event).
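The index shift can be sketched with plain NumPy. Everything here is hypothetical toy data (`cand_pt`, `cands_per_event`, `global_idx`, `idx_per_event`) and assumes, as guessed above, that the index counts from the start of the flattened candidates:

```python
import numpy as np

# Toy data: 3 events holding 2, 1, and 3 candidates, stored flattened
cand_pt = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])
cands_per_event = np.array([2, 1, 3])
# Global indices into cand_pt: event 0 -> [1, 0], event 1 -> [2], event 2 -> [5, 3]
global_idx = np.array([1, 0, 2, 5, 3])
idx_per_event = np.array([2, 1, 2])

def slice_events(first_event, last_event):
    """Return the candidates and re-based indices for events [first_event, last_event)."""
    cand_offsets = np.concatenate(([0], np.cumsum(cands_per_event)))
    idx_offsets = np.concatenate(([0], np.cumsum(idx_per_event)))
    start, stop = cand_offsets[first_event], cand_offsets[last_event]
    chunk_pt = cand_pt[start:stop]
    # Subtract the chunk's starting position so indices are valid inside the chunk
    chunk_idx = global_idx[idx_offsets[first_event]:idx_offsets[last_event]] - start
    return chunk_pt, chunk_idx

chunk_pt, chunk_idx = slice_events(1, 3)
picked = chunk_pt[chunk_idx]
print(picked)
```

The values picked from the chunk match what the unshifted index would pick from the full array, which is the property each chunked call needs.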

If you want to go memory-hunting, the easiest way to find the biggest problem spot is to get fatjetscands, pfcands, and fatjets into an interactive environment, either Jupyter or a Python terminal prompt, do one line at a time, and watch your memory use with htop in another window (or similar). Pympler's SummaryTracker is also a great way to do line-by-line memory use, and it differs from the "outside looking in" approach of htop by giving you a table of all of the Python objects that are using memory. It's probably the case that ak.combinations(some_big_collection, 3, axis=1) makes one very large NumPy array.
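If neither htop nor Pympler is at hand, the standard library's tracemalloc can give a similar per-step picture. This sketch swaps in tracemalloc (not the tools named above) and uses plain Python tuples from itertools as a stand-in for the real arrays; `peak_bytes` is a hypothetical helper:

```python
import itertools
import tracemalloc

def peak_bytes(step):
    """Run step() and return (result, peak traced bytes during the call)."""
    tracemalloc.start()
    try:
        result = step()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak

# Compare the cost of 3-way combinations for 20 versus 60 items
small, small_peak = peak_bytes(lambda: list(itertools.combinations(range(20), 3)))
large, large_peak = peak_bytes(lambda: list(itertools.combinations(range(60), 3)))
print(len(small), small_peak)
print(len(large), large_peak)
```

Wrapping one line of the function at a time in such a call is a quick way to see which step dominates.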

It might help to know that you can run ak.combinations on zipped records, so

>>> jets = ak.Array([
...     [
...         {"pt": 1, "eta": 2, "phi": 3},
...         {"pt": 1, "eta": 2, "phi": 3},
...         {"pt": 1, "eta": 2, "phi": 3},
...         {"pt": 1, "eta": 2, "phi": 3},
...     ],
...     [
...         {"pt": 1, "eta": 2, "phi": 3},
...     ],
...     [
...         {"pt": 1, "eta": 2, "phi": 3},
...         {"pt": 1, "eta": 2, "phi": 3},
...         {"pt": 1, "eta": 2, "phi": 3},
...         {"pt": 1, "eta": 2, "phi": 3},
...         {"pt": 1, "eta": 2, "phi": 3},
...     ]
... ])
>>> combo_jets = ak.combinations(jets, 3)
>>> combo_jets.show(type=True)
type: 3 * var * (
    {
        pt: int64,
        eta: int64,
        phi: int64
    },
    {
        pt: int64,
        eta: int64,
        phi: int64
    },
    {
        pt: int64,
        eta: int64,
        phi: int64
    }
)
[[({pt: 1, eta: 2, phi: 3}, {pt: 1, ...}, {...}), ..., ({...}, {...}, ...)],
 [],
 [({pt: 1, eta: 2, phi: 3}, {pt: 1, ...}, {...}), ..., ({...}, {...}, ...)]]

instead of doing each jagged numeric field separately:

>>> ak.combinations(jets.pt, 3).show(type=True)
type: 3 * var * (
    int64,
    int64,
    int64
)
[[(1, 1, 1), (1, 1, 1), (1, 1, 1), (1, 1, 1)],
 [],
 [(1, 1, 1), (1, 1, 1), (1, 1, 1), (...), ..., (1, 1, 1), (1, 1, 1), (1, 1, 1)]]
>>> ak.combinations(jets.eta, 3).show(type=True)
type: 3 * var * (
    int64,
    int64,
    int64
)
[[(2, 2, 2), (2, 2, 2), (2, 2, 2), (2, 2, 2)],
 [],
 [(2, 2, 2), (2, 2, 2), (2, 2, 2), (...), ..., (2, 2, 2), (2, 2, 2), (2, 2, 2)]]
>>> ak.combinations(jets.phi, 3).show(type=True)
type: 3 * var * (
    int64,
    int64,
    int64
)
[[(3, 3, 3), (3, 3, 3), (3, 3, 3), (3, 3, 3)],
 [],
 [(3, 3, 3), (3, 3, 3), (3, 3, 3), (...), ..., (3, 3, 3), (3, 3, 3), (3, 3, 3)]]

Maybe you did each field separately because you thought it would use less memory. If so, it's worth knowing that the opposite is likely to be true, for two reasons: (a) all the fields in a record share the same offsets buffer, instead of each having their own, and (b) as an implementation detail, we don't duplicate records in ak.combinations the way we do numeric fields. Instead, we use an IndexedArray to "lazily duplicate" them.

You can see this implementation detail by looking at combo_jets.layout:

>>> combo_jets.layout
<ListOffsetArray len='3'>
    <offsets><Index dtype='int64' len='4'>[ 0  4  4 14]</Index></offsets>
    <content><RecordArray is_tuple='true' len='14'>
        <content index='0'>
            <IndexedArray len='14'>
                <index><Index dtype='int64' len='14'>
                    [0 0 0 1 5 5 5 5 5 5 6 6 6 7]
                </Index></index>
                <content><RecordArray is_tuple='false' len='10'>
                    <content index='0' field='pt'>
                        <NumpyArray dtype='int64' len='10'>[1 1 1 1 1 1 1 1 1 1]</NumpyArray>
                    </content>
                    <content index='1' field='eta'>
                        <NumpyArray dtype='int64' len='10'>[2 2 2 2 2 2 2 2 2 2]</NumpyArray>
                    </content>
                    <content index='2' field='phi'>
                        <NumpyArray dtype='int64' len='10'>[3 3 3 3 3 3 3 3 3 3]</NumpyArray>
                    </content>
                </RecordArray></content>
            </IndexedArray>
        </content>
        <content index='1'>
            <IndexedArray len='14'>
                <index><Index dtype='int64' len='14'>
                    [1 1 2 2 6 6 6 7 7 8 7 7 8 8]
                </Index></index>
                <content><RecordArray is_tuple='false' len='10'>
                    <content index='0' field='pt'>
                        <NumpyArray dtype='int64' len='10'>[1 1 1 1 1 1 1 1 1 1]</NumpyArray>
                    </content>
                    <content index='1' field='eta'>
                        <NumpyArray dtype='int64' len='10'>[2 2 2 2 2 2 2 2 2 2]</NumpyArray>
                    </content>
                    <content index='2' field='phi'>
                        <NumpyArray dtype='int64' len='10'>[3 3 3 3 3 3 3 3 3 3]</NumpyArray>
                    </content>
                </RecordArray></content>
            </IndexedArray>
        </content>
        <content index='2'>
            <IndexedArray len='14'>
                <index><Index dtype='int64' len='14'>
                    [2 3 3 3 7 8 9 8 9 9 8 9 9 9]
                </Index></index>
                <content><RecordArray is_tuple='false' len='10'>
                    <content index='0' field='pt'>
                        <NumpyArray dtype='int64' len='10'>[1 1 1 1 1 1 1 1 1 1]</NumpyArray>
                    </content>
                    <content index='1' field='eta'>
                        <NumpyArray dtype='int64' len='10'>[2 2 2 2 2 2 2 2 2 2]</NumpyArray>
                    </content>
                    <content index='2' field='phi'>
                        <NumpyArray dtype='int64' len='10'>[3 3 3 3 3 3 3 3 3 3]</NumpyArray>
                    </content>
                </RecordArray></content>
            </IndexedArray>
        </content>
    </RecordArray></content>
</ListOffsetArray>

The outer RecordArray is the 3-tuple of combinations fields "0", "1", and "2". Inside each field is an IndexedArray with a different index, indicating how the same set of input records is lazily duplicated:

>>> combo_jets.layout.content.content("0").index
<Index dtype='int64' len='14'>
    [0 0 0 1 5 5 5 5 5 5 6 6 6 7]
</Index>
>>> combo_jets.layout.content.content("1").index
<Index dtype='int64' len='14'>
    [1 1 2 2 6 6 6 7 7 8 7 7 8 8]
</Index>
>>> combo_jets.layout.content.content("2").index
<Index dtype='int64' len='14'>
    [2 3 3 3 7 8 9 8 9 9 8 9 9 9]
</Index>

and inside of that is the input RecordArray (not a copy).

>>> record_in_0 = combo_jets.layout.content.content("0").content
>>> record_in_1 = combo_jets.layout.content.content("1").content
>>> record_in_2 = combo_jets.layout.content.content("2").content
>>> record_in_0 is record_in_1
True
>>> record_in_0 is record_in_2
True
>>> record_in_1 is record_in_2
True

In other words, this was all designed to minimize memory usage if you naively call ak.combinations on a big record without worrying about how many fields there are, and you might have made it use more memory by trying to be careful.

Calling ak.combinations on numeric data makes copies:

>>> combo_numeric = ak.combinations(jets.eta, 3)
>>> combo_numeric.layout
<ListOffsetArray len='3'>
    <offsets><Index dtype='int64' len='4'>[ 0  4  4 14]</Index></offsets>
    <content><RecordArray is_tuple='true' len='14'>
        <content index='0'>
            <NumpyArray dtype='int64' len='14'>[2 2 2 2 2 2 2 2 2 2 2 2 2 2]</NumpyArray>
        </content>
        <content index='1'>
            <NumpyArray dtype='int64' len='14'>[2 2 2 2 2 2 2 2 2 2 2 2 2 2]</NumpyArray>
        </content>
        <content index='2'>
            <NumpyArray dtype='int64' len='14'>[2 2 2 2 2 2 2 2 2 2 2 2 2 2]</NumpyArray>
        </content>
    </RecordArray></content>
</ListOffsetArray>
>>> combo_numeric.layout.content.content("0").data
array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
>>> nparray_in_0 = combo_numeric.layout.content.content("0").data
>>> nparray_in_1 = combo_numeric.layout.content.content("1").data
>>> nparray_in_2 = combo_numeric.layout.content.content("2").data
>>> np.shares_memory(nparray_in_0, nparray_in_1)
False
>>> np.shares_memory(nparray_in_0, nparray_in_2)
False
>>> np.shares_memory(nparray_in_1, nparray_in_2)
False

It is a trade-off: the IndexedArray saves us from rearranging/duplicating its content up-front, at a cost of having to do it on every access. The optimization heuristic that we use is: if it's a RecordArray, we don't know if the user is going to be interested in every field, so we introduce the IndexedArray; if it's not a RecordArray, the user can only be interested in all of its contents, so we don't introduce the IndexedArray.
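The trade-off can be illustrated with plain NumPy, using the first index from the layout above. This is only a sketch of the idea, not Awkward's implementation: keeping the index costs one integer buffer per tuple slot regardless of the number of fields, while duplicating up-front costs one buffer per field.

```python
import numpy as np

# Index of tuple slot "0" from the layout shown earlier
index0 = np.array([0, 0, 0, 1, 5, 5, 5, 5, 5, 5, 6, 6, 6, 7], dtype=np.int64)

# The shared input record: three fields of length 10
fields = {
    "pt": np.full(10, 1, dtype=np.int64),
    "eta": np.full(10, 2, dtype=np.int64),
    "phi": np.full(10, 3, dtype=np.int64),
}

# Lazy: store only the index; the fields stay shared and are gathered on access
lazy_bytes = index0.nbytes

# Eager: gather every field right away, whether or not it is used later
eager = {name: column[index0] for name, column in fields.items()}
eager_bytes = sum(column.nbytes for column in eager.values())

print(lazy_bytes, eager_bytes)
```

With a single numeric field the index would be as large as the duplicated data, which is consistent with the heuristic of skipping the IndexedArray in that case.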

You are interested in three fields, the pt/sum, the eta, and the phi, so you can't get away with not generating them, but if the ak.combinations is called on the RecordArray, at least the outer ListOffsetArray offsets are shared, and by generating them one at a time, maybe you can del the previous one before moving on to the next.

You do a lot of flattening and unflattening, and I don't know why. You have to flatten and unflatten if an in-between step is np.vstack, but you can do the same thing with ak.concatenate (with appropriately chosen axis) and not have to flatten/unflatten.

Since the ListOffsetArray has an offsets buffer, which is a cumulative sum, and ak.unflatten takes counts, which is the thing that gets cumulatively summed, each round trip creates a new offsets when you could have continually reused the old offsets by not going through the round trip. (If you need to flatten and unflatten with offsets, rather than counts, you'd have to build the low-level ListOffsetArray manually.)
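The relationship between counts and offsets, using the offsets from the combo_jets layout above as the example:

```python
import numpy as np

# Offsets as stored in the ListOffsetArray shown earlier
offsets = np.array([0, 4, 4, 14], dtype=np.int64)

# Counts are the differences between consecutive offsets
counts = np.diff(offsets)

# Going from counts back to offsets needs a cumulative sum and a new buffer
rebuilt = np.concatenate(([0], np.cumsum(counts)))

print(counts.tolist(), rebuilt.tolist())
print(np.shares_memory(rebuilt, offsets))
```

The rebuilt offsets hold the same values but live in a fresh allocation, which is the per-round-trip cost described above.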

Oh, wait—I think I get it—the pFCandsIdx is a global index, so you need to flatten to get values in that index, right?

In that case, I guess you have to flatten it, but if you go a low-level route, manipulating ListOffsetArray directly instead of using the high-level ak.unflatten function, you'll have more opportunities to reuse offsets. Keep an eye out for the distinction between ListOffsetArray and ListArray, which the high-level view hides.

And then there's also Numba. Someone once asked me a similar question, but they were doing 5-way combinations, and the intermediate arrays with $n$ choose 5 items in each event were huge. In that case, not creating intermediate combinations-arrays was the only way to proceed. (It's a reminder that what we really want is a language that understands combinatorics and can make intermediate arrays or not based on the situation.)
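For comparison, a loop-based sketch that never materializes the combinations. It is written in plain Python over flat NumPy arrays plus an offsets array so that it runs as is; decorating the functions with numba.njit is the assumed next step and is not tested here. The `dist` helper is again a hypothetical stand-in for the thread's `distance`:

```python
import math
import numpy as np

def dist(eta, phi, i, j):
    """Hypothetical angular separation between constituents i and j."""
    dphi = (phi[i] - phi[j] + math.pi) % (2 * math.pi) - math.pi
    deta = eta[i] - eta[j]
    return math.sqrt(deta * deta + dphi * dphi)

def e23_loop(offsets, z, eta, phi):
    """Sum z_i*z_j*z_k * min(pairwise distance products) over triples in each jet."""
    n_jets = len(offsets) - 1
    out = np.zeros(n_jets)
    for jet in range(n_jets):
        start, stop = offsets[jet], offsets[jet + 1]
        total = 0.0
        for i in range(start, stop):
            for j in range(i + 1, stop):
                d_ij = dist(eta, phi, i, j)
                for k in range(j + 1, stop):
                    d_ik = dist(eta, phi, i, k)
                    d_jk = dist(eta, phi, j, k)
                    smallest = min(d_ij * d_ik, d_ij * d_jk, d_ik * d_jk)
                    total += z[i] * z[j] * z[k] * smallest
        out[jet] = total
    return out

# Toy input: jet 0 has three constituents, jet 1 has two (no triples)
offsets = np.array([0, 3, 5])
z = np.array([0.5, 0.3, 0.2, 0.6, 0.4])
eta = np.array([0.0, 1.0, 0.0, 0.0, 0.5])
phi = np.array([0.0, 0.0, 1.0, 0.0, 0.5])
result = e23_loop(offsets, z, eta, phi)
print(result)
```

Memory use here is one output value per jet, at the cost of giving up the array-oriented style.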

1 reply

cmoore24-24 Mar 28, 2023
Author

Thank you for your reply, Jim!

You were spot on about why I did all the convoluted flattening and unflattening: flattening Idx, and then regrouping my arrays so the elements were jets rather than events, was the best way I could come up with to apply pFCandsIdx to the overall candidates list (further tempered by nCandidates). Based on your comment, it definitely seems like I can improve on this in some way.

I'll definitely try zipping my pt, eta, and phi before calling combinations and see how that affects my memory usage. Thank you for the combo_jets example, that was very helpful!
