-
There shouldn't be any scaling (i.e. O(n)) performance differences between calling `TTree.arrays` and looping over the branches to call `TBranch.array`. That's where the optimization effort went: ensuring that there's no Python code in the O(n) loop, releasing the GIL when possible, minimizing the number of remote call-and-response round trips, etc. No special effort was paid to minimizing the O(1) bookkeeping. If you have a TTree with a very large number of branches and a small number of entries, then this "bookkeeping stuff" could become relevant, and maybe if you have a lot of these short, wide files, it would add up to human time (seconds or more). The `member` calls you're seeing in the profile are part of that O(1) bookkeeping. For instance, if this is the offending call:
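(A sketch of what I believe the current, uncached `branches` property looks like, inferred from the cached version below; not a verbatim copy of the uproot source.)

```python
@property
def branches(self):
    """
    The list of :doc:`uproot.behaviors.TBranch.TBranch` directly under
    this :doc:`uproot.behaviors.TTree.TTree` or
    :doc:`uproot.behaviors.TBranch.TBranch` (i.e. not recursive).
    """
    # every access goes back to member(), so a profiler sees one
    # member("fBranches") call per access
    return self.member("fBranches")
```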
then we could speed it up like this:

```python
# on the HasBranches class...
_branches = None

@property
def branches(self):
    """
    The list of :doc:`uproot.behaviors.TBranch.TBranch` directly under
    this :doc:`uproot.behaviors.TTree.TTree` or
    :doc:`uproot.behaviors.TBranch.TBranch` (i.e. not recursive).
    """
    if self._branches is None:
        # on this instance...
        self._branches = self.member("fBranches")
    return self._branches
```

This caches the result on the instance, so `member("fBranches")` is only looked up the first time `branches` is accessed. If, on the other hand, I'm guessing wrong about which member is being called so much, then that other member should be cached instead.
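(The explicit `_branches = None` sentinel in the cached version above does by hand what `functools.cached_property` does automatically; either way, the point is to pay for the `member("fBranches")` lookup once per object rather than once per access.)

To confirm which member lookup actually dominates before caching anything, a quick profiling pass would narrow it down. This is only a sketch: the file path and the "Events" tree name are placeholders, not the actual test sample.

```python
import cProfile
import pstats

import uproot

# Placeholder path and tree name: substitute the NanoAOD sample you're profiling.
with uproot.open("nanoaod-sample.root") as file:
    tree = file["Events"]
    with cProfile.Profile() as profiler:  # context-manager form needs Python 3.8+
        tree.arrays()

# Restrict the report to functions whose name matches "member".
pstats.Stats(profiler).sort_stats("cumulative").print_stats("member")
```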
-
I'm trying to understand what the overhead for reading a "flat" TTree (meaning no sub-branches and only fundamental types and arrays thereof) is when reading via `TTree.arrays`, as compared to looping over the branches and calling `TBranch.array`. This can be illustrated with the NanoAOD sample from the tests.
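Roughly like this (a sketch for illustration, not the exact script; the file path and the "Events" tree name are placeholders):

```python
import uproot

# Placeholder path and tree name: substitute the NanoAOD test sample.
with uproot.open("nanoaod-sample.root") as file:
    tree = file["Events"]

    # Read everything in one call...
    via_arrays = tree.arrays()

    # ...versus reading the same branches one at a time.
    via_branches = {branch.name: branch.array() for branch in tree.branches}
```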
The first noticeable thing when running a profiler is that there seem to be significantly more calls to `member`. Why is that?