feat: changes for dask-awkward one-pass optimize #1225
martindurant wants to merge 12 commits into scikit-hep:main from
src/uproot/_dask.py
class UprootReadMixin:
    base_form: Form
    expected_form: Form
    behavior = {}
I'm not sure how behaviours should be handled, so this probably needs updating
  # Identify form key
- form_key, attribute = buffer_key.rsplit("-", maxsplit=1)
+ form_key, attribute = buffer_key.replace("@.", "<root>.").rsplit(
+     "-", maxsplit=1
Would this fail for any of the styles of buffer name? We could use [0] or *attribute to guard against this (we don't actually use the attribute).
In fact, this whole loop could be a single comprehension.
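A minimal sketch of the guard suggested above, assuming buffer keys have the shape `<form_key>-<attribute>`. The helper names and the sample keys are hypothetical, made up for illustration, and are not part of uproot. Indexing the `rsplit` result with `[0]` never raises when a key has no `-`, and the loop collapses into one comprehension:

```python
def form_key_of(buffer_key: str) -> str:
    # Hypothetical helper: keep everything before the last "-".
    # Indexing with [0] also works when the key contains no "-",
    # whereas unpacking into two names would raise ValueError.
    return buffer_key.replace("@.", "<root>.").rsplit("-", maxsplit=1)[0]


def form_keys_from_buffer_keys(buffer_keys):
    # The whole loop written as a single set comprehension.
    return {form_key_of(k) for k in buffer_keys}


# Made-up keys, for illustration only.
sample = ["@.Jet_pt-data", "@.Jet_pt-offsets", "no_dash_here"]
form_keys = form_keys_from_buffer_keys(sample)
```

Only the last `-` is treated as the delimiter, so a form key that itself contains `-` keeps everything before the final one.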
It's unclear what characters are not allowed in a TBranch name, which gets passed over to the form_keys. It would be hard to use a : in a TBranch name, since that gets interpreted by ROOT (in some constructors, not all). On the other hand, . is very common and can't be a delimiter.
I tried to see what would break it:
root [0] TTree* t = new TTree("name", "title")
(TTree *) 0x6360d13c1720
root [1] (new TBranch(t, "something", nullptr, "leaf"))->GetName()
(const char *) "something"
root [2] (new TBranch(t, "some.thing", nullptr, "leaf"))->GetName()
(const char *) "some.thing"
root [3] (new TBranch(t, "some:thing", nullptr, "leaf"))->GetName()
(const char *) "some:thing"
root [4] (new TBranch(t, "some/thing", nullptr, "leaf"))->GetName()
(const char *) "some/thing"
root [5] (new TBranch(t, "some\"thing", nullptr, "leaf"))->GetName()
(const char *) "some"thing"
root [6] (new TBranch(t, "some`thing", nullptr, "leaf"))->GetName()
(const char *) "some`thing"
root [7] (new TBranch(t, "some!thing", nullptr, "leaf"))->GetName()
(const char *) "some!thing"
root [8] (new TBranch(t, "some@thing", nullptr, "leaf"))->GetName()
(const char *) "some@thing"
root [9] (new TBranch(t, "some#thing", nullptr, "leaf"))->GetName()
(const char *) "some#thing"
root [10] (new TBranch(t, "some+thing", nullptr, "leaf"))->GetName()
(const char *) "some+thing"
root [11] (new TBranch(t, "some(thing", nullptr, "leaf"))->GetName()
(const char *) "some(thing"
root [12] (new TBranch(t, "some{thing", nullptr, "leaf"))->GetName()
(const char *) "some{thing"
root [13] (new TBranch(t, "some[thing", nullptr, "leaf"))->GetName()
(const char *) "some[thing"
root [14] (new TBranch(t, "some\\thing", nullptr, "leaf"))->GetName()
(const char *) "some\thing"
root [15] (new TBranch(t, "some|thing", nullptr, "leaf"))->GetName()
(const char *) "some|thing"
root [16] (new TBranch(t, "some;thing", nullptr, "leaf"))->GetName()
(const char *) "some;thing"
root [17] (new TBranch(t, "some\0thing", nullptr, "leaf"))->GetName()
(const char *) "some"
Here's something we can count on: a ROOT TBranch name is not going to have a null byte (Python "\x00") in the middle of the string, whereas this is legal and would be properly carried around in Python. It seems pretty drastic, though.
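To make the null-byte idea concrete, here is a rough sketch, assuming the buffer key is just the form key and the attribute joined by a delimiter. `SEP`, `make_buffer_key` and `split_buffer_key` are hypothetical names for illustration; this is not what uproot or awkward currently do.

```python
SEP = "\x00"  # assumed delimiter; the session above shows ROOT cutting names at it


def make_buffer_key(form_key: str, attribute: str) -> str:
    # Join the two parts with a byte a TBranch name will not contain.
    return f"{form_key}{SEP}{attribute}"


def split_buffer_key(buffer_key: str):
    # Exactly one SEP is expected, so two-name unpacking is safe here.
    form_key, attribute = buffer_key.rsplit(SEP, maxsplit=1)
    return form_key, attribute


# A made-up branch name using characters that ROOT accepted above.
key = make_buffer_key("some-thing.with:odd/chars", "data")
parts = split_buffer_key(key)
```

With this delimiter, `-`, `.` and `:` inside a branch name no longer interfere with the split, at the cost of keys that are awkward to print and inspect.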
…ant/uproot5 into one-pass-dask-optimize
def project(self: T, *, report: TypeTracerReport, state: dict) -> T:
    keys = self.necessary_columns(report=report, state=state)
    keys = [_buf_to_col(c).replace(".", "_") for c in columns]
@lgray: this is the simplistic way to unmap the column names. This line handles the simple case; the if-block handles nanoAOD, adding in implicit columns like nJet. Note that this doesn't actually make use of the schema or keys_for_buffer_keys.
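As a rough illustration of the two steps described in this comment, the sketch below maps dotted names to underscore names and then adds an implicit counter column. It assumes the nanoAOD-style convention that a column `Jet_pt` comes with a counter `nJet`. `unmap_columns` and the sample names are hypothetical and do not reproduce the PR's actual if-block or `_buf_to_col`.

```python
def unmap_columns(columns, available):
    # Simple case: dotted names become underscore-separated column names.
    cols = [c.replace(".", "_") for c in columns]
    out = set(cols)
    # Assumed nanoAOD-style case: "Jet_pt" implies the counter "nJet",
    # which is added only if the file actually provides it.
    for col in cols:
        prefix, sep, _ = col.partition("_")
        counter = "n" + prefix
        if sep and counter in available:
            out.add(counter)
    return sorted(out)


# Made-up column names, for illustration only.
available = {"Jet_pt", "Jet_eta", "nJet", "MET_pt"}
needed = unmap_columns(["Jet.pt", "MET.pt"], available)
```

Here `MET_pt` gains no counter because `nMET` is not among the available columns.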
This makes the basic uproot tests in dask-contrib/dask-awkward#491 pass
cc @lgray