feat: Pydantic preprocessing and interfacing to NanoEventsFactory #1489
This continues the series of PRs that thread pydantic input models through coffea, and specifically enables preprocessing parquet files via these models. The old preprocess is renamed preprocess_legacy, and preprocess now attempts to coerce users to pydantic classes (with an escape hatch, preprocess_legacy_root=True). The pydantic preprocess is made a nonpublic function, with two user-facing variants, one for root and one for parquet, which call it appropriately.

This partially pulls in updates from https://github.com/NJManganelli/coffea/tree/datafactory_parquet and some associated prototyping in another branch which did not get incorporated into #1403.
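The layout described above (a renamed legacy function, a nonpublic shared implementation, two format-specific wrappers, and an escape-hatch flag) can be sketched roughly as follows. This is only an illustration of the dispatch pattern, not the code in this PR: `FilesetModel`, `_preprocess_model`, `preprocess_root`, `preprocess_parquet`, and all return values are hypothetical, and a stdlib dataclass stands in for the pydantic model so the snippet runs without extra dependencies. Only `preprocess`, `preprocess_legacy`, and `preprocess_legacy_root` are names taken from the description.

```python
from dataclasses import dataclass


@dataclass
class FilesetModel:
    """Hypothetical stand-in for a pydantic fileset model."""
    files: dict  # maps file path -> object name (assumed layout)


def preprocess_legacy(fileset: dict) -> dict:
    """Stand-in for the old dict-based preprocess."""
    return {"mode": "legacy", "files": sorted(fileset)}


def _preprocess_model(model: FilesetModel, file_format: str) -> dict:
    """Nonpublic shared implementation (hypothetical name and behavior)."""
    return {"mode": file_format, "files": sorted(model.files)}


def preprocess_root(model: FilesetModel) -> dict:
    """User-facing variant for root files (hypothetical name)."""
    return _preprocess_model(model, "root")


def preprocess_parquet(model: FilesetModel) -> dict:
    """User-facing variant for parquet files (hypothetical name)."""
    return _preprocess_model(model, "parquet")


def preprocess(fileset, preprocess_legacy_root: bool = False) -> dict:
    """Coerce the input to the model unless the escape hatch is set."""
    if preprocess_legacy_root:
        # Escape hatch: fall back to the dict-based code path.
        return preprocess_legacy(fileset)
    if not isinstance(fileset, FilesetModel):
        # Plain dict input is wrapped in the model before dispatch.
        fileset = FilesetModel(files=dict(fileset))
    return preprocess_root(fileset)
```

Keeping the shared implementation private means the format-specific wrappers are the only public surface, so the dict-based path can later be removed without changing their signatures.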
This walks back changes to preprocess to make the code friendlier and to ease the transition should we deprecate the dict-based preprocess.
More to be edited into the description as it evolves.