@NJManganelli NJManganelli commented Nov 23, 2025

This continues the series of PRs that thread pydantic input models through coffea, and specifically enables preprocessing of parquet files with them. The old preprocess is renamed preprocess_legacy, and preprocess now attempts to coerce user input to pydantic classes (with an escape hatch, preprocess_legacy_root=True). The pydantic-based preprocess becomes a nonpublic function, with two user-facing variants, one for root and one for parquet, each calling it appropriately for its format.
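The layering described above can be sketched roughly as follows. This is only an illustration of the dispatch pattern, not the code in this PR: `FileSpec`, `_coerce`, `_preprocess_core`, `preprocess_root`, and `preprocess_parquet` are hypothetical names, a stdlib dataclass stands in for the pydantic models so the sketch has no dependencies, and it assumes that `preprocess` takes the root path by default.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FileSpec:
    """Stand-in for a pydantic input model (hypothetical name and fields)."""
    path: str
    format: str  # "root" or "parquet"


def _coerce(fileset: dict[str, Any], fmt: str) -> list[FileSpec]:
    """Turn a plain dict keyed by file path into typed specs (hypothetical helper)."""
    return [FileSpec(path=p, format=fmt) for p in fileset]


def _preprocess_core(specs: list[FileSpec]) -> dict[str, str]:
    """Nonpublic shared implementation; here it only records each file's format."""
    return {s.path: s.format for s in specs}


def preprocess_root(fileset: dict[str, Any]) -> dict[str, str]:
    """User-facing root variant (hypothetical name)."""
    return _preprocess_core(_coerce(fileset, "root"))


def preprocess_parquet(fileset: dict[str, Any]) -> dict[str, str]:
    """User-facing parquet variant (hypothetical name)."""
    return _preprocess_core(_coerce(fileset, "parquet"))


def preprocess_legacy(fileset: dict[str, Any]) -> dict[str, Any]:
    """Stand-in for the renamed dict-based implementation."""
    return dict(fileset)


def preprocess(fileset: dict[str, Any], preprocess_legacy_root: bool = False):
    """Coerce to typed specs by default; the flag is the escape hatch."""
    if preprocess_legacy_root:
        return preprocess_legacy(fileset)
    # Assumption: the default entry point handles root input.
    return preprocess_root(fileset)
```

With this shape, existing dict-based callers only need to pass `preprocess_legacy_root=True` to keep the old behavior, while the root and parquet entry points share one internal implementation.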

This partially pulls in updates from https://github.com/NJManganelli/coffea/tree/datafactory_parquet, along with some associated prototyping in another branch that was not incorporated into #1403.

This walks back changes to preprocess, both to make the code friendlier and to ease the transition should we deprecate the dict-based preprocess.

More to be edited into the description as it evolves.
