Description
This experiment will prototype a minimal dataset validation mechanism using Pandera to assess feasibility, ergonomics, and architectural fit.
The primary goal is to evaluate whether dataset-level validation can align with the direction established by parameter validation in a clean, Kedro-native way. Pandera will be used as a concrete backend for the experiment.
Context
Kedro is introducing first-class parameter validation. As part of exploring a cohesive validation strategy, we want to evaluate whether a similar approach can be extended to dataset-level validation.
Currently, data validation is commonly implemented via hooks and third-party libraries, which can introduce hidden control flow and architectural misalignment.
Scope
Implement a minimal prototype that:
- Integrates Pandera-based validation at the dataset level
- Triggers validation on load() (and optionally on save())
- Avoids using hooks
- Is tested within a simple example project
Explore whether:
- A dataset wrapper approach is sufficient
- The validation logic can conceptually align with the parameter validation structure
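To make the dataset wrapper idea concrete, here is a minimal sketch, not a proposed implementation. `ValidatedDataset`, `SupportsValidate`, `SupportsLoadSave`, and the `validate_on_save` flag are hypothetical names introduced for illustration. The sketch assumes only that the wrapped dataset exposes Kedro-style `load()` / `save(data)` methods and that the schema exposes a Pandera-style `validate(data)` method that returns the validated data or raises.

```python
from typing import Any, Protocol


class SupportsValidate(Protocol):
    """Anything with a Pandera-style validate(data) method."""

    def validate(self, data: Any) -> Any: ...


class SupportsLoadSave(Protocol):
    """Anything with Kedro-style load() and save(data) methods."""

    def load(self) -> Any: ...

    def save(self, data: Any) -> None: ...


class ValidatedDataset:
    """Hypothetical wrapper: validates on load(), optionally on save()."""

    def __init__(
        self,
        dataset: SupportsLoadSave,
        schema: SupportsValidate,
        validate_on_save: bool = False,
    ) -> None:
        self._dataset = dataset
        self._schema = schema
        self._validate_on_save = validate_on_save

    def load(self) -> Any:
        # Validation happens in the load path itself, not in a hook,
        # so the control flow is visible where the data is read.
        data = self._dataset.load()
        return self._schema.validate(data)

    def save(self, data: Any) -> None:
        # Optional: reject invalid data before it is written.
        if self._validate_on_save:
            data = self._schema.validate(data)
        self._dataset.save(data)
```

In the experiment, `dataset` could be an existing Kedro dataset and `schema` a `pandera.DataFrameSchema`, whose `validate()` method matches the shape assumed here. A wrapper that is registered in the Data Catalog would likely also need to subclass Kedro's `AbstractDataset`, which this sketch deliberately leaves out.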
Deliverable
- A minimal proof of concept sufficient to evaluate the approach
- A short write-up covering:
  - Architectural fit with Kedro
  - Interaction with lazy datasets, based on this comment: Add docs on working with Pandera for data validation #5142 (comment)
  - Developer experience
  - Limitations / trade-offs