Prototype Dataset Validation Using Pandera #5391

@rashidakanchwala

Description

This experiment will prototype a minimal dataset validation mechanism using Pandera to assess feasibility, ergonomics, and architectural fit.

The primary goal is to evaluate whether dataset-level validation can align with the direction established by parameter validation in a clean, Kedro-native way. Pandera will be used as a concrete backend for the experiment.

Context

Kedro is introducing first-class parameter validation. As part of exploring a cohesive validation strategy, we want to evaluate whether a similar approach can be extended to dataset-level validation.

Currently, data validation is commonly implemented via hooks and third-party libraries, which can introduce hidden control flow and architectural misalignment.

Scope

Implement a minimal prototype that:

  • Integrates Pandera-based validation at the dataset level
  • Triggers validation on load(), and optionally on save()
  • Avoids using hooks
  • Is tested within a simple example project

Explore whether:

  • A dataset wrapper approach is sufficient
  • The validation logic can conceptually align with the parameter validation structure
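To make the wrapper idea concrete, here is a minimal sketch of what a hook-free, dataset-level validation wrapper could look like. It is not Kedro's or Pandera's API: `ValidatedDataset`, `InMemoryDataset`, and `RequiredColumnsSchema` are hypothetical names introduced for illustration. The only assumption about the schema is that it exposes a `validate(data)` method that returns the data or raises, which is the shape of Pandera's `DataFrameSchema.validate`. The stand-in schema keeps the sketch dependency-free.

```python
from typing import Any, Protocol


class SupportsValidate(Protocol):
    """Anything with a Pandera-style ``validate(data)`` method."""

    def validate(self, data: Any) -> Any: ...


class SupportsLoadSave(Protocol):
    """Minimal dataset interface assumed here: ``load()`` and ``save(data)``."""

    def load(self) -> Any: ...

    def save(self, data: Any) -> None: ...


class ValidatedDataset:
    """Hypothetical wrapper: validates on load(), optionally on save()."""

    def __init__(
        self,
        dataset: SupportsLoadSave,
        schema: SupportsValidate,
        validate_on_save: bool = False,
    ) -> None:
        self._dataset = dataset
        self._schema = schema
        self._validate_on_save = validate_on_save

    def load(self) -> Any:
        # Validation happens inline, so the control flow is visible
        # in the dataset itself rather than in a hook.
        return self._schema.validate(self._dataset.load())

    def save(self, data: Any) -> None:
        if self._validate_on_save:
            data = self._schema.validate(data)
        self._dataset.save(data)


class InMemoryDataset:
    """Stand-in for a real dataset (illustration only)."""

    def __init__(self, data: Any = None) -> None:
        self._data = data

    def load(self) -> Any:
        return self._data

    def save(self, data: Any) -> None:
        self._data = data


class RequiredColumnsSchema:
    """Stand-in schema (illustration only): each row must have all columns."""

    def __init__(self, columns: list[str]) -> None:
        self._columns = set(columns)

    def validate(self, rows: list[dict]) -> list[dict]:
        for index, row in enumerate(rows):
            missing = self._columns - set(row)
            if missing:
                raise ValueError(f"row {index} is missing columns: {sorted(missing)}")
        return rows


if __name__ == "__main__":
    schema = RequiredColumnsSchema(["id", "price"])
    dataset = ValidatedDataset(InMemoryDataset([{"id": 1, "price": 9.5}]), schema)
    print(dataset.load())
```

In the actual prototype, the stand-in schema would presumably be replaced by a Pandera schema object and the inner dataset by a real Kedro dataset. How the schema is declared (for example in the catalog or in code) is left open here, since the issue does not specify it.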

Deliverable

Metadata

Labels

Issue: Feature Request (New feature or improvement to existing feature)

Projects

Status

To Do
