Description
Add a dedicated page discussing data validation and the different options available in Kedro.
Context
These questions have been asked repeatedly:
- What is the current status of kedro-great? It seems unmaintained.
- What is Kedro's opinion on Great Expectations (GE) or pandera? Which one is the go-to plugin?
- Should I use Pydantic?
- How should I validate config, do type checking, etc.?
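For the config question, one plugin-free option is to validate a parameters block with a plain dataclass. This is a minimal sketch under stated assumptions, not a recommended Kedro pattern: `TrainParams` and its fields (`test_size`, `random_state`, `features`) are hypothetical, and it assumes a node receives the block (e.g. via `params:model_options`) as a plain dict. It only checks simple, non-generic annotated types; Pydantic covers nested and generic types far more thoroughly.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TrainParams:
    """Hypothetical schema for a parameters block such as `model_options`."""

    test_size: float
    random_state: int
    features: list

    def __post_init__(self) -> None:
        # Type check: compare each value with its annotation.
        # Works only for plain classes (float, int, list), not e.g. list[str].
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name} must be {f.type.__name__}, got {type(value).__name__}"
                )
        # Domain checks that type hints alone cannot express.
        if not 0.0 < self.test_size < 1.0:
            raise ValueError("test_size must be strictly between 0 and 1")
        if not self.features:
            raise ValueError("features must not be empty")
```

A node could start with `params = TrainParams(**model_options)`, so a malformed parameters file fails loudly with a clear message instead of somewhere deep inside the training code.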
Kedro is all about best practices for data/ML projects, and data validation is no longer optional. Currently we have a few mentions of Great Expectations, a sample hook, an unmaintained plugin, and some not-so-active plugins in the wild. While we cannot give users a default path, it would still be beneficial to discuss the different options and their tradeoffs, provide some guidance, and let users make their own choice.
Pages:
- https://docs.kedro.org/en/stable/hooks/index.html
- https://docs.kedro.org/en/stable/hooks/examples.html#v2-api
- https://docs.kedro.org/en/stable/hooks/examples.html#v3-api
Plugins:
- https://pypi.org/project/kedro-great/ (unmaintained, last updated 2020)
- https://pypi.org/project/kedro-expectations/ (actively maintained, but we don't know much about it)
- https://pypi.org/project/kedro-great-expectations/ (from @deepyaman, but only at version 0.0.1, 2023)
- kedro-pandera https://github.com/Galileo-Galilei/kedro-pandera (last updated Jul 2024; somewhat active, but its features are limited)
It's also important to keep in mind that plugins/libraries are only one option. It's still possible to write custom functions that make assertions, or even to do validation as unit tests (with pytest, for example).
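As a rough illustration of the plugin-free route, here is a sketch of a custom validation function plus a unit test exercising it. Everything here is hypothetical: the function `validate_companies`, the columns `id` and `company_rating`, and the rules themselves. Rows are plain dicts only to keep the sketch dependency-free; with pandas, the same checks would apply to DataFrame columns.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def validate_companies(rows: List[Row]) -> List[Row]:
    """Hypothetical validation node: fail fast if the data breaks basic rules.

    Returns the rows unchanged so it can sit between two nodes in a pipeline.
    """
    required = {"id", "company_rating"}
    seen_ids = set()
    for i, row in enumerate(rows):
        # Schema check: every required column must be present.
        missing = required - set(row)
        if missing:
            raise ValueError(f"row {i} is missing columns: {sorted(missing)}")
        # Uniqueness check on the (hypothetical) primary key.
        if row["id"] in seen_ids:
            raise ValueError(f"row {i} has duplicate id {row['id']!r}")
        seen_ids.add(row["id"])
        # Range check; None is tolerated as a missing value.
        rating = row["company_rating"]
        if rating is not None and not 0.0 <= rating <= 1.0:
            raise ValueError(f"row {i}: company_rating {rating} outside [0, 1]")
    return rows


# The same rules can be covered by a unit test; pytest collects any
# function named test_* and reports failing bare asserts.
def test_validate_companies_rejects_duplicate_ids() -> None:
    rows = [{"id": 1, "company_rating": 0.9}, {"id": 1, "company_rating": 0.5}]
    try:
        validate_companies(rows)
    except ValueError as exc:
        assert "duplicate id" in str(exc)
    else:
        raise AssertionError("expected a ValueError")
```

The tradeoff is visible even in this small example: a validation node checks real data on every run, while a unit test only checks the rules against fixture data, but needs no changes to the pipeline.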