PyDeequ is a Python API for [Deequ](https://github.com/awslabs/deequ), a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. PyDeequ is written to support usage of Deequ in Python.
There are 4 main components of Deequ:

- Metrics Computation:
    - `Profiles` leverages Analyzers to analyze each column of a dataset.
    - `Analyzers` serve as a foundational module that computes metrics for data profiling and validation at scale.
- Constraint Suggestion:
    - Specify rules for various groups of Analyzers to be run over a dataset, returning a collection of suggested constraints to run in a Verification Suite.
- Constraint Verification:
    - Perform data validation on a dataset with respect to various constraints set by you.
- Metrics Repository:
    - Allows for persistence and tracking of Deequ runs over time.
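To make the four components concrete, here is a minimal plain-Python sketch of how they relate to one another. It does not use the PyDeequ or Spark API: every name in it (`completeness`, `uniqueness`, `profile`, `suggest_constraints`, `verify`, `InMemoryRepository`) is hypothetical and chosen for illustration, the metric definitions are simplified, and the data lives in an in-memory list rather than a Spark DataFrame.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[object]]
Profiles = Dict[str, Dict[str, float]]
Constraint = Tuple[str, str, float]  # (column, metric name, required minimum)


def completeness(rows: List[Row], column: str) -> float:
    """Analyzer: fraction of rows where `column` is not None (rows must be non-empty)."""
    return sum(r.get(column) is not None for r in rows) / len(rows)


def uniqueness(rows: List[Row], column: str) -> float:
    """Analyzer (simplified): fraction of rows whose value occurs exactly once."""
    counts = Counter(r.get(column) for r in rows if r.get(column) is not None)
    return sum(1 for c in counts.values() if c == 1) / len(rows)


def profile(rows: List[Row]) -> Profiles:
    """Profile: run every analyzer on every column of the dataset."""
    columns = sorted({key for r in rows for key in r})
    return {
        col: {
            "completeness": completeness(rows, col),
            "uniqueness": uniqueness(rows, col),
        }
        for col in columns
    }


def suggest_constraints(profiles: Profiles) -> List[Constraint]:
    """Constraint suggestion: propose a check for every metric that is already perfect."""
    return [
        (col, metric, 1.0)
        for col, metrics in profiles.items()
        for metric, value in metrics.items()
        if value == 1.0
    ]


def verify(profiles: Profiles, constraints: List[Constraint]) -> Dict[Tuple[str, str], bool]:
    """Constraint verification: check each metric against its required minimum."""
    return {
        (col, metric): profiles[col][metric] >= minimum
        for col, metric, minimum in constraints
    }


class InMemoryRepository:
    """Metrics repository: keep the metrics of each run so they can be tracked over time."""

    def __init__(self) -> None:
        self._runs: Dict[str, Profiles] = {}

    def save(self, tag: str, profiles: Profiles) -> None:
        self._runs[tag] = profiles

    def history(self, column: str, metric: str) -> List[Tuple[str, float]]:
        return [(tag, p[column][metric]) for tag, p in self._runs.items()]


# Two hypothetical daily snapshots of the same table.
day1 = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": None}]
day2 = [{"id": 1, "name": "a"}, {"id": 1, "name": "b"}, {"id": None, "name": "c"}]

repository = InMemoryRepository()
day1_profile = profile(day1)
repository.save("day1", day1_profile)

# Constraints are suggested from the first snapshot and verified on the second.
constraints = suggest_constraints(day1_profile)
day2_profile = profile(day2)
repository.save("day2", day2_profile)
results = verify(day2_profile, constraints)
id_history = repository.history("id", "completeness")
```

In this sketch the constraints suggested from `day1` (the `id` column is complete and unique) both fail on `day2`, and the repository shows the `id` completeness dropping between runs. In Deequ the same flow runs over Spark DataFrames at scale; consult the PyDeequ documentation for the actual classes and signatures.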
You can install [PyDeequ via pip](https://pypi.org/project/pydeequ/).