The GX Airflow Provider has three Operators for validating your data against Expectations: GXValidateDataFrameOperator, GXValidateBatchOperator, and GXValidateCheckpointOperator.
The GXValidateDataFrameOperator has the simplest API. You load the data into a DataFrame, and GX validates it against the Expectations you provide. It has two required parameters:
- configure_dataframe is a function that returns a DataFrame. This is how you pass your data to the Operator.
- expect is either a single Expectation or an ExpectationSuite.
Optionally, you can also pass a result_format parameter to control the verbosity of the output.
The GXValidateDataFrameOperator will return a serialized ExpectationSuiteValidationResult.
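A minimal sketch of how these parameters might fit together. The import path `great_expectations_provider.operators.validate_dataframe`, the column names, the sample data, and the `task_id` are assumptions made for illustration; check them against the installed provider version. Third-party imports are kept inside the functions so the module can be parsed without pandas or GX installed.

```python
def configure_dataframe():
    """Return the DataFrame to validate. Takes no arguments."""
    import pandas as pd  # lazy import; assumes pandas is available at run time

    # Hypothetical sample data; in practice this would be loaded from storage.
    return pd.DataFrame(
        {"passenger_count": [1, 2, 4], "fare": [7.5, 12.0, 30.25]}
    )


def build_validate_dataframe_task():
    """Build the Operator. Assumes the provider and GX Core are installed."""
    import great_expectations.expectations as gxe

    # Assumed import path for the Operator.
    from great_expectations_provider.operators.validate_dataframe import (
        GXValidateDataFrameOperator,
    )

    return GXValidateDataFrameOperator(
        task_id="validate_dataframe",  # hypothetical Airflow task id
        configure_dataframe=configure_dataframe,  # pass the function, not its result
        expect=gxe.ExpectColumnValuesToBeBetween(
            column="passenger_count", min_value=1, max_value=6
        ),
        result_format="SUMMARY",  # optional; controls output verbosity
    )
```

Note that `configure_dataframe` is passed uncalled, so the data is loaded only when the task runs rather than when the DAG file is parsed.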
The GXValidateBatchOperator is similar to the GXValidateDataFrameOperator, except that GX is responsible for loading the data. It can load and validate data from any data source supported by GX. Its required parameters are:
- configure_batch_definition is a function that takes a single argument, a DataContext, and returns a BatchDefinition. This is how you configure GX to read your data.
- expect is either a single Expectation or an ExpectationSuite.
Optionally, you can also pass a result_format parameter to control the verbosity of the output, and
batch_parameters to specify a batch of data at runtime.
The GXValidateBatchOperator will return a serialized ExpectationSuiteValidationResult.
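The following sketch shows one way a `configure_batch_definition` function and runtime `batch_parameters` could look, assuming a directory of monthly CSV files read through GX's pandas filesystem data source. The directory, file-name pattern, resource names, column name, and the Operator's import path are all hypothetical.

```python
# Selects one month of data at run time; keys match the regex groups below.
BATCH_PARAMETERS = {"year": "2019", "month": "01"}


def configure_batch_definition(context):
    """Take a DataContext and return a BatchDefinition."""
    data_source = context.data_sources.add_pandas_filesystem(
        name="taxi_files",  # hypothetical data source name
        base_directory="/data/taxi",  # hypothetical directory of CSV files
    )
    asset = data_source.add_csv_asset(name="taxi_csv")
    # One batch per month, identified by the year and month in the file name.
    return asset.add_batch_definition_monthly(
        name="monthly",
        regex=r"yellow_tripdata_(?P<year>\d{4})-(?P<month>\d{2})\.csv",
    )


def build_validate_batch_task():
    """Build the Operator. Assumes the provider and GX Core are installed."""
    import great_expectations.expectations as gxe

    # Assumed import path for the Operator.
    from great_expectations_provider.operators.validate_batch import (
        GXValidateBatchOperator,
    )

    return GXValidateBatchOperator(
        task_id="validate_batch",  # hypothetical Airflow task id
        configure_batch_definition=configure_batch_definition,
        expect=gxe.ExpectColumnValuesToNotBeNull(column="passenger_count"),
        batch_parameters=BATCH_PARAMETERS,  # optional
        result_format="SUMMARY",  # optional
    )
```

Because GX reads the files itself, no DataFrame has to be built in the DAG; the function only describes where the data lives and how it is split into batches.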
The GXValidateCheckpointOperator can take advantage of all the features of GX. The user configures a Checkpoint,
which orchestrates a BatchDefinition, ValidationDefinition, and ExpectationSuite.
After a Checkpoint run, Actions can be triggered to send Slack messages,
Microsoft Teams messages, email alerts, and more.
It has a single required parameter:
- configure_checkpoint is a function that takes a single argument, a DataContext, and returns a Checkpoint.
Optionally, you can pass in batch_parameters to specify a batch of data at runtime.
The GXValidateCheckpointOperator will return a serialized CheckpointResult.
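As a rough sketch, a `configure_checkpoint` function could assemble the BatchDefinition, ExpectationSuite, and ValidationDefinition before returning the Checkpoint. It assumes GX Core 1.x and the same hypothetical monthly CSV layout; every resource name, the column name, and the Operator's import path are illustrative assumptions. No Actions are configured here.

```python
def configure_checkpoint(context):
    """Take a DataContext and return a Checkpoint."""
    import great_expectations as gx  # lazy imports; assumes GX Core 1.x
    import great_expectations.expectations as gxe

    # Where the data lives and how it is batched (hypothetical location).
    batch_definition = (
        context.data_sources.add_pandas_filesystem(
            name="taxi_files", base_directory="/data/taxi"
        )
        .add_csv_asset(name="taxi_csv")
        .add_batch_definition_monthly(
            name="monthly",
            regex=r"yellow_tripdata_(?P<year>\d{4})-(?P<month>\d{2})\.csv",
        )
    )
    # What to check.
    suite = context.suites.add(
        gx.ExpectationSuite(
            name="taxi_suite",
            expectations=[
                gxe.ExpectColumnValuesToNotBeNull(column="passenger_count")
            ],
        )
    )
    # Tie the data to the checks.
    validation_definition = context.validation_definitions.add(
        gx.ValidationDefinition(
            name="taxi_validation", data=batch_definition, suite=suite
        )
    )
    # The Checkpoint runs one or more ValidationDefinitions.
    return context.checkpoints.add(
        gx.Checkpoint(
            name="taxi_checkpoint",
            validation_definitions=[validation_definition],
        )
    )


def build_validate_checkpoint_task():
    """Build the Operator. Assumes the provider is installed."""
    # Assumed import path for the Operator.
    from great_expectations_provider.operators.validate_checkpoint import (
        GXValidateCheckpointOperator,
    )

    return GXValidateCheckpointOperator(
        task_id="validate_checkpoint",  # hypothetical Airflow task id
        configure_checkpoint=configure_checkpoint,
        batch_parameters={"year": "2019", "month": "01"},  # optional
    )
```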