## Changes
Add a Lakebase checks storage backend.
### Linked issues
Resolves #444
### Tests
- [x] manually tested
- [ ] added unit tests
- [x] added integration tests
- [ ] added end-to-end tests
---------
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Marcin Wojtyczka <[email protected]>
## Files changed

### docs/dqx/docs/guide/data_profiling.mdx (+1, -1)

```diff
@@ -262,7 +262,7 @@ When running the profiler workflow using Databricks API or UI, you have the same
 - If the `checks_location` in the run config points to a table, the checks will be saved to that table.
 If the `checks_location` in the run config points to a file, file name is replaced with "<input_table>.yml". In addition, if the location is specified as a relative path, it is prefixed with the workspace installation folder.
 For example:
-- If "checks_location=catalog.schema.table", the location will be resolved to "catalog.schema.table".
+- If "checks_location=catalog.schema.table", the location will be resolved to "catalog.schema.table" or "database.schema.table" in case of using Lakebase to store checks.
 - If "checks_location=folder/checks.yml", the location will be resolved to "install_folder/folder/<input_table>.yml".
 - If "checks_location=/App/checks.yml", the location will be resolved to "/App/<input_table>.yml".
 - If "checks_location=/Volume/catalog/schema/folder/checks.yml", the location will be resolved to "/Volume/catalog/schema/folder/<input_table>.yml".
```
### docs/dqx/docs/guide/quality_checks_apply.mdx (+2, -2)

```diff
@@ -575,7 +575,7 @@ When running the quality checker workflow using Databricks API or UI, you have t
 - If the `checks_location` in the run config points to a table, the checks will be directly loaded from that table.
 If the `checks_location` in the run config points to a file, file name is replaced with "<input_table>.yml". In addition, if the location is specified as a relative path, it is prefixed with the workspace installation folder.
 For example:
-- If "checks_location=catalog.schema.table", the location will be resolved to "catalog.schema.table".
+- If "checks_location=catalog.schema.table", the location will be resolved to "catalog.schema.table" or "database.schema.table" in case of using Lakebase to store checks.
 - If "checks_location=folder/checks.yml", the location will be resolved to "install_folder/folder/<input_table>.yml".
 - If "checks_location=/App/checks.yml", the location will be resolved to "/App/<input_table>.yml".
 - If "checks_location=/Volume/catalog/schema/folder/checks.yml", the location will be resolved to "/Volume/catalog/schema/folder/<input_table>.yml".
@@ -690,7 +690,7 @@ When running the e2e workflow using Databricks API or UI, you have the same exec
 - If the `checks_location` in the run config points to a table, the checks will be directly loaded from that table.
 If the `checks_location` in the run config points to a file, file name is replaced with <input_table>.yml. In addition, if the location is specified as a relative path, it is prefixed with the workspace installation folder.
 For example:
-- If "checks_location=catalog.schema.table", the location will be resolved to "catalog.schema.table".
+- If "checks_location=catalog.schema.table", the location will be resolved to "catalog.schema.table" or "database.schema.table" in case of using Lakebase to store checks.
 - If "checks_location=folder/checks.yml", the location will be resolved to "install_folder/folder/<input_table>.yml".
 - If "checks_location=/App/checks.yml", the location will be resolved to "/App/<input_table>.yml".
 - If "checks_location=/Volume/catalog/schema/folder/checks.yml", the location will be resolved to "/Volume/catalog/schema/folder/<input_table>.yml".
```
### docs/dqx/docs/guide/quality_checks_storage.mdx (+18, -2)

```diff
@@ -22,12 +22,20 @@ Saving and loading methods accept a storage backend configuration as input. The
 * `mode`: (optional) write mode for saving checks (`overwrite` or `append`, default is `overwrite`). The `overwrite` mode will only replace checks for the specific run config and not all checks in the table.
+* `instance_name`: name of the Lakebase instance, e.g., "my-instance".
+* `user`: user to connect to the Lakebase instance, e.g., "[email protected]" or Databricks service principal client ID.
+* `location`: fully-qualified table name in the format "database.schema.table".
+* `port`: (optional) port on which to connect to the Lakebase instance (use 5432 if not provided).
+* `run_config_name`: (optional) run configuration name to load (use "default" if not provided).
+* `mode`: (optional) write mode for saving checks (`overwrite` or `append`, default is `overwrite`). The `overwrite` mode will only replace checks for the specific run config and not all checks in the table.
 * `InstallationChecksStorageConfig`: installation-managed location from the run config, ignores location and infers it from `checks_location` in the run config. Containing fields:
 * `location` (optional): automatically set based on the `checks_location` field from the run configuration.
 * `install_folder`: (optional) installation folder where DQX is installed, only required when custom installation folder is used.
 * `run_config_name` (optional) - run configuration name to load (it can be any string), e.g. input table or job name (use "default" if not provided).
 * `product_name`: (optional) name of the product (use "dqx" if not provided).
 * `assume_user`: (optional) if True, assume user installation, otherwise global installation (skipped if `install_folder` is provided).
+* the config inherits from the specific configs such as `WorkspaceFileChecksStorageConfig`, `TableChecksStorageConfig`, `VolumeFileChecksStorageConfig`, and `LakebaseChecksStorageConfig` so relevant fields from these specific configs can be provided (e.g. instance_name and user for lakebase).

 You can find details on how to define checks [here](/docs/guide/quality_checks_definition).
@@ -49,7 +57,8 @@ If you create checks as a list of `DQRule` objects, you can convert them to meta
     WorkspaceFileChecksStorageConfig,
     InstallationChecksStorageConfig,
     TableChecksStorageConfig,
-    VolumeFileChecksStorageConfig
+    VolumeFileChecksStorageConfig,
+    LakebaseChecksStorageConfig,
 )
 from databricks.sdk import WorkspaceClient
@@ -81,6 +90,9 @@ If you create checks as a list of `DQRule` objects, you can convert them to meta
 # save checks as a YAML in a Unity Catalog Volume location (overwrite the file)
```
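
A minimal sketch of using the new backend, based on the fields documented in the hunk above. The `databricks.labs.dqx` module paths and the unified `save_checks`/`load_checks(config=...)` engine API are assumed from the existing DQX docs; they are not shown in this diff.

```python
from databricks.labs.dqx.config import LakebaseChecksStorageConfig  # assumed module path
from databricks.labs.dqx.engine import DQEngine  # assumed module path
from databricks.sdk import WorkspaceClient

dq_engine = DQEngine(WorkspaceClient())

# metadata-defined checks (see the quality_checks_definition guide)
checks = [
    {
        "criticality": "error",
        "check": {"function": "is_not_null", "arguments": {"column": "col1"}},
    },
]

config = LakebaseChecksStorageConfig(
    instance_name="my-instance",  # Lakebase instance name
    user="00000000-0000-0000-0000-000000000000",  # user email or service principal client ID
    location="database.schema.table",  # fully-qualified Lakebase table
    port=5432,  # optional, 5432 is the default
    mode="overwrite",  # replaces checks only for this run config
)

# save checks to the Lakebase table, then load them back (API names assumed)
dq_engine.save_checks(checks, config=config)
loaded_checks = dq_engine.load_checks(config=config)
```
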
### docs/dqx/docs/installation.mdx (+7, -1)

```diff
@@ -212,6 +212,12 @@ run_configs: # <- list of run configurations, each run co

     checks_location: iot_checks.yml # <- Quality rules (checks) can be stored in a table or defined in JSON or YAML files, located at absolute or relative path within the installation folder or volume file path.
+    # lakebase_instance_name: my-lakebase-instance # <- the name of the lakebase instance to use for storing checks
+    # lakebase_user: 00000000-0000-0000-0000-000000000000 # <- the user to connect to the lakebase, e.g., [email protected] or a Databricks service principal client ID
+    # lakebase_port: 5432 # <- optional port to connect to Lakebase, default is 5432
+
 custom_check_functions: # <- optional mapping of custom check function name to Python file (module) containing check function definition
```
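
Tying the installation keys back to the storage API: per the storage-docs hunk above, `InstallationChecksStorageConfig` inherits fields from the specific configs, so a run-config-driven load could pass the Lakebase fields directly. A hedged sketch, under the same assumed module paths and engine API as above:

```python
from databricks.labs.dqx.config import InstallationChecksStorageConfig  # assumed module path
from databricks.labs.dqx.engine import DQEngine  # assumed module path
from databricks.sdk import WorkspaceClient

dq_engine = DQEngine(WorkspaceClient())

# checks_location is inferred from the run config; Lakebase fields such as
# instance_name and user can be supplied because the installation config
# inherits from the specific storage configs (per the docs change above).
config = InstallationChecksStorageConfig(
    run_config_name="default",
    instance_name="my-lakebase-instance",
    user="00000000-0000-0000-0000-000000000000",
)
checks = dq_engine.load_checks(config=config)  # API name assumed
```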