Commit 76c2eac

Merge branch 'main' into yaml_page
2 parents af7059c + 7f182ce commit 76c2eac

File tree

65 files changed (+578 additions, −231 deletions)


connectors/database/salesforce/index.mdx

Lines changed: 18 additions & 2 deletions
@@ -54,8 +54,24 @@ For a complete guide on managing secrets in hybrid setups, see the [Hybrid Inges
 </Tip>
 - **Salesforce Object Name**: Specify the Salesforce Object Name in case you want to ingest a specific object. If left blank, we will ingest all the Objects.
 - **Salesforce API Version**: Follow the steps mentioned [here](https://help.salesforce.com/s/articleView?id=000386929&type=1) to get the API version. Enter the numerical value in the field, for example `42.0`.
-- **Salesforce Domain**: When connecting to Salesforce, you can specify the domain to use for accessing the platform. The common domains include `login` and `test`, and you can also utilize Salesforce My Domain.
-  By default, the domain `login` is used for accessing Salesforce.
+- **Salesforce Domain**: Specify the Salesforce domain (subdomain only) to use for authentication. This field accepts only the domain prefix, not the full URL.
+
+  **Common values:**
+  - `login` (default) - for production instances (resolves to `https://login.salesforce.com`)
+  - `test` - for sandbox instances (resolves to `https://test.salesforce.com`)
+
+  **For Salesforce My Domain:**
+  Enter your custom domain prefix, including all subdomain components such as `.my` or `.sandbox.my`, but without `.salesforce.com`.
+
+  **Examples:**
+  - If your My Domain URL is `https://mycompany.my.salesforce.com`, enter: `mycompany.my`
+  - If your sandbox My Domain URL is `https://mycompany--uat.sandbox.my.salesforce.com`, enter: `mycompany--uat.sandbox.my`
+  - If your URL is `https://example-dot-com--uat.sandbox.my.salesforce.com`, enter: `example-dot-com--uat.sandbox.my`
+
+  <Note>
+  **Important:** Do NOT enter the full URL or include `.salesforce.com`. Only enter the subdomain prefix as shown in the examples above.
+  </Note>
+
 **SSL Configuration**
 In order to integrate SSL in the Metadata Ingestion Config, add the SSL config under `sslConfig`, which is placed under `source`.
 </Step>
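The subdomain-only rule above can be sketched as a small validation helper. This is a hypothetical illustration (the function name and checks are not the connector's actual implementation), showing how a domain prefix resolves to the full login URL:

```python
# Hypothetical helper illustrating the subdomain-only rule for the
# Salesforce Domain field; not the connector's actual code.

def salesforce_login_url(domain: str = "login") -> str:
    """Resolve a domain prefix to the full Salesforce login URL."""
    domain = domain.strip().rstrip("/")
    if domain.startswith("http") or ".salesforce.com" in domain:
        # Full URLs are rejected: only the prefix should be entered.
        raise ValueError("Enter only the subdomain prefix, e.g. 'mycompany.my'")
    return f"https://{domain}.salesforce.com"

print(salesforce_login_url())                             # https://login.salesforce.com
print(salesforce_login_url("test"))                       # https://test.salesforce.com
print(salesforce_login_url("mycompany--uat.sandbox.my"))  # https://mycompany--uat.sandbox.my.salesforce.com
```

Entering `https://mycompany.my.salesforce.com` in such a helper would raise an error, which is exactly the mistake the Note above warns against.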

connectors/ingestion/great-expectations.mdx

Lines changed: 116 additions & 3 deletions
@@ -41,6 +41,15 @@ action:
   database_name: <databaseName in OM>
   schema_name: <schemaName in OM>
   table_name: <tableName in OM>
+  expectation_suite_table_config_map:
+    my_first_suite_name:
+      database_name: <databaseName in OM>
+      schema_name: <schemaName in OM>
+      table_name: <tableName in OM>
+    my_other_suite_name:
+      database_name: <databaseName in OM>
+      schema_name: <schemaName in OM>
+      table_name: <tableName in OM>
 [...]
 ```

@@ -52,11 +61,25 @@ In your checkpoint yaml file, you will need to add the above code block in `acti
 - `class_name`: this is the name of the class that will be used to execute the custom action
 - `config_file_path`: this is the path to your `config.yaml` file that holds the configuration of your OpenMetadata server
 - `database_service_name`: [Optional] this is an optional parameter. If not specified and 2 tables have the same name in 2 different OpenMetadata services, the custom action will fail
-- `database_name`: [Optional] only required for `RuntimeDataBatchSpec` execution (e.g. run GX against a dataframe).
-- `schema_name`: [Optional] only required for `RuntimeDataBatchSpec` execution (e.g. run GX against a dataframe).
-- `table_name`: [Optional] only required for `RuntimeDataBatchSpec` execution (e.g. run GX against a dataframe).
+- `database_name`: [Optional] The database name as it appears in OpenMetadata. For table-based validations (`SqlAlchemyDatasourceBatchSpec`), this is inferred from the batch spec. **Required** for query-based or dataframe validations (`RuntimeQueryBatchSpec`, `RuntimeDataBatchSpec`), where the table context must be explicitly specified.
+- `schema_name`: [Optional] The schema name as it appears in OpenMetadata. For table-based validations, this is inferred from the batch spec. **Required** for query-based or dataframe validations. Defaults to *default* if not specified.
+- `table_name`: [Optional] The table name as it appears in OpenMetadata. For table-based validations, this is inferred from the batch spec. **Required** for query-based or dataframe validations where the table cannot be automatically determined.
+- `expectation_suite_table_config_map`: [Optional] A dictionary mapping expectation suite names to their target OpenMetadata tables. Required when running multi-table checkpoints, where different expectation suites should send results to different tables. Each entry specifies the `database_name`, `schema_name`, and `table_name` for routing validation results.
+
+<Info>
+**Multi-Table Checkpoints**
+
+When validating **multiple tables in a single checkpoint**, use the `expectation_suite_table_config_map` parameter to route validation results to the correct OpenMetadata tables. This is necessary because:
+- Each expectation suite may target a different table
+- The checkpoint action needs to know where to send each suite's results
+- Without the mapping, all results would attempt to go to the same default table
+
+**Example scenario:** You have a checkpoint validating both `users` and `orders` tables with separate expectation suites (`users_suite` and `orders_suite`). The `expectation_suite_table_config_map` ensures `users_suite` results go to the `users` table and `orders_suite` results go to the `orders` table.
+
+For single-table checkpoints, this parameter is not needed - the table information is provided directly or inferred from the batch spec.
+</Info>
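The routing performed by `expectation_suite_table_config_map` amounts to a dictionary lookup keyed by suite name. The sketch below is illustrative only (the helper `resolve_table_config` is hypothetical, not part of the OpenMetadata action):

```python
# Hypothetical sketch of suite-to-table routing; not OpenMetadata's actual code.

def resolve_table_config(suite_name, config_map, default=None):
    """Return the target table config for an expectation suite, or a default."""
    return config_map.get(suite_name, default)

config_map = {
    "users_suite": {"database_name": "production", "schema_name": "public", "table_name": "users"},
    "orders_suite": {"database_name": "production", "schema_name": "public", "table_name": "orders"},
}

# Each suite's validation results are routed to its own table:
print(resolve_table_config("users_suite", config_map)["table_name"])   # users
print(resolve_table_config("orders_suite", config_map)["table_name"])  # orders
```

An unmapped suite would fall back to the default table config, which is why the mapping matters when a checkpoint covers several tables.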
**Note**

If you are using the Great Expectations `DataContext` instance in Python to run your tests, you can use the `run_checkpoint` method as follows:
@@ -174,6 +197,96 @@ context.add_or_update_checkpoint(checkpoint=checkpoint)
 checkpoint_result = checkpoint.run()
 ```
+
+#### Multi-Table Checkpoint Example
+
+Validate multiple tables in a single checkpoint run:
+
+```python
+import great_expectations as gx
+from great_expectations.checkpoint import Checkpoint
+
+context = gx.get_context()
+conn_string = "postgresql+psycopg2://user:pw@host:port/db"
+
+data_source = context.sources.add_postgres(
+    name="my_datasource",
+    connection_string=conn_string,
+)
+
+# Set up users table validation
+users_asset = data_source.add_table_asset(
+    name="users_asset",
+    table_name="users",
+    schema_name="public",
+)
+users_suite = "users_suite"
+context.add_or_update_expectation_suite(expectation_suite_name=users_suite)
+users_validator = context.get_validator(
+    batch_request=users_asset.build_batch_request(),
+    expectation_suite_name=users_suite,
+)
+users_validator.expect_column_values_to_not_be_null(column="email")
+users_validator.save_expectation_suite(discard_failed_expectations=False)
+
+# Set up orders table validation
+orders_asset = data_source.add_table_asset(
+    name="orders_asset",
+    table_name="orders",
+    schema_name="public",
+)
+orders_suite = "orders_suite"
+context.add_or_update_expectation_suite(expectation_suite_name=orders_suite)
+orders_validator = context.get_validator(
+    batch_request=orders_asset.build_batch_request(),
+    expectation_suite_name=orders_suite,
+)
+orders_validator.expect_column_values_to_be_between(column="amount", min_value=0, max_value=1000000)
+orders_validator.save_expectation_suite(discard_failed_expectations=False)
+
+# Create multi-table checkpoint
+checkpoint = Checkpoint(
+    name="multi_table_checkpoint",
+    run_name_template="%Y%m%d-%H%M%S-multi-table",
+    data_context=context,
+    validations=[
+        {
+            "batch_request": users_asset.build_batch_request(),
+            "expectation_suite_name": users_suite,
+        },
+        {
+            "batch_request": orders_asset.build_batch_request(),
+            "expectation_suite_name": orders_suite,
+        },
+    ],
+    action_list=[
+        {
+            "name": "openmetadata_action",
+            "action": {
+                "module_name": "metadata.great_expectations.action",
+                "class_name": "OpenMetadataValidationAction",
+                "config_file_path": "/path/to/config/",
+                "database_service_name": "my_postgres_service",
+                "expectation_suite_table_config_map": {
+                    "users_suite": {
+                        "database_name": "production",
+                        "schema_name": "public",
+                        "table_name": "users",
+                    },
+                    "orders_suite": {
+                        "database_name": "production",
+                        "schema_name": "public",
+                        "table_name": "orders",
+                    },
+                },
+            },
+        },
+    ],
+)
+
+context.add_or_update_checkpoint(checkpoint=checkpoint)
+checkpoint_result = checkpoint.run()
+```
### Working with GX 1.x.x?

In v1.x.x, GX introduced significant changes to their SDK. One notable change was the removal of the `great_expectations` CLI. OpenMetadata introduced support for the 1.x.x version through its `OpenMetadataValidationAction1xx` class. You will need to first run `pip install 'open-metadata[great-expectations-1xx]'`. Below is a complete example:

docs.json

Lines changed: 21 additions & 9 deletions
@@ -8,6 +8,7 @@
     "light": "#1570EF",
     "dark": "#1570EF"
   },
+  "favicon": "/public/images/icons/collate-monogram.svg",
   "navbar": {
     "links": [
       {
@@ -89,7 +90,7 @@
       ]
     },
     {
-      "group": "Connector List",
+      "group": "Connectors",
       "pages": [
         {
           "group": "API",
@@ -1010,7 +1011,9 @@
     },
     {
       "tab": "Products",
-      "pages": ["products/index"]
+      "pages": [
+        "products/index"
+      ]
     },
     {
       "tab": "Collate AI",
@@ -1236,7 +1239,7 @@
       "how-to-guides/data-governance/classification/auto-classification/workflow",
       "how-to-guides/data-governance/classification/auto-classification/external-workflow",
       "how-to-guides/data-governance/classification/auto-classification/auto-pii-tagging",
-      "how-to-guides/data-governance/classification/auto-classification/external-sample-data"
+      "how-to-guides/data-governance/classification/auto-classification/external-sample-data"
     ]
   },
   "how-to-guides/data-governance/classification/tiers",
@@ -1255,6 +1258,7 @@
     "group": "Workflows",
     "pages": [
       "how-to-guides/data-governance/workflows/index",
+      "how-to-guides/data-governance/workflows/creating-a-new-workflow",
       {
         "group": "Default Workflows",
         "pages": [
@@ -1280,16 +1284,24 @@
         "group": "Nodes",
         "pages": [
           "how-to-guides/data-governance/workflows/elements/nodes/index",
-          "how-to-guides/data-governance/workflows/elements/nodes/check-entity-attributes",
-          "how-to-guides/data-governance/workflows/elements/nodes/create-user-task",
-          "how-to-guides/data-governance/workflows/elements/nodes/set-asset-certification",
-          "how-to-guides/data-governance/workflows/elements/nodes/set-glossary-term-status"
+          "how-to-guides/data-governance/workflows/elements/nodes/start-node",
+          "how-to-guides/data-governance/workflows/elements/nodes/check-condition",
+          "how-to-guides/data-governance/workflows/elements/nodes/set-action",
+          "how-to-guides/data-governance/workflows/elements/nodes/user-approval-task",
+          "how-to-guides/data-governance/workflows/elements/nodes/data-completeness-task"
         ]
       }
     ]
   },
-  "how-to-guides/data-governance/workflows/how-to-edit-workflow"
+  {
+    "group": "Examples",
+    "pages": [
+      "how-to-guides/data-governance/workflows/examples/index",
+      "how-to-guides/data-governance/workflows/examples/tag-approval-workflow",
+      "how-to-guides/data-governance/workflows/examples/column-completeness-workflow",
+      "how-to-guides/data-governance/workflows/examples/set-tier-to-mlmodels-workflow"
+    ]
+  }
 ]
 },
 "how-to-guides/data-governance/metrics"
how-to-guides/data-governance/workflows/creating-a-new-workflow.mdx

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
+---
+title: Governance Workflows - Creating a New Workflow
+description: Create a new governance workflow by setting basic details and running event-based or batch executions.
+slug: /how-to-guides/data-governance/workflows/creating-a-new-workflow
+sidebarTitle: Creating a New Workflow
+---
+
+# Governance Workflows - Creating a New Workflow
+
+## Step 1: Navigate to Workflow Creation
+
+Go to **Govern → Workflows → New Workflow**.
+
+<img noZoom src="/public/images/how-to-guides/governance/new1.png" alt="Workflow Creation" />
+
+## Step 2: Configure Basic Details
+
+On the next screen, provide:
+
+- **Name** – This will serve as the unique identifier for your workflow in OpenMetadata. *(No spaces allowed.)*
+- **Description** – A short summary describing the purpose of the workflow.
+
+<img noZoom src="/public/images/how-to-guides/governance/new2.png" alt="Configuration" />
+
+Start combining multiple Nodes to create a workflow.
+
+### **Running a Periodic Batch Workflow**
+
+To execute a workflow on demand, click **Run Now**. This immediately triggers the workflow based on its configuration.
+
+<img noZoom src="/public/images/how-to-guides/governance/new3.png" alt="Running a Periodic Batch Workflow" />
+
+## Best Practices
+
+1. **Use the Right Type of Trigger**
+   - **Event-Based Entity Triggers** are ideal when specific fields must be automatically updated in response to a change.
+     *Example:* When any attribute of a Glossary Term is modified, its status should automatically update to **IN REVIEW**.
+   - **Periodic Batch Triggers** are best suited for bulk updates across many entities, especially for classification or enrichment workflows.
+     *Example:* Tables or Dashboards can be classified as Tier 1, Tier 2, or Tier 3 based on the completeness of their column descriptions.
+2. **Use a Single Event-Based Workflow per Data Asset**
+   - Configure only one event-based entity workflow for each data asset.
+     Having multiple workflows attempting to update the same field (such as the status of a Glossary Term) can result in unpredictable behavior, as one workflow's changes may override another's.
+3. **Optimize Batch Size for Periodic Workflows**
+   - Tune the batch size based on the number of data assets to ensure optimal performance.
+   - Avoid running periodic workflows across all entities without filtering. Instead, apply an inclusion filter to limit the result set and prevent performance degradation.
+4. **Use User Approval Tasks Only in Event-Based Workflows**
+   - User Approval Tasks should be used exclusively in event-driven workflows.
+     Using them in periodic workflows would generate multiple approval tasks simultaneously, overloading system resources.
+   - If an approval step is needed in a periodic batch workflow, ensure the workflow scope is limited to a small, controlled set of entities.
+
+## Limitations
+
+1. **User Approval Tasks Are Limited to Assets with Reviewer Support**
+   - User Approval Tasks can only be used for data assets that support assigning reviewers.
+     Reviewer support for additional asset types will be introduced in future releases.
+2. **Fallback Behavior for Entities Without Reviewers**
+   - For entities that do not have any reviewers configured, User Approval Tasks automatically follow the **TRUE** path as a graceful fallback.
+     *Example:* A Metric without a reviewer will automatically pass the approval step.
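The reviewer fallback described above can be pictured as a small branching rule. This is an illustrative sketch only (the helper `approval_task_path` is hypothetical, not OpenMetadata's implementation):

```python
# Hypothetical sketch of the User Approval Task branching rule:
# entities without reviewers take the TRUE path as a graceful fallback.

def approval_task_path(reviewers, approved=False):
    """Return which branch ("TRUE"/"FALSE") a User Approval Task would follow."""
    if not reviewers:
        # No reviewers configured: the task auto-passes.
        return "TRUE"
    return "TRUE" if approved else "FALSE"

print(approval_task_path([]))                        # TRUE  (no reviewers -> auto-pass)
print(approval_task_path(["alice"], approved=True))  # TRUE
print(approval_task_path(["alice"], approved=False)) # FALSE
```

Under this rule, a Metric with no reviewer always reaches the TRUE branch, matching the example above.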

how-to-guides/data-governance/workflows/default-workflows/dashboard-certification.mdx

Lines changed: 1 addition & 1 deletion
@@ -62,7 +62,7 @@ Certification workflows can be triggered:

 - You can apply the workflow to a specific data asset by configuring it through the **Configuration** tab.

-<img noZoom src="/public/images/how-to-guides/governance/workflows-table-certification3.png" />
+<img noZoom src="/public/images/how-to-guides/governance/workflows-table-certification3.png" alt="dashboard-certification-workflow" />

 - On a **scheduled basis** (e.g., daily, weekly)
 - **On-demand** via the user interface
