articles/purview/how-to-lineage-azure-synapse-analytics.md
Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@ ms.author: jingwang
 ms.service: purview
 ms.subservice: purview-data-catalog
 ms.topic: how-to
-ms.date: 09/27/2021
+ms.date: 12/05/2022
 ---
 # How to get lineage from Azure Synapse Analytics into Microsoft Purview
 
@@ -36,7 +36,7 @@ You can connect an Azure Synapse workspace to Microsoft Purview, and the connect
 
 ### Step 2: Run pipeline in Azure Synapse workspace
 
-You can create pipelines with Copy activity in Azure Synapse workspace. You don't need any additional configuration for lineage data capture. The lineage data will automatically be captured during the activities execution.
+You can create pipelines with Copy activity in Azure Synapse workspace. You don't need any other configuration for lineage data capture. The lineage data will automatically be captured during the activities execution.
articles/purview/how-to-lineage-spark-atlas-connector.md
Lines changed: 6 additions & 6 deletions
@@ -6,7 +6,7 @@ ms.author: jingwang
 ms.service: purview
 ms.subservice: purview-data-catalog
 ms.topic: how-to
-ms.date: 04/28/2021
+ms.date: 12/05/2022
 ---
 # How to use Apache Atlas connector to collect Spark lineage
 
@@ -24,7 +24,7 @@ Since Microsoft Purview supports Atlas API and Atlas native hook, the connector
 
 ## Configuration requirement
 
-The connectors require a version of Spark 2.4.0+. But Spark version 3 is not supported. The Spark supports three types of listener required to be set:
+The connectors require a version of Spark 2.4.0+. But Spark version 3 isn't supported. The Spark supports three types of listener required to be set:
 
 | Listener | Since Spark Version|
 | ------------------- | ------------------- |
@@ -42,7 +42,7 @@ The following steps are documented based on DataBricks as an example:
 
 1. Generate package
 1. Pull code from GitHub: https://github.com/hortonworks-spark/spark-atlas-connector
-2.[For Windows] Comment out the **maven-enforcer-plugin** in spark-atlas-connector\pom.xml to remove the dependency on Unix.
+2.[For Windows], Comment out the **maven-enforcer-plugin** in spark-atlas-connector\pom.xml to remove the dependency on Unix.
 
 ```web
 <requireOS>
@@ -161,14 +161,14 @@ Kick off The Spark job and check the lineage info in your Microsoft Purview acco
 :::image type="content" source="./media/how-to-lineage-spark-atlas-connector/purview-with-spark-lineage.png" alt-text="Screenshot showing purview with spark lineage" lightbox="./media/how-to-lineage-spark-atlas-connector/purview-with-spark-lineage.png":::
 
 ## Known limitations with the connector for Spark lineage
-1. Supports SQL/DataFrame API (in other words, it does not support RDD). This connector relies on query listener to retrieve query and examine the impacts.
+1. Supports SQL/DataFrame API (in other words, it doesn't support RDD). This connector relies on query listener to retrieve query and examine the impacts.
 
 2. All "inputs" and "outputs" from multiple queries are combined into single "spark_process" entity.
 
 "spark_process" maps to an "applicationId" in Spark. It allows admin to track all changes that occurred as part of an application. But also causes lineage/relationship graph in "spark_process" to be complicated and less meaningful.
 3. Only part of inputs is tracked in Streaming query.
 
-* Kafka source supports subscribing with "pattern" and this connector does not enumerate all existing matching topics, or even all possible topics
+* Kafka source supports subscribing with "pattern" and this connector doesn't enumerate all existing matching topics, or even all possible topics
 
 * The "executed plan" provides actual topics with (micro) batch reads and processes. As a result, only inputs that participate in (micro) batch are included as "inputs" of "spark_process" entity.
 
@@ -178,7 +178,7 @@ Kick off The Spark job and check the lineage info in your Microsoft Purview acco
 
 The "drop table" event from Spark only provides db and table name, which is NOT sufficient to create the unique key to recognize the table.
 
-The connector depends on reading the Spark Catalog to get table information. Spark have already dropped the table when this connector notices the table is dropped, so drop table will not work.
+The connector depends on reading the Spark Catalog to get table information. Spark have already dropped the table when this connector notices the table is dropped, so drop table won't work.
articles/purview/how-to-link-azure-data-factory.md
Lines changed: 4 additions & 4 deletions
@@ -6,7 +6,7 @@ ms.author: jingwang
 ms.service: purview
 ms.subservice: purview-data-catalog
 ms.topic: how-to
-ms.date: 11/01/2021
+ms.date: 12/05/2022
 ---
 # How to connect Azure Data Factory and Microsoft Purview
 
@@ -52,7 +52,7 @@ Follow the steps below to connect an existing data factory to your Microsoft Pur
 
 Some Data Factory instances might be disabled if the data factory is already connected to the current Microsoft Purview account, or the data factory doesn't have a managed identity.
 
-A warning message will be displayed if any of the selected Data Factories are already connected to other Microsoft Purview account. By selecting OK, the Data Factory connection with the other Microsoft Purview account will be disconnected. No additional confirmations are required.
+A warning message will be displayed if any of the selected Data Factories are already connected to other Microsoft Purview account. When you select OK, the Data Factory connection with the other Microsoft Purview account will be disconnected. No other confirmations are required.
 
 :::image type="content" source="./media/how-to-link-azure-data-factory/warning-for-disconnect-factory.png" alt-text="Screenshot showing warning to disconnect Azure Data Factory.":::
 
@@ -61,7 +61,7 @@ Follow the steps below to connect an existing data factory to your Microsoft Pur
 
 ### How authentication works
 
-Data factory's managed identity is used to authenticate lineage push operations from data factory to Microsoft Purview. When connecting data factory to Microsoft Purview on UI, it adds the role assignment automatically.
+Data factory's managed identity is used to authenticate lineage push operations from data factory to Microsoft Purview. When you connect your data factory to Microsoft Purview on UI, it adds the role assignment automatically.
 
 Grant the data factory's managed identity **Data Curator** role on Microsoft Purview **root collection**. Learn more about [Access control in Microsoft Purview](../purview/catalog-permissions.md) and [Add roles and restrict access through collections](../purview/how-to-create-and-manage-collections.md#add-roles-and-restrict-access-through-collections).
 
@@ -127,7 +127,7 @@ An example of this pattern would be the following:
 
 ### Data movement with 1:1 lineage and wildcard support
 
-Another common scenario for capturing lineage, is using a wildcard to copy files from a single input dataset to a single output dataset. The wildcard allows the copy activity to match multiple files for copying using a common portion of the file name. Microsoft Purview captures file-level lineage for each individual file copied by the corresponding copy activity.
+Another common scenario for capturing lineage is using a wildcard to copy files from a single input dataset to a single output dataset. The wildcard allows the copy activity to match multiple files for copying using a common portion of the file name. Microsoft Purview captures file-level lineage for each individual file copied by the corresponding copy activity.
 
 An example of this pattern would be the following: