Commit 3034b01

Merge pull request #220497 from whhender/purview-freshness-jingwang
Purview freshness jingwang
2 parents 5941e39 + 4e87939 commit 3034b01

5 files changed: +15 -15 lines changed

articles/purview/concept-data-lineage.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ author: linda33wj
 ms.author: jingwang
 ms.service: purview
 ms.topic: conceptual
-ms.date: 09/27/2021
+ms.date: 12/05/2022
 ---
 # Data lineage in Microsoft Purview

articles/purview/how-to-lineage-azure-synapse-analytics.md

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@ ms.author: jingwang
 ms.service: purview
 ms.subservice: purview-data-catalog
 ms.topic: how-to
-ms.date: 09/27/2021
+ms.date: 12/05/2022
 ---
 # How to get lineage from Azure Synapse Analytics into Microsoft Purview

@@ -36,7 +36,7 @@ You can connect an Azure Synapse workspace to Microsoft Purview, and the connect

 ### Step 2: Run pipeline in Azure Synapse workspace

-You can create pipelines with Copy activity in Azure Synapse workspace. You don't need any additional configuration for lineage data capture. The lineage data will automatically be captured during the activities execution.
+You can create pipelines with Copy activity in Azure Synapse workspace. You don't need any other configuration for lineage data capture. The lineage data will automatically be captured during the activities execution.

 ### Step 3: Monitor lineage reporting status

articles/purview/how-to-lineage-spark-atlas-connector.md

Lines changed: 6 additions & 6 deletions
@@ -6,7 +6,7 @@ ms.author: jingwang
 ms.service: purview
 ms.subservice: purview-data-catalog
 ms.topic: how-to
-ms.date: 04/28/2021
+ms.date: 12/05/2022
 ---
 # How to use Apache Atlas connector to collect Spark lineage

@@ -24,7 +24,7 @@ Since Microsoft Purview supports Atlas API and Atlas native hook, the connector

 ## Configuration requirement

-The connectors require a version of Spark 2.4.0+. But Spark version 3 is not supported. The Spark supports three types of listener required to be set:
+The connectors require a version of Spark 2.4.0+. But Spark version 3 isn't supported. The Spark supports three types of listener required to be set:

 | Listener | Since Spark Version|
 | ------------------- | ------------------- |
@@ -42,7 +42,7 @@ The following steps are documented based on DataBricks as an example:

 1. Generate package
 1. Pull code from GitHub: https://github.com/hortonworks-spark/spark-atlas-connector
-2. [For Windows] Comment out the **maven-enforcer-plugin** in spark-atlas-connector\pom.xml to remove the dependency on Unix.
+2. [For Windows], Comment out the **maven-enforcer-plugin** in spark-atlas-connector\pom.xml to remove the dependency on Unix.

 ```web
 <requireOS>
@@ -161,14 +161,14 @@ Kick off The Spark job and check the lineage info in your Microsoft Purview acco
 :::image type="content" source="./media/how-to-lineage-spark-atlas-connector/purview-with-spark-lineage.png" alt-text="Screenshot showing purview with spark lineage" lightbox="./media/how-to-lineage-spark-atlas-connector/purview-with-spark-lineage.png":::

 ## Known limitations with the connector for Spark lineage
-1. Supports SQL/DataFrame API (in other words, it does not support RDD). This connector relies on query listener to retrieve query and examine the impacts.
+1. Supports SQL/DataFrame API (in other words, it doesn't support RDD). This connector relies on query listener to retrieve query and examine the impacts.

 2. All "inputs" and "outputs" from multiple queries are combined into single "spark_process" entity.

 "spark_process" maps to an "applicationId" in Spark. It allows admin to track all changes that occurred as part of an application. But also causes lineage/relationship graph in "spark_process" to be complicated and less meaningful.
 3. Only part of inputs is tracked in Streaming query.

-* Kafka source supports subscribing with "pattern" and this connector does not enumerate all existing matching topics, or even all possible topics
+* Kafka source supports subscribing with "pattern" and this connector doesn't enumerate all existing matching topics, or even all possible topics

 * The "executed plan" provides actual topics with (micro) batch reads and processes. As a result, only inputs that participate in (micro) batch are included as "inputs" of "spark_process" entity.

@@ -178,7 +178,7 @@ Kick off The Spark job and check the lineage info in your Microsoft Purview acco

 The "drop table" event from Spark only provides db and table name, which is NOT sufficient to create the unique key to recognize the table.

-The connector depends on reading the Spark Catalog to get table information. Spark have already dropped the table when this connector notices the table is dropped, so drop table will not work.
+The connector depends on reading the Spark Catalog to get table information. Spark have already dropped the table when this connector notices the table is dropped, so drop table won't work.


 ## Next steps

articles/purview/how-to-link-azure-data-factory.md

Lines changed: 4 additions & 4 deletions
@@ -6,7 +6,7 @@ ms.author: jingwang
 ms.service: purview
 ms.subservice: purview-data-catalog
 ms.topic: how-to
-ms.date: 11/01/2021
+ms.date: 12/05/2022
 ---
 # How to connect Azure Data Factory and Microsoft Purview

@@ -52,7 +52,7 @@ Follow the steps below to connect an existing data factory to your Microsoft Pur

 Some Data Factory instances might be disabled if the data factory is already connected to the current Microsoft Purview account, or the data factory doesn't have a managed identity.

-A warning message will be displayed if any of the selected Data Factories are already connected to other Microsoft Purview account. By selecting OK, the Data Factory connection with the other Microsoft Purview account will be disconnected. No additional confirmations are required.
+A warning message will be displayed if any of the selected Data Factories are already connected to other Microsoft Purview account. When you select OK, the Data Factory connection with the other Microsoft Purview account will be disconnected. No other confirmations are required.

 :::image type="content" source="./media/how-to-link-azure-data-factory/warning-for-disconnect-factory.png" alt-text="Screenshot showing warning to disconnect Azure Data Factory.":::

@@ -61,7 +61,7 @@ Follow the steps below to connect an existing data factory to your Microsoft Pur

 ### How authentication works

-Data factory's managed identity is used to authenticate lineage push operations from data factory to Microsoft Purview. When connecting data factory to Microsoft Purview on UI, it adds the role assignment automatically.
+Data factory's managed identity is used to authenticate lineage push operations from data factory to Microsoft Purview. When you connect your data factory to Microsoft Purview on UI, it adds the role assignment automatically.

 Grant the data factory's managed identity **Data Curator** role on Microsoft Purview **root collection**. Learn more about [Access control in Microsoft Purview](../purview/catalog-permissions.md) and [Add roles and restrict access through collections](../purview/how-to-create-and-manage-collections.md#add-roles-and-restrict-access-through-collections).

@@ -127,7 +127,7 @@ An example of this pattern would be the following:

 ### Data movement with 1:1 lineage and wildcard support

-Another common scenario for capturing lineage, is using a wildcard to copy files from a single input dataset to a single output dataset. The wildcard allows the copy activity to match multiple files for copying using a common portion of the file name. Microsoft Purview captures file-level lineage for each individual file copied by the corresponding copy activity.
+Another common scenario for capturing lineage is using a wildcard to copy files from a single input dataset to a single output dataset. The wildcard allows the copy activity to match multiple files for copying using a common portion of the file name. Microsoft Purview captures file-level lineage for each individual file copied by the corresponding copy activity.

 An example of this pattern would be the following:

articles/purview/troubleshoot-connections.md

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@ ms.author: jingwang
 ms.service: purview
 ms.subservice: purview-data-map
 ms.topic: how-to
-ms.date: 09/27/2021
+ms.date: 12/05/2022
 ms.custom: ignite-fall-2021
 ---
 # Troubleshoot your connections in Microsoft Purview
@@ -90,5 +90,5 @@ If your Microsoft Purview scan used to successfully run, but are now failing, ch

 ## Next steps

-- [Browse the Microsoft Purview Data catalog](how-to-browse-catalog.md)
+- [Browse the Microsoft Purview Data Catalog](how-to-browse-catalog.md)
 - [Search the Microsoft Purview Data Catalog](how-to-search-catalog.md)
