Commit d6d9792

Merge branch 'main' of https://github.com/MicrosoftDocs/azure-docs-pr into premPlus
2 parents ce24d0a + 1c1f33d commit d6d9792

6 files changed: 46 additions and 17 deletions

articles/event-hubs/azure-event-hubs-kafka-overview.md

Lines changed: 2 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -1,6 +1,6 @@
11
---
2-
title: Introduction to Apache Kafka on Azure Event Hubs
3-
description: Learn what Apache Kafka on Azure Event Hubs is and how to use it to stream data from Apache Kafka applications without setting up a Kafka cluster on your own.
2+
title: Introduction to Apache Kafka in Event Hubs on Azure Cloud
3+
description: Learn what Apache Kafka in the Event Hubs service on Azure Cloud is and how to use it to stream data from Apache Kafka applications without setting up a Kafka cluster on your own.
44
ms.topic: overview
55
ms.date: 02/03/2023
66
---

articles/purview/catalog-lineage-user-guide.md

Lines changed: 2 additions & 2 deletions
@@ -18,7 +18,7 @@ One of the platform features of Microsoft Purview is the ability to show the lin
1818
## Lineage collection
1919

2020
Metadata collected in Microsoft Purview from enterprise data systems are stitched across to show an end to end data lineage. Data systems that collect lineage into Microsoft Purview are broadly categorized into following three types:
21-
21+
2222
- [Data processing systems](#data-processing-systems)
2323
- [Data storage systems](#data-storage-systems)
2424
- [Data analytics and reporting systems](#data-analytics-and-reporting-systems)
@@ -60,7 +60,7 @@ Databases & storage solutions such as Oracle, Teradata, and SAP have query engin
6060
|| [SAP S/4HANA](register-scan-saps4hana-source.md) |
6161

6262
### Data analytics and reporting systems
63-
Data analytics and reporting systems like Azure ML and Power BI report lineage into Microsoft Purview. These systems will use the datasets from storage systems and process through their meta model to create BI Dashboards, ML experiments and so on.
63+
Data analytics and reporting systems like Azure Machine Learning and Power BI report lineage into Microsoft Purview. These systems will use the datasets from storage systems and process through their meta model to create BI Dashboards, ML experiments and so on.
6464

6565
| Data analytics & reporting system | Supported scope |
6666
| ---------------------- | ------------|

articles/purview/concept-data-lineage.md

Lines changed: 2 additions & 2 deletions
@@ -13,15 +13,15 @@ This article provides an overview of data lineage in Microsoft Purview Data Cata
1313

1414
- Raw data staged from various platforms
1515
- Transformed and prepared data
16-
- Data used by visualization platforms.
16+
- Data used by visualization platforms
1717

1818
## Use cases
1919

2020
Data lineage is broadly understood as the lifecycle that spans the data’s origin, and where it moves over time across the data estate. It's used for different kinds of backwards-looking scenarios such as troubleshooting, tracing root cause in data pipelines and debugging. Lineage is also used for data quality analysis, compliance and “what if” scenarios often referred to as impact analysis. Lineage is represented visually to show data moving from source to destination including how the data was transformed. Given the complexity of most enterprise data environments, these views can be hard to understand without doing some consolidation or masking of peripheral data points.
2121

2222
## Lineage experience in Microsoft Purview Data Catalog
2323

24-
Microsoft Purview Data Catalog will connect with other data processing, storage, and analytics systems to extract lineage information. The information is combined to represent a generic, scenario-specific lineage experience in the Catalog.
24+
Microsoft Purview Data Catalog will connect with other data processing, storage, and analytics systems to extract lineage information. The information is combined to represent a generic, scenario-specific lineage experience in the catalog.
2525

2626
:::image type="content" source="media/concept-lineage/lineage-end-end-inline.png" alt-text="end-end lineage showing data copied from blob store all the way to Power BI dashboard" lightbox="media/concept-lineage/lineage-end-end.png":::
2727

articles/purview/concept-scans-and-ingestion.md

Lines changed: 22 additions & 5 deletions
@@ -6,19 +6,24 @@ ms.author: shjia
66
ms.service: purview
77
ms.subservice: purview-data-map
88
ms.topic: conceptual
9-
ms.date: 02/14/2023
9+
ms.date: 03/13/2023
1010
ms.custom: ignite-fall-2021
1111
---
1212

1313
# Scans and ingestion in Microsoft Purview
1414

1515
This article provides an overview of the Scanning and Ingestion features in Microsoft Purview. These features connect your Microsoft Purview account to your sources to populate the data map and data catalog so you can begin exploring and managing your data through Microsoft Purview.
1616

17+
- [**Scanning**](#scanning) captures metadata from [data sources](microsoft-purview-connector-overview.md) and brings it to Microsoft Purview.
18+
- [**Ingestion**](#ingestion) processes metadata and stores it in the data catalog from both:
19+
- Data source scans - scanned metadata is added to the Microsoft Purview Data Map.
20+
- Lineage connections - transformation resources add metadata about their sources, outputs, and activities to the Microsoft Purview Data Map.
21+
1722
## Scanning
1823

1924
After data sources are [registered](manage-data-sources.md) in your Microsoft Purview account, the next step is to scan the data sources. The scanning process establishes a connection to the data source and captures technical metadata like names, file size, columns, and so on. It also extracts schema for structured data sources, applies classifications on schemas, and [applies sensitivity labels if your Microsoft Purview Data Map is connected to a Microsoft Purview compliance portal](create-sensitivity-label.md). The scanning process can be triggered to run immediately or can be scheduled to run on a periodic basis to keep your Microsoft Purview account up to date.
2025

21-
For each scan there are customizations you can apply so that you're only scanning your sources for the information you need.
26+
For each scan, there are customizations you can apply so that you're only scanning information you need, rather than the whole source.
2227

2328
### Choose an authentication method for your scans
2429

@@ -48,15 +53,15 @@ There are [system scan rule sets](create-a-scan-rule-set.md#system-scan-rule-set
4853

4954
### Schedule your scan
5055

51-
Microsoft Purview gives you a choice of scanning weekly or monthly at a specific time you choose. Weekly scans may be appropriate for data sources with structures that are actively under development or frequently change. Monthly scanning is more appropriate for data sources that change infrequently. A good best practice is to work with the administrator of the source you want to scan to identify a time when compute demands on the source are low.
56+
Microsoft Purview gives you a choice of scanning weekly or monthly at a specific time you choose. Weekly scans may be appropriate for data sources with structures that are actively under development or frequently change. Monthly scanning is more appropriate for data sources that change infrequently. Best practice is to work with the administrator of the source you want to scan to identify a time when compute demands on the source are low.
5257

5358
### How scans detect deleted assets
5459

5560
A Microsoft Purview catalog is only aware of the state of a data store when it runs a scan. For the catalog to know if a file, table, or container was deleted, it compares the last scan output against the current scan output. For example, suppose that the last time you scanned an Azure Data Lake Storage Gen2 account, it included a folder named *folder1*. When the same account is scanned again, *folder1* is missing. Therefore, the catalog assumes the folder has been deleted.
5661

5762
#### Detecting deleted files
5863

59-
The logic for detecting missing files works for multiple scans by the same user as well as by different users. For example, suppose a user runs a one-time scan on a Data Lake Storage Gen2 data store on folders A, B, and C. Later, a different user in the same account runs a different one-time scan on folders C, D, and E of the same data store. Because folder C was scanned twice, the catalog checks it for possible deletions. Folders A, B, D, and E, however, were scanned only once, and the catalog won't check them for deleted assets.
64+
The logic for detecting missing files works for multiple scans by the same user and by different users. For example, suppose a user runs a one-time scan on a Data Lake Storage Gen2 data store on folders A, B, and C. Later, a different user in the same account runs a different one-time scan on folders C, D, and E of the same data store. Because folder C was scanned twice, the catalog checks it for possible deletions. Folders A, B, D, and E, however, were scanned only once, and the catalog won't check them for deleted assets.
6065

6166
To keep deleted files out of your catalog, it's important to run regular scans. The scan interval is important, because the catalog can't detect deleted assets until another scan is run. So, if you run scans once a month on a particular store, the catalog can't detect any deleted data assets in that store until you run the next scan a month later.
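
The comparison described in this hunk can be illustrated with a minimal Python sketch. It is not how Microsoft Purview is implemented: the function name `detect_deleted_assets` and the folder-to-asset dictionaries are hypothetical, chosen only to show why a folder scanned twice can be checked for deletions while a folder scanned once cannot.

```python
from typing import Dict, Set


def detect_deleted_assets(
    previous_scan: Dict[str, Set[str]],
    current_scan: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Return, per folder, the assets seen in the previous scan but missing now.

    Both arguments map a folder name to the set of asset paths that a scan
    found in that folder (a hypothetical data shape). Folders that appear in
    only one of the two scans are skipped, because there is no earlier output
    to compare them against.
    """
    deleted: Dict[str, Set[str]] = {}
    for folder, current_assets in current_scan.items():
        if folder not in previous_scan:
            continue  # scanned only once so far: no baseline to compare with
        missing = previous_scan[folder] - current_assets
        if missing:
            deleted[folder] = missing
    return deleted


# A first one-time scan covers folders A, B, and C.
first_scan = {"A": {"A/a.csv"}, "B": {"B/b.csv"}, "C": {"C/x.csv", "C/y.csv"}}
# A later one-time scan, possibly by a different user, covers C, D, and E.
second_scan = {"C": {"C/x.csv"}, "D": {"D/d.csv"}, "E": {"E/e.csv"}}

deleted = detect_deleted_assets(first_scan, second_scan)
print(deleted)  # {'C': {'C/y.csv'}}
```

Only folder C appears in both scans, so it is the only folder in which a deletion can be detected. Anything removed from A or B goes unnoticed until those folders are scanned again.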
6267

@@ -67,7 +72,19 @@ When you enumerate large data stores like Data Lake Storage Gen2, there are mult
6772
6873
## Ingestion
6974

70-
The technical metadata or classifications identified by the scanning process are then sent to Ingestion. The ingestion process is responsible for populating the data map and is managed by Microsoft Purview. Ingestion analyses the input from scan, [applies resource set patterns](concept-resource-sets.md#how-microsoft-purview-detects-resource-sets), populates available [lineage](concept-data-lineage.md) information, and then loads the data map automatically. Assets/schemas can be discovered or curated only after ingestion is complete. So, if your scan is completed but you haven't seen your assets in the data map or catalog, you'll need to wait for the ingestion process to finish.
75+
Ingestion is the process responsible for populating the data map with metadata gathered through its various processes.
76+
77+
## Ingestion from scans
78+
79+
The technical metadata or classifications identified by the scanning process are then sent to ingestion. Ingestion analyses the input from scan, [applies resource set patterns](concept-resource-sets.md#how-microsoft-purview-detects-resource-sets), populates available [lineage](concept-data-lineage.md) information, and then loads the data map automatically. Assets/schemas can be discovered or curated only after ingestion is complete. So, if your scan is completed but you haven't seen your assets in the data map or catalog, you'll need to wait for the ingestion process to finish.
80+
81+
## Ingestion from lineage connections
82+
83+
Resources like [Azure Data Factory](how-to-link-azure-data-factory.md) and [Azure Synapse](how-to-lineage-azure-synapse-analytics.md) can be connected to Microsoft Purview to bring data source and lineage information into your Microsoft Purview Data Map. For example, when a copy pipeline runs in an Azure Data Factory that has been connected to Microsoft Purview, metadata about the input sources, the activity, and the output sources are ingested in Microsoft Purview and the information is added to the data map.
84+
85+
If a data source has already been added to the data map through a scan, lineage information about the activity will be added to the existing source. If the data source hasn't yet been added to the data map, the lineage ingestion process will add it to the root collection with its lineage information.
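
The rule in the paragraph above can be sketched in a few lines of Python. This is an assumption-laden illustration, not the Microsoft Purview API: the `data_map` dictionary, the `ingest_lineage` function, the `ROOT_COLLECTION` label, and the asset names are all hypothetical.

```python
from typing import Dict

ROOT_COLLECTION = "root"  # hypothetical label for the root collection


def ingest_lineage(data_map: Dict[str, dict], source: str, output: str, activity: str) -> None:
    """Attach an activity's lineage to its source and output assets.

    An asset that is already in the data map (for example, from a scan) keeps
    its collection and gains the lineage entry. An asset that is not yet in
    the data map is created under the root collection.
    """
    for asset in (source, output):
        entry = data_map.setdefault(
            asset, {"collection": ROOT_COLLECTION, "lineage": []}
        )
        entry["lineage"].append(activity)


# The source was scanned earlier into a "finance" collection; the output is new.
data_map = {"adls://raw/sales.csv": {"collection": "finance", "lineage": []}}
ingest_lineage(data_map, "adls://raw/sales.csv", "sql://dw/sales", "copy_sales_pipeline")

print(data_map["adls://raw/sales.csv"]["collection"])  # finance
print(data_map["sql://dw/sales"]["collection"])        # root
```

The existing source stays in its collection, while the previously unknown output is added under the root collection together with its lineage entry.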
86+
87+
For more information about the available lineage connections, see the [lineage user guide](catalog-lineage-user-guide.md).
7188

7289
## Next steps
7390

articles/purview/how-to-lineage-azure-synapse-analytics.md

Lines changed: 9 additions & 3 deletions
@@ -6,11 +6,17 @@ ms.author: jingwang
66
ms.service: purview
77
ms.subservice: purview-data-catalog
88
ms.topic: how-to
9-
ms.date: 12/14/2022
9+
ms.date: 03/13/2023
1010
---
1111
# How to get lineage from Azure Synapse Analytics into Microsoft Purview
1212

13-
This document explains the steps required for connecting an Azure Synapse workspace with a Microsoft Purview account to track data lineage. The document also gets into the details of the coverage scope and supported lineage capabilities.
13+
This document explains the steps required for connecting an Azure Synapse workspace with a Microsoft Purview account to track [data lineage](concept-data-lineage.md) and [ingest data sources](concept-scans-and-ingestion.md#ingestion). The document also gets into the details of the activity coverage scope and supported lineage capabilities.
14+
15+
When you connect Azure Synapse Analytics to Microsoft Purview, whenever a [supported pipeline activity](#supported-azure-synapse-capabilities) is run, metadata about the activity's source data, output data, and the activity will be automatically [ingested](concept-scans-and-ingestion.md#ingestion) into the Microsoft Purview Data Map.
16+
17+
If a data source has already been scanned and exists in the data map, the ingestion process will add the lineage information from Azure Synapse Analytics to that existing source. If the source or output doesn't exist in the data map and is [supported by Azure Synapse Analytics lineage](#supported-azure-synapse-capabilities), Microsoft Purview will automatically add its metadata from Synapse Analytics into the data map under the root collection.
18+
19+
This can be an excellent way to monitor your data estate as users move and transform information using Azure Synapse Analytics.
1420

1521
## Supported Azure Synapse capabilities
1622

@@ -25,7 +31,7 @@ Currently, Microsoft Purview captures runtime lineage from the following Azure S
2531
[!INCLUDE[azure-synapse-supported-activity-lineage-capabilities](includes/data-factory-common-supported-capabilities.md)]
2632

2733
## Access secured Microsoft Purview account
28-
34+
2935
If your Microsoft Purview account is protected by firewall, learn how to let Azure Synapse [access a secured Microsoft Purview account](../synapse-analytics/catalog-and-governance/how-to-access-secured-purview-account.md) through Microsoft Purview private endpoints.
3036

3137
## Bring Azure Synapse lineage into Microsoft Purview

articles/purview/how-to-link-azure-data-factory.md

Lines changed: 9 additions & 3 deletions
@@ -6,11 +6,17 @@ ms.author: jingwang
66
ms.service: purview
77
ms.subservice: purview-data-catalog
88
ms.topic: how-to
9-
ms.date: 12/14/2022
9+
ms.date: 03/13/2023
1010
---
1111
# How to connect Azure Data Factory and Microsoft Purview
1212

13-
This document explains the steps required for connecting an Azure Data Factory account with a Microsoft Purview account to track data lineage. The document also gets into the details of the coverage scope and supported lineage patterns.
13+
This document explains the steps required for connecting an Azure Data Factory account with a Microsoft Purview account to track [data lineage](concept-data-lineage.md) and [ingest data sources](concept-scans-and-ingestion.md#ingestion). The document also gets into the details of the activity coverage scope and supported lineage patterns.
14+
15+
When you connect an Azure Data Factory to Microsoft Purview, whenever a [supported Azure Data Factory activity](#supported-azure-data-factory-activities) is run, metadata about the activity's source data, output data, and the activity will be automatically [ingested](concept-scans-and-ingestion.md#ingestion) into the Microsoft Purview Data Map.
16+
17+
If a data source has already been scanned and exists in the data map, the ingestion process will add the lineage information from Azure Data Factory to that existing source. If the source or output doesn't exist in the data map and is [supported by Azure Data Factory lineage](#supported-azure-data-factory-activities), Microsoft Purview will automatically add its metadata from Azure Data Factory into the data map under the root collection.
18+
19+
This can be an excellent way to monitor your data estate as users move and transform information using Azure Data Factory.
1420

1521
## View existing Data Factory connections
1622

@@ -98,7 +104,7 @@ The integration between Data Factory and Microsoft Purview supports only a subse
98104
Refer to [supported data stores](how-to-lineage-sql-server-integration-services.md#supported-data-stores).
99105

100106
## Access secured Microsoft Purview account
101-
107+
102108
If your Microsoft Purview account is protected by firewall, learn how to let Data Factory [access a secured Microsoft Purview account](../data-factory/how-to-access-secured-purview-account.md) through Microsoft Purview private endpoints.
103109

104110
## Bring Data Factory lineage into Microsoft Purview
