articles/event-hubs/azure-event-hubs-kafka-overview.md
+2 -2 (2 additions & 2 deletions)
@@ -1,6 +1,6 @@
 ---
-title: Introduction to Apache Kafka on Azure Event Hubs
-description: Learn what Apache Kafka on Azure Event Hubs is and how to use it to stream data from Apache Kafka applications without setting up a Kafka cluster on your own.
+title: Introduction to Apache Kafka in Event Hubs on Azure Cloud
+description: Learn what Apache Kafka in the Event Hubs service on Azure Cloud is and how to use it to stream data from Apache Kafka applications without setting up a Kafka cluster on your own.
articles/purview/catalog-lineage-user-guide.md
+2 -2 (2 additions & 2 deletions)
@@ -18,7 +18,7 @@ One of the platform features of Microsoft Purview is the ability to show the lin
 ## Lineage collection

 Metadata collected in Microsoft Purview from enterprise data systems are stitched across to show an end to end data lineage. Data systems that collect lineage into Microsoft Purview are broadly categorized into following three types:
[...]
-Data analytics and reporting systems like Azure ML and Power BI report lineage into Microsoft Purview. These systems will use the datasets from storage systems and process through their meta model to create BI Dashboards, ML experiments and so on.
+Data analytics and reporting systems like Azure Machine Learning and Power BI report lineage into Microsoft Purview. These systems will use the datasets from storage systems and process through their meta model to create BI Dashboards, ML experiments and so on.

 | Data analytics & reporting system | Supported scope |
articles/purview/concept-data-lineage.md
+2 -2 (2 additions & 2 deletions)
@@ -13,15 +13,15 @@ This article provides an overview of data lineage in Microsoft Purview Data Cata

 - Raw data staged from various platforms
 - Transformed and prepared data
-- Data used by visualization platforms.
+- Data used by visualization platforms

 ## Use cases

 Data lineage is broadly understood as the lifecycle that spans the data’s origin, and where it moves over time across the data estate. It's used for different kinds of backwards-looking scenarios such as troubleshooting, tracing root cause in data pipelines and debugging. Lineage is also used for data quality analysis, compliance and “what if” scenarios often referred to as impact analysis. Lineage is represented visually to show data moving from source to destination including how the data was transformed. Given the complexity of most enterprise data environments, these views can be hard to understand without doing some consolidation or masking of peripheral data points.

 ## Lineage experience in Microsoft Purview Data Catalog

-Microsoft Purview Data Catalog will connect with other data processing, storage, and analytics systems to extract lineage information. The information is combined to represent a generic, scenario-specific lineage experience in the Catalog.
+Microsoft Purview Data Catalog will connect with other data processing, storage, and analytics systems to extract lineage information. The information is combined to represent a generic, scenario-specific lineage experience in the catalog.

 :::image type="content" source="media/concept-lineage/lineage-end-end-inline.png" alt-text="end-end lineage showing data copied from blob store all the way to Power BI dashboard" lightbox="media/concept-lineage/lineage-end-end.png":::
articles/purview/concept-scans-and-ingestion.md
+22 -5 (22 additions & 5 deletions)
@@ -6,19 +6,24 @@ ms.author: shjia
 ms.service: purview
 ms.subservice: purview-data-map
 ms.topic: conceptual
-ms.date: 02/14/2023
+ms.date: 03/13/2023
 ms.custom: ignite-fall-2021
 ---

 # Scans and ingestion in Microsoft Purview

 This article provides an overview of the Scanning and Ingestion features in Microsoft Purview. These features connect your Microsoft Purview account to your sources to populate the data map and data catalog so you can begin exploring and managing your data through Microsoft Purview.

+-[**Scanning**](#scanning) captures metadata from [data sources](microsoft-purview-connector-overview.md) and brings it to Microsoft Purview.
+-[**Ingestion**](#ingestion) processes metadata and stores it in the data catalog from both:
+- Data source scans - scanned metadata is added to the Microsoft Purview Data Map.
+- Lineage connections - transformation resources add metadata about their sources, outputs, and activities to the Microsoft Purview Data Map.
+
 ## Scanning

 After data sources are [registered](manage-data-sources.md) in your Microsoft Purview account, the next step is to scan the data sources. The scanning process establishes a connection to the data source and captures technical metadata like names, file size, columns, and so on. It also extracts schema for structured data sources, applies classifications on schemas, and [applies sensitivity labels if your Microsoft Purview Data Map is connected to a Microsoft Purview compliance portal](create-sensitivity-label.md). The scanning process can be triggered to run immediately or can be scheduled to run on a periodic basis to keep your Microsoft Purview account up to date.

-For each scan there are customizations you can apply so that you're only scanning your sources for the information you need.
+For each scan, there are customizations you can apply so that you're only scanning information you need, rather than the whole source.

 ### Choose an authentication method for your scans

@@ -48,15 +53,15 @@ There are [system scan rule sets](create-a-scan-rule-set.md#system-scan-rule-set

 ### Schedule your scan

-Microsoft Purview gives you a choice of scanning weekly or monthly at a specific time you choose. Weekly scans may be appropriate for data sources with structures that are actively under development or frequently change. Monthly scanning is more appropriate for data sources that change infrequently. A good best practice is to work with the administrator of the source you want to scan to identify a time when compute demands on the source are low.
+Microsoft Purview gives you a choice of scanning weekly or monthly at a specific time you choose. Weekly scans may be appropriate for data sources with structures that are actively under development or frequently change. Monthly scanning is more appropriate for data sources that change infrequently. Best practice is to work with the administrator of the source you want to scan to identify a time when compute demands on the source are low.

 ### How scans detect deleted assets

 A Microsoft Purview catalog is only aware of the state of a data store when it runs a scan. For the catalog to know if a file, table, or container was deleted, it compares the last scan output against the current scan output. For example, suppose that the last time you scanned an Azure Data Lake Storage Gen2 account, it included a folder named *folder1*. When the same account is scanned again, *folder1* is missing. Therefore, the catalog assumes the folder has been deleted.

 #### Detecting deleted files

-The logic for detecting missing files works for multiple scans by the same user as well as by different users. For example, suppose a user runs a one-time scan on a Data Lake Storage Gen2 data store on folders A, B, and C. Later, a different user in the same account runs a different one-time scan on folders C, D, and E of the same data store. Because folder C was scanned twice, the catalog checks it for possible deletions. Folders A, B, D, and E, however, were scanned only once, and the catalog won't check them for deleted assets.
+The logic for detecting missing files works for multiple scans by the same user and by different users. For example, suppose a user runs a one-time scan on a Data Lake Storage Gen2 data store on folders A, B, and C. Later, a different user in the same account runs a different one-time scan on folders C, D, and E of the same data store. Because folder C was scanned twice, the catalog checks it for possible deletions. Folders A, B, D, and E, however, were scanned only once, and the catalog won't check them for deleted assets.

 To keep deleted files out of your catalog, it's important to run regular scans. The scan interval is important, because the catalog can't detect deleted assets until another scan is run. So, if you run scans once a month on a particular store, the catalog can't detect any deleted data assets in that store until you run the next scan a month later.

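Note on the deleted-asset logic in the hunk above: the comparison it describes amounts to a per-folder set difference between the previous scan output and the current one. The Python sketch below is illustrative only. `detect_deleted` and the folder and file names are hypothetical and are not part of any Microsoft Purview API; it assumes each scan can be represented as a mapping from folder name to the set of asset names found in that folder.

```python
from typing import Dict, Set


def detect_deleted(
    previous: Dict[str, Set[str]], current: Dict[str, Set[str]]
) -> Dict[str, Set[str]]:
    """Return, per folder, assets seen in the previous scan but absent now.

    Only folders covered by both scans are compared. A folder that was
    scanned once has no earlier output to diff against, so it is skipped.
    """
    deleted: Dict[str, Set[str]] = {}
    for folder in previous.keys() & current.keys():
        missing = previous[folder] - current[folder]
        if missing:
            deleted[folder] = missing
    return deleted


# Hypothetical scan outputs mirroring the A, B, C / C, D, E example.
first_scan = {"A": {"a.csv"}, "B": {"b.csv"}, "C": {"c1.csv", "c2.csv"}}
second_scan = {"C": {"c1.csv"}, "D": {"d.csv"}, "E": {"e.csv"}}

print(detect_deleted(first_scan, second_scan))  # {'C': {'c2.csv'}}
```

With these inputs only folder C appears in both scans, so it is the only folder that can report a deletion. Folders A, B, D, and E are left unchecked, as in the documented example.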
@@ -67,7 +72,19 @@ When you enumerate large data stores like Data Lake Storage Gen2, there are mult

 ## Ingestion

-The technical metadata or classifications identified by the scanning process are then sent to Ingestion. The ingestion process is responsible for populating the data map and is managed by Microsoft Purview. Ingestion analyses the input from scan, [applies resource set patterns](concept-resource-sets.md#how-microsoft-purview-detects-resource-sets), populates available [lineage](concept-data-lineage.md) information, and then loads the data map automatically. Assets/schemas can be discovered or curated only after ingestion is complete. So, if your scan is completed but you haven't seen your assets in the data map or catalog, you'll need to wait for the ingestion process to finish.
+Ingestion is the process responsible for populating the data map with metadata gathered through its various processes.
+
+## Ingestion from scans
+
+The technical metadata or classifications identified by the scanning process are then sent to ingestion. Ingestion analyses the input from scan, [applies resource set patterns](concept-resource-sets.md#how-microsoft-purview-detects-resource-sets), populates available [lineage](concept-data-lineage.md) information, and then loads the data map automatically. Assets/schemas can be discovered or curated only after ingestion is complete. So, if your scan is completed but you haven't seen your assets in the data map or catalog, you'll need to wait for the ingestion process to finish.
+
+## Ingestion from lineage connections
+
+Resources like [Azure Data Factory](how-to-link-azure-data-factory.md) and [Azure Synapse](how-to-lineage-azure-synapse-analytics.md) can be connected to Microsoft Purview to bring data source and lineage information into your Microsoft Purview Data Map. For example, when a copy pipeline runs in an Azure Data Factory that has been connected to Microsoft Purview, metadata about the input sources, the activity, and the output sources are ingested in Microsoft Purview and the information is added to the data map.
+
+If a data source has already been added to the data map through a scan, lineage information about the activity will be added to the existing source. If the data source hasn't yet been added to the data map, the lineage ingestion process will add it to the root collection with its lineage information.
+
+For more information about the available lineage connections, see the [lineage user guide](catalog-lineage-user-guide.md).
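Note on the lineage-ingestion rule added in this file: attaching lineage to an existing source, or otherwise creating the source under the root collection, behaves like a small upsert. The sketch below is a hypothetical in-memory model, not Purview code. `Asset`, `ingest_lineage`, `ROOT_COLLECTION`, and the sample source and activity names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical label for the root collection; the real name depends on the account.
ROOT_COLLECTION = "root"


@dataclass
class Asset:
    name: str
    collection: str
    lineage: List[str] = field(default_factory=list)


def ingest_lineage(data_map: Dict[str, Asset], source: str, activity: str) -> Asset:
    """Attach an activity's lineage to a source, creating the source if needed."""
    asset = data_map.get(source)
    if asset is None:
        # The source was never scanned: add it under the root collection.
        asset = Asset(name=source, collection=ROOT_COLLECTION)
        data_map[source] = asset
    # In both cases the lineage is recorded on the (now existing) source.
    asset.lineage.append(activity)
    return asset


# One source already scanned into a "finance" collection, one not yet known.
data_map = {"sales.csv": Asset(name="sales.csv", collection="finance")}
ingest_lineage(data_map, "sales.csv", "copy_pipeline_run_1")
ingest_lineage(data_map, "staging.parquet", "copy_pipeline_run_1")

print(data_map["sales.csv"].collection, data_map["staging.parquet"].collection)
```

The already-scanned source keeps its collection and gains the lineage entry, while the unknown source is created under the root collection with the same entry.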
articles/purview/how-to-lineage-azure-synapse-analytics.md
+9 -3 (9 additions & 3 deletions)
@@ -6,11 +6,17 @@ ms.author: jingwang
 ms.service: purview
 ms.subservice: purview-data-catalog
 ms.topic: how-to
-ms.date: 12/14/2022
+ms.date: 03/13/2023
 ---
 # How to get lineage from Azure Synapse Analytics into Microsoft Purview

-This document explains the steps required for connecting an Azure Synapse workspace with a Microsoft Purview account to track data lineage. The document also gets into the details of the coverage scope and supported lineage capabilities.
+This document explains the steps required for connecting an Azure Synapse workspace with a Microsoft Purview account to track [data lineage](concept-data-lineage.md) and [ingest data sources](concept-scans-and-ingestion.md#ingestion). The document also gets into the details of the activity coverage scope and supported lineage capabilities.
+
+When you connect Azure Synapse Analytics to Microsoft Purview, whenever a [supported pipeline activity](#supported-azure-synapse-capabilities) is run, metadata about the activity's source data, output data, and the activity will be automatically [ingested](concept-scans-and-ingestion.md#ingestion) into the Microsoft Purview Data Map.
+
+If a data source has already been scanned and exists in the data map, the ingestion process will add the lineage information from Azure Synapse Analytics to that existing source. If the source or output doesn't exist in the data map and is [supported by Azure Synapse Analytics lineage](#supported-azure-synapse-capabilities) Microsoft Purview will automatically add their metadata from Synapse Analytics into the data map under the root collection.
+
+This can be an excellent way to monitor your data estate as users move and transform information using Azure Synapse Analytics.

 ## Supported Azure Synapse capabilities

@@ -25,7 +31,7 @@ Currently, Microsoft Purview captures runtime lineage from the following Azure S
 If your Microsoft Purview account is protected by firewall, learn how to let Azure Synapse [access a secured Microsoft Purview account](../synapse-analytics/catalog-and-governance/how-to-access-secured-purview-account.md) through Microsoft Purview private endpoints.

 ## Bring Azure Synapse lineage into Microsoft Purview
articles/purview/how-to-link-azure-data-factory.md
+9 -3 (9 additions & 3 deletions)
@@ -6,11 +6,17 @@ ms.author: jingwang
 ms.service: purview
 ms.subservice: purview-data-catalog
 ms.topic: how-to
-ms.date: 12/14/2022
+ms.date: 03/13/2023
 ---
 # How to connect Azure Data Factory and Microsoft Purview

-This document explains the steps required for connecting an Azure Data Factory account with a Microsoft Purview account to track data lineage. The document also gets into the details of the coverage scope and supported lineage patterns.
+This document explains the steps required for connecting an Azure Data Factory account with a Microsoft Purview account to track [data lineage](concept-data-lineage.md) and [ingest data sources](concept-scans-and-ingestion.md#ingestion). The document also gets into the details of the activity coverage scope and supported lineage patterns.
+
+When you connect an Azure Data Factory to Microsoft Purview, whenever a [supported Azure Data Factory activity](#supported-azure-data-factory-activities) is run, metadata about the activity's source data, output data, and the activity will be automatically [ingested](concept-scans-and-ingestion.md#ingestion) into the Microsoft Purview Data Map.
+
+If a data source has already been scanned and exists in the data map, the ingestion process will add the lineage information from Azure Data Factory to that existing source. If the source or output doesn't exist in the data map and is [supported by Azure Data Factory lineage](#supported-azure-data-factory-activities) Microsoft Purview will automatically add their metadata from Azure Data Factory into the data map under the root collection.
+
+This can be an excellent way to monitor your data estate as users move and transform information using Azure Data Factory.

 ## View existing Data Factory connections

@@ -98,7 +104,7 @@ The integration between Data Factory and Microsoft Purview supports only a subse
 Refer to [supported data stores](how-to-lineage-sql-server-integration-services.md#supported-data-stores).

 ## Access secured Microsoft Purview account
-
+
 If your Microsoft Purview account is protected by firewall, learn how to let Data Factory [access a secured Microsoft Purview account](../data-factory/how-to-access-secured-purview-account.md) through Microsoft Purview private endpoints.

 ## Bring Data Factory lineage into Microsoft Purview