articles/purview/catalog-lineage-user-guide.md
Lines changed: 37 additions & 18 deletions
@@ -5,7 +5,7 @@ author: chanuengg
 ms.author: csugunan
 ms.service: purview
 ms.topic: conceptual
-ms.date: 01/20/2022
+ms.date: 09/20/2022
 ---
 # Microsoft Purview Data Catalog lineage user guide

@@ -39,7 +39,7 @@ Data integration and ETL tools can push lineage into Microsoft Purview at execut
 | Azure Data Share |[Share snapshot](how-to-link-azure-data-share.md)|

 ### Data storage systems
-Databases & storage solutions such as Oracle, Teradata, and SAP have query engines to transform data using scripting language. Data lineage from views/stored procedures/etc are collected into Microsoft Purview and stitched with lineage from other systems. Lineage is supported for the following data sources via Microsoft Purview data scan. Learn more about the supported lineage scenarios from the respective article.
+Databases & storage solutions such as Oracle, Teradata, and SAP have query engines to transform data using scripting language. Data lineage information from views/stored procedures/etc is collected into Microsoft Purview and stitched with lineage from other systems. Lineage is supported for the following data sources via Microsoft Purview data scan. Learn more about the supported lineage scenarios from the respective article.

 |**Category**|**Data source**|
 |---|---|
@@ -90,51 +90,70 @@ To access lineage information for an asset in Microsoft Purview, follow the step

 Microsoft Purview supports asset level lineage for the datasets and processes. To see the asset level lineage go to the **Lineage** tab of the current asset in the catalog. Select the current dataset asset node. By default the list of columns belonging to the data appears in the left pane.

-:::image type="content" source="./media/catalog-lineage-user-guide/view-columns-from-lineage.png" alt-text="Screenshot showing how to select View columns in the lineage page" border="true":::
+:::image type="content" source="./media/catalog-lineage-user-guide/view-columns-from-lineage-inline.png" alt-text="Screenshot showing how to select View columns in the lineage page" lightbox="./media/catalog-lineage-user-guide/view-columns-from-lineage.png"border="true":::

 ## Dataset column lineage

 To see column-level lineage of a dataset, go to the **Lineage** tab of the current asset in the catalog and follow below steps:

 1. Once you are in the lineage tab, in the left pane, select the check box next to each column you want to display in the data lineage.

-:::image type="content" source="./media/catalog-lineage-user-guide/select-columns-to-show-in-lineage.png" alt-text="Screenshot showing how to select columns to display in the lineage page." lightbox="./media/catalog-lineage-user-guide/select-columns-to-show-in-lineage.png":::
+:::image type="content" source="./media/catalog-lineage-user-guide/select-columns-to-show-in-lineage-inline.png" alt-text="Screenshot showing how to select columns to display in the lineage page." lightbox="./media/catalog-lineage-user-guide/select-columns-to-show-in-lineage.png":::

-2. Hover over a selected column on the left pane or in the dataset of the lineage canvas to see the column mapping. All the column instances are highlighted.
+1. Hover over a selected column on the left pane or in the dataset of the lineage canvas to see the column mapping. All the column instances are highlighted.

-:::image type="content" source="./media/catalog-lineage-user-guide/show-column-flow-in-lineage.png" alt-text="Screenshot showing how to hover over a column name to highlight the column flow in a data lineage path." lightbox="./media/catalog-lineage-user-guide/show-column-flow-in-lineage.png":::
+:::image type="content" source="./media/catalog-lineage-user-guide/show-column-flow-in-lineage-inline.png" alt-text="Screenshot showing how to hover over a column name to highlight the column flow in a data lineage path." lightbox="./media/catalog-lineage-user-guide/show-column-flow-in-lineage.png":::

-3. If the number of columns is larger than what can be displayed in the left pane, use the filter option to select a specific column by name. Alternatively, you can use your mouse to scroll through the list.
+1. If the number of columns is larger than what can be displayed in the left pane, use the filter option to select a specific column by name. Alternatively, you can use your mouse to scroll through the list.

 :::image type="content" source="./media/catalog-lineage-user-guide/filter-columns-by-name.png" alt-text="Screenshot showing how to filter columns by column name on the lineage page." lightbox="./media/catalog-lineage-user-guide/filter-columns-by-name.png":::

-4. If the lineage canvas contains more nodes and edges, use the filter to select data asset or process nodes by name. Alternatively, you can use your mouse to pan around the lineage window.
+1. If the lineage canvas contains more nodes and edges, use the filter to select data asset or process nodes by name. Alternatively, you can use your mouse to pan around the lineage window.

 :::image type="content" source="./media/catalog-lineage-user-guide/filter-assets-by-name.png" alt-text="Screenshot showing data asset nodes by name on the lineage page." lightbox="./media/catalog-lineage-user-guide/filter-assets-by-name.png":::

-5. Use the toggle in the left pane to highlight the list of datasets in the lineage canvas. If you turn off the toggle, any asset that contains at least one of the selected columns is displayed. If you turn on the toggle, only datasets that contain all of the columns are displayed.
+1. Use the toggle in the left pane to highlight the list of datasets in the lineage canvas. If you turn off the toggle, any asset that contains at least one of the selected columns is displayed. If you turn on the toggle, only datasets that contain all of the columns are displayed.

 :::image type="content" source="./media/catalog-lineage-user-guide/use-toggle-to-filter-nodes.png" alt-text="Screenshot showing how to use the toggle to filter the list of nodes on the lineage page." lightbox="./media/catalog-lineage-user-guide/use-toggle-to-filter-nodes.png":::

 ## Process column lineage
-Data process can take one or more input datasets to produce one or more outputs. In Microsoft Purview, column level lineage is available for process nodes.
-1. Switch between input and output datasets from a drop down in the columns panel.
-2. Select columns from one or more tables to see the lineage flowing from input dataset to corresponding output dataset.

-:::image type="content" source="./media/catalog-lineage-user-guide/process-column-lineage.png" alt-text="Screenshot showing columns lineage of a process node." lightbox="./media/catalog-lineage-user-guide/process-column-lineage.png":::
+You can also view data processes, like copy activities, in the data catalog. For example, in this lineage flow, select the copy activity:
+
+:::image type="content" source="./media/catalog-lineage-user-guide/select-copy-activity-inline.png" alt-text="Screenshot of a data lineage flow with one of the copy activity nodes highlighted." lightbox="./media/catalog-lineage-user-guide/select-copy-activity.png":::
+
+The copy activity will expand, and then you can select the **Switch to asset** button, which will give you more details about the process itself.
+
+:::image type="content" source="./media/catalog-lineage-user-guide/switch-to-asset-inline.png" alt-text="Screenshot of the copy activity node expanded, and the new switch to asset button selected." lightbox="./media/catalog-lineage-user-guide/switch-to-asset.png":::
+
+Data process can take one or more input datasets to produce one or more outputs. In Microsoft Purview, column level lineage is available for process nodes.
+
+1. Switch between input and output datasets from a drop-down in the columns panel.
+1. Select columns from one or more tables to see the lineage flowing from input dataset to corresponding output dataset.
+
+:::image type="content" source="./media/catalog-lineage-user-guide/process-column-lineage-inline.png" alt-text="Screenshot showing columns lineage of a process node." lightbox="./media/catalog-lineage-user-guide/process-column-lineage.png":::

 ## Browse assets in lineage
+
 1. Select **Switch to asset** on any asset to view its corresponding metadata from the lineage view. Doing so is an effective way to browse to another asset in the catalog from the lineage view.

-:::image type="content" source="./media/catalog-lineage-user-guide/select-switch-to-asset.png" alt-text="Screenshot how to select Switch to asset in a lineage data asset." lightbox="./media/catalog-lineage-user-guide/select-switch-to-asset.png":::
+:::image type="content" source="./media/catalog-lineage-user-guide/select-switch-to-asset-inline.png" alt-text="Screenshot how to select Switch to asset in a lineage data asset." lightbox="./media/catalog-lineage-user-guide/select-switch-to-asset.png":::

-2. The lineage canvas could become complex for popular datasets. To avoid clutter, the default view will only show five levels of lineage for the asset in focus. The rest of the lineage can be expanded by selecting the bubbles in the lineage canvas. Data consumers can also hide the assets in the canvas that are of no interest. To further reduce the clutter, turn off the toggle **More Lineage** at the top of lineage canvas. This action will hide all the bubbles in lineage canvas.
+1. The lineage canvas could become complex for popular datasets. To avoid clutter, the default view will only show five levels of lineage for the asset in focus. The rest of the lineage can be expanded by selecting the bubbles in the lineage canvas. Data consumers can also hide the assets in the canvas that are of no interest. To further reduce the clutter, turn off the toggle **More Lineage** at the top of lineage canvas. This action will hide all the bubbles in lineage canvas.

-:::image type="content" source="./media/catalog-lineage-user-guide/use-toggle-to-hide-bubbles.png" alt-text="Screenshot showing how to toggle More lineage." lightbox="./media/catalog-lineage-user-guide/use-toggle-to-hide-bubbles.png":::
+:::image type="content" source="./media/catalog-lineage-user-guide/use-toggle-to-hide-bubbles-inline.png" alt-text="Screenshot showing how to toggle More lineage." lightbox="./media/catalog-lineage-user-guide/use-toggle-to-hide-bubbles.png":::

-3. Use the smart buttons in the lineage canvas to get an optimal view of the lineage. Auto layout, Zoom to fit, Zoom in/out, Full screen, and navigation map are available for an immersive lineage experience in the catalog.
+1. Use the smart buttons in the lineage canvas to get an optimal view of the lineage:
+1. Full screen
+1. Zoom to fit
+1. Zoom in/out
+1. Auto align
+1. Zoom preview
+1. And more options:
+1. Center the current asset
+1. Reset to default view

-:::image type="content" source="./media/catalog-lineage-user-guide/use-lineage-smart-buttons.png" alt-text="Screenshot showing how to select the lineage smart buttons." lightbox="./media/catalog-lineage-user-guide/use-lineage-smart-buttons.png":::
+:::image type="content" source="./media/catalog-lineage-user-guide/use-lineage-smart-buttons-inline.png" alt-text="Screenshot showing how to select the lineage smart buttons." lightbox="./media/catalog-lineage-user-guide/use-lineage-smart-buttons.png":::
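
Note on the toggle step in the "Dataset column lineage" list: the text describes two matching modes, "at least one of the selected columns" when the toggle is off and "all of the columns" when it is on. The sketch below only illustrates that any/all distinction. It is not Microsoft Purview code; the function name `highlighted_datasets`, the `require_all` flag, and the dictionary shape are hypothetical and chosen for this example.

```python
def highlighted_datasets(datasets, selected_columns, require_all):
    """Return the names of datasets that would be highlighted.

    datasets: dict mapping a dataset name to an iterable of its column names
              (a hypothetical shape, used only for this illustration).
    selected_columns: the columns ticked in the left pane.
    require_all: False models the toggle being off (any selected column matches),
                 True models the toggle being on (every selected column must be present).
    """
    selected = set(selected_columns)
    if not selected:
        # Nothing selected, so nothing to highlight.
        return []

    result = []
    for name, columns in datasets.items():
        present = selected & set(columns)
        if require_all:
            keep = present == selected   # all selected columns are in this dataset
        else:
            keep = bool(present)         # at least one selected column is in this dataset
        if keep:
            result.append(name)
    return result
```

With three datasets that all share an `id` column but only two of which also have `amount`, selecting both columns highlights all three datasets in the "any" mode and only the two complete ones in the "all" mode.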
articles/purview/concept-data-lineage.md
Lines changed: 5 additions & 5 deletions
@@ -11,19 +11,19 @@ ms.date: 09/27/2021

 This article provides an overview of data lineage in Microsoft Purview Data Catalog. It also details how data systems can integrate with the catalog to capture lineage of data. Microsoft Purview can capture lineage for data in different parts of your organization's data estate, and at different levels of preparation including:

-- Completely raw data staged from various platforms
+- Raw data staged from various platforms
 - Transformed and prepared data
 - Data used by visualization platforms.

-## Use Cases
+## Use cases

-Data lineage is broadly understood as the lifecycle that spans the data’s origin, and where it moves over time across the data estate. It is used for different kinds of backwards-looking scenarios such as troubleshooting, tracing root cause in data pipelines and debugging. Lineage is also used for data quality analysis, compliance and “what if” scenarios often referred to as impact analysis. Lineage is represented visually to show data moving from source to destination including how the data was transformed. Given the complexity of most enterprise data environments, these views can be hard to understand without doing some consolidation or masking of peripheral data points.
+Data lineage is broadly understood as the lifecycle that spans the data’s origin, and where it moves over time across the data estate. It's used for different kinds of backwards-looking scenarios such as troubleshooting, tracing root cause in data pipelines and debugging. Lineage is also used for data quality analysis, compliance and “what if” scenarios often referred to as impact analysis. Lineage is represented visually to show data moving from source to destination including how the data was transformed. Given the complexity of most enterprise data environments, these views can be hard to understand without doing some consolidation or masking of peripheral data points.

 ## Lineage experience in Microsoft Purview Data Catalog

 Microsoft Purview Data Catalog will connect with other data processing, storage, and analytics systems to extract lineage information. The information is combined to represent a generic, scenario-specific lineage experience in the Catalog.

-:::image type="content" source="media/concept-lineage/lineage-end-end.png" alt-text="end-end lineage showing data copied from blob store all the way to Power BI dashboard":::
+:::image type="content" source="media/concept-lineage/lineage-end-end-inline.png" alt-text="end-end lineage showing data copied from blob store all the way to Power BI dashboard" lightbox="media/concept-lineage/lineage-end-end.png":::

 Your data estate may include systems doing data extraction, transformation (ETL/ELT systems), analytics, and visualization systems. Each of the systems captures rich static and operational metadata that describes the state and quality of the data within the systems boundary. The goal of lineage in a data catalog is to extract the movement, transformation, and operational metadata from each data system at the lowest grain possible.

@@ -42,7 +42,7 @@ The following section covers the details about the granularity of which the line

 - Lineage is represented as a graph, typically it contains source and target entities in Data storage systems that are connected by a process invoked by a compute system.
 - Data systems connect to the data catalog to generate and report a unique object referencing the physical object of the underlying data system for example: SQL Stored procedure, notebooks, and so on.
-- High fidelity lineage with additional metadata like ownership is captured to show the lineage in a human readable format for source & target entities. for example: lineage at a hive table level instead of partitions or file level.
+- High fidelity lineage with other metadata like ownership is captured to show the lineage in a human readable format for source & target entities. for example: lineage at a hive table level instead of partitions or file level.
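
Note on the graph model in the bullets above: lineage is described as source and target entities connected by a process, and the "Use cases" text mentions impact analysis. The sketch below shows one way such a graph could be walked to list everything downstream of a dataset. It is an illustration under stated assumptions, not the Purview data model or API; the `(process_name, inputs, outputs)` tuple shape, the function name `downstream_datasets`, and the sample asset names are all hypothetical.

```python
from collections import deque


def downstream_datasets(processes, start):
    """List datasets reachable from `start` by following process edges.

    processes: list of (process_name, inputs, outputs) tuples, where inputs
               and outputs are lists of dataset names (hypothetical shape).
    start: the dataset whose downstream impact is being asked about.
    Returns dataset names in breadth-first order, excluding `start` itself.
    """
    # Index each input dataset to the outputs of the processes that read it.
    consumers = {}
    for _name, inputs, outputs in processes:
        for source in inputs:
            consumers.setdefault(source, []).extend(outputs)

    seen = {start}            # guards against cycles and repeated visits
    order = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for target in consumers.get(current, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order
```

For a chain in which a copy process moves a blob file into a SQL table and a second process builds a dashboard from that table, asking for the blob file's downstream datasets returns the table first and then the dashboard.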