Commit 0aae182

Asset normalization limitations
1 parent 3409457 commit 0aae182

1 file changed: +6 -3 lines changed

articles/purview/concept-asset-normalization.md

Lines changed: 6 additions & 3 deletions
@@ -1,21 +1,24 @@
 ---
 title: Asset normalization
-description: Learn how Microsoft Purview prevents duplicate assets in your data map through asset normalization
+description: Learn how Microsoft Purview prevents duplicating assets in your data map through asset normalization.
 author: nayenama
 ms.author: nayenama
 ms.service: purview
 ms.subservice: purview-data-catalog
 ms.topic: conceptual
-ms.date: 02/17/2023
+ms.date: 05/26/2023
 ms.custom: ignite-fall-2021
 ---
 
 # Asset normalization
 
-When ingesting assets into the Microsoft Purview data map, different sources updating the same data asset may send similar, but slightly different qualified names. While these qualified names represent the same asset, slight differences such as an extra character or different capitalization may cause these assets on the surface to appear different. To avoid storing duplicate entries and causing confusion when consuming the data catalog, Microsoft Purview applies normalization during ingestion to ensure all fully qualified names of the same entity type are in the same format.
+When ingesting assets into the Microsoft Purview data map, different sources updating the same data asset may send similar, but slightly different qualified names. While these qualified names represent the same asset, slight differences such as an extra character may cause these assets on the surface to appear different and cause duplicate entries in Microsoft Purview. To avoid storing duplicate entries and causing confusion when consuming the data catalog, Microsoft Purview applies normalization during ingestion to ensure all fully qualified names of the same entity type are in the same format.
 
 For example, you scan in an Azure Blob with the qualified name `https://myaccount.file.core.windows.net/myshare/folderA/folderB/my-file.parquet`. This blob is also consumed by an Azure Data Factory pipeline that will then add lineage information to the asset. The ADF pipeline may be configured to read the file as `https://myAccount.file.core.windows.net//myshare/folderA/folderB/my-file.parquet`. While the qualified name is different, this ADF pipeline is consuming the same piece of data. Normalization ensures that all the metadata from both Azure Blob Storage and Azure Data Factory is visible on a single asset, `https://myaccount.file.core.windows.net/myshare/folderA/folderB/my-file.parquet`.
 
+>[!IMPORTANT]
+>The rules listed below are the only kinds of potential duplication Microsoft Purview currently recognizes. If you are experiencing accidental asset duplication, compare the assets' fully qualified names to check for capitalization differences or additional characters. Update any ingestion points, for example your ADF pipelines, so that the qualified names match.
+
 ## Normalization rules
 
 Below are the normalization rules applied by Microsoft Purview.
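
The rule list itself is not part of this hunk, so the following is only a rough sketch of the comparison the `[!IMPORTANT]` note suggests, not Microsoft Purview's actual normalization logic. It assumes just the two differences visible in the example above (host capitalization and a doubled slash in the path); the function names `normalize_qualified_name` and `likely_same_asset` are hypothetical helpers, not part of any Purview SDK.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def normalize_qualified_name(name: str) -> str:
    """Illustrative only: lowercase the scheme and host, collapse repeated slashes in the path.

    These two steps are assumptions taken from the example in the article,
    not the full set of rules Microsoft Purview applies.
    """
    parts = urlsplit(name)
    # Collapse runs of "/" in the path ("//myshare" becomes "/myshare").
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, parts.fragment)
    )


def likely_same_asset(name_a: str, name_b: str) -> bool:
    """Return True when two qualified names differ only by the differences handled above."""
    return normalize_qualified_name(name_a) == normalize_qualified_name(name_b)


# The two qualified names from the article's example.
scanned = "https://myaccount.file.core.windows.net/myshare/folderA/folderB/my-file.parquet"
from_adf = "https://myAccount.file.core.windows.net//myshare/folderA/folderB/my-file.parquet"

print(likely_same_asset(scanned, from_adf))
```

A check like this could help spot which ingestion point is sending a mismatched name before updating it, for example in an ADF pipeline configuration.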
