articles/purview/azure-purview-connector-overview.md
Lines changed: 15 additions & 15 deletions
@@ -1,6 +1,6 @@
 ---
-title: Microsoft Purview supported data sources and file types
-description: This article provides details about supported data sources, file types, and functionalities in Microsoft Purview.
+title: Microsoft Purview Data Map supported data sources and file types
+description: This article provides details about supported data sources, file types, and functionalities in the Microsoft Purview Data Map.
 author: linda33wj
 ms.author: jingwang
 ms.service: purview
@@ -12,9 +12,9 @@ ms.custom: ignite-fall-2021

 # Supported data sources and file types

-This article discusses currently supported data sources, file types, and scanning concepts in Microsoft Purview.
+This article discusses currently supported data sources, file types, and scanning concepts in the Microsoft Purview Data Map.

-## Microsoft Purview data sources
+## Microsoft Purview Data Map available data sources

 The table below shows the supported capabilities for each data source. Select the data source, or the feature, to learn more.

@@ -60,12 +60,12 @@ The table below shows the supported capabilities for each data source. Select th
 \* Besides the lineage on assets within the data source, lineage is also supported if dataset is used as a source/sink in [Data Factory](how-to-link-azure-data-factory.md) or [Synapse pipeline](how-to-lineage-azure-synapse-analytics.md).

 > [!NOTE]
-> Currently, Microsoft Purview can't scan an asset that has `/`, `\`, or `#` in its name. To scope your scan and avoid scanning assets that have those characters in the asset name, use the example in [Register and scan an Azure SQL Database](register-scan-azure-sql-database.md#creating-the-scan).
+> Currently, the Microsoft Purview Data Map can't scan an asset that has `/`, `\`, or `#` in its name. To scope your scan and avoid scanning assets that have those characters in the asset name, use the example in [Register and scan an Azure SQL Database](register-scan-azure-sql-database.md#creating-the-scan).

 ## Scan regions
-The following is a list of all the Azure data source (data center) regions where the Microsoft Purview scanner runs. If your Azure data source is in a region outside of this list, the scanner will run in the region of your Microsoft Purview instance.
+The following is a list of all the Azure data source (data center) regions where the Microsoft Purview Data Map scanner runs. If your Azure data source is in a region outside of this list, the scanner will run in the region of your Microsoft Purview instance.

-### Microsoft Purview scanner regions
+### Microsoft Purview Data Map scanner regions

 - Australia East
 - Australia Southeast
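The context lines in this hunk describe two rules: assets whose names contain `/`, `\`, or `#` can't be scanned, and a scan runs in the data source's region only if that region is on the scanner list, otherwise in the Microsoft Purview instance's region. A minimal sketch of both rules follows. It is not a Purview API: `is_scannable_name` and `scanner_region` are hypothetical helpers, and the region set holds only the two regions visible in this excerpt, not the full list.

```python
# Hypothetical helpers illustrating the documented rules; not part of any Purview SDK.

# Characters the Data Map scanner can't handle in an asset name (per the NOTE).
UNSUPPORTED_NAME_CHARS = ("/", "\\", "#")

# Partial list: only the scanner regions visible in this diff excerpt.
SCANNER_REGIONS = {"Australia East", "Australia Southeast"}


def is_scannable_name(asset_name: str) -> bool:
    """Return False if the asset name contains a character the scanner can't handle."""
    return not any(ch in asset_name for ch in UNSUPPORTED_NAME_CHARS)


def scanner_region(source_region: str, purview_region: str,
                   scanner_regions: frozenset = frozenset(SCANNER_REGIONS)) -> str:
    """Run in the source's region if the scanner runs there; otherwise fall back
    to the region of the Microsoft Purview instance."""
    return source_region if source_region in scanner_regions else purview_region


if __name__ == "__main__":
    print(is_scannable_name("sales_2021"))                   # True
    print(is_scannable_name("sales#2021"))                   # False
    print(scanner_region("Australia East", "West Europe"))   # Australia East
    print(scanner_region("Example Region", "West Europe"))   # West Europe
```

A name pre-check like this could help decide which assets to leave out when scoping a scan, as the linked Azure SQL Database example suggests.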
@@ -97,14 +97,14 @@ The following file types are supported for scanning, for schema extraction, and
-> * Microsoft Purview scanner only supports schema extraction for the structured file types listed above.
-> * For AVRO, ORC, and PARQUET file types, Microsoft Purview scanner does not support schema extraction for files that contain complex data types (for example, MAP, LIST, STRUCT).
-> *Microsoft Purview scanner supports scanning snappy compressed PARQUET types for schema extraction and classification.
+> *The Microsoft Purview Data Map scanner only supports schema extraction for the structured file types listed above.
+> * For AVRO, ORC, and PARQUET file types, the scanner does not support schema extraction for files that contain complex data types (for example, MAP, LIST, STRUCT).
+> *The scanner supports scanning snappy compressed PARQUET types for schema extraction and classification.
 > * For GZIP file types, the GZIP must be mapped to a single csv file within.
 > Gzip files are subject to System and Custom Classification rules. We currently don't support scanning a gzip file mapped to multiple files within, or any file type other than csv.
 > * For delimited file types (CSV, PSV, SSV, TSV, TXT), we do not support data type detection. The data type will be listed as "string" for all columns.
-- Microsoft Purview also supports custom file extensions and custom parsers.
+-The Microsoft Purview Data Map also supports custom file extensions and custom parsers.

 ## Nested data

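Because the scanner doesn't detect data types in delimited files, every column in the extracted schema is reported as `string`. The sketch below shows what such a schema looks like, assuming the first row is a header. The extension-to-delimiter mapping (PSV as `|`, SSV as `;`, TSV as tab) is an assumption for illustration, TXT is left out because its delimiter varies, and `delimited_schema` is a hypothetical helper rather than scanner code.

```python
import csv
import io

# Assumed delimiter per extension (illustrative only).
DELIMITERS = {"csv": ",", "psv": "|", "ssv": ";", "tsv": "\t"}


def delimited_schema(text: str, extension: str) -> dict[str, str]:
    """Read the header row and report every column as "string",
    mirroring the documented lack of data type detection."""
    reader = csv.reader(io.StringIO(text), delimiter=DELIMITERS[extension])
    header = next(reader)
    return {column: "string" for column in header}


if __name__ == "__main__":
    sample = "id|amount|created\n1|9.99|2021-11-02\n"
    print(delimited_schema(sample, "psv"))
    # {'id': 'string', 'amount': 'string', 'created': 'string'}
```

Note that `amount` and `created` hold numeric and date values but are still reported as strings.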
@@ -116,12 +116,12 @@ Nested data, or nested schema parsing, isn't supported in SQL. A column with nes

 ## Sampling within a file

-In Microsoft Purview terminology,
+In Microsoft Purview Data Map terminology,
 - L1 scan: Extracts basic information and meta data like file name, size and fully qualified name
 - L2 scan: Extracts schema for structured file types and database tables
 - L3 scan: Extracts schema where applicable and subjects the sampled file to system and custom classification rules

-For all structured file formats, Microsoft Purview scanner samples files in the following way:
+For all structured file formats, the Microsoft Purview Data Map scanner samples files in the following way:

 - For structured file types, it samples the top 128 rows in each column or the first 1 MB, whichever is lower.
 - For document file formats, it samples the first 20 MB of each file.
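The sampling limits above can be sketched as follows: for structured files, take whole rows until either 128 rows or 1 MB has been read, whichever comes first; for documents, read at most the first 20 MB. This is a minimal illustration under stated assumptions, not the scanner's implementation. It assumes 1 MB means 2**20 bytes, counts every line (including any header) as a row, and keeps only rows that fit entirely within the byte limit; `sample_structured` and `sample_document` are hypothetical names.

```python
import io
from typing import BinaryIO

MB = 1024 * 1024                 # assumption: 1 MB = 2**20 bytes
MAX_ROWS = 128                   # structured files: top 128 rows ...
MAX_STRUCTURED_BYTES = 1 * MB    # ... or first 1 MB, whichever is lower
MAX_DOCUMENT_BYTES = 20 * MB     # document files: first 20 MB


def sample_structured(stream: BinaryIO) -> list[bytes]:
    """Collect whole rows until 128 rows or 1 MB is reached, whichever comes first."""
    rows: list[bytes] = []
    consumed = 0
    for line in stream:  # binary streams iterate line by line
        if len(rows) == MAX_ROWS or consumed + len(line) > MAX_STRUCTURED_BYTES:
            break
        rows.append(line)
        consumed += len(line)
    return rows


def sample_document(stream: BinaryIO) -> bytes:
    """Read at most the first 20 MB of a document file."""
    return stream.read(MAX_DOCUMENT_BYTES)


if __name__ == "__main__":
    narrow = io.BytesIO(b"x\n" * 1000)                  # many small rows: row limit wins
    print(len(sample_structured(narrow)))               # 128

    wide = io.BytesIO((b"y" * 20_000 + b"\n") * 100)    # ~2 MB of wide rows: byte limit wins
    print(len(sample_structured(wide)))                 # 52
```

With 20,001-byte rows, 52 rows (1,040,052 bytes) fit under 1 MB and a 53rd would exceed it, so the byte limit applies before the row limit.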
@@ -131,7 +131,7 @@ For all structured file formats, Microsoft Purview scanner samples files in the

 ## Resource set file sampling

-A folder or group of partition files is detected as a *resource set* in Microsoft Purview, if it matches with a system resource set policy or a customer defined resource set policy. If a resource set is detected, then Microsoft Purview will sample each folder that it contains. Learn more about resource sets [here](concept-resource-sets.md).
+A folder or group of partition files is detected as a *resource set* in the Microsoft Purview Data Map if it matches with a system resource set policy or a customer defined resource set policy. If a resource set is detected, then the scanner will sample each folder that it contains. Learn more about resource sets [here](concept-resource-sets.md).

 File sampling for resource sets by file types:

@@ -143,7 +143,7 @@ File sampling for resource sets by file types:

 ## Classification

-All 208 system classification rules apply to structured file formats. Only the MCE classification rules apply to document file types (Not the data scan native regex patterns, bloom filter-based detection). For more information on supported classifications, see [Supported classifications in Microsoft Purview](supported-classifications.md).
+All 208 system classification rules apply to structured file formats. Only the MCE classification rules apply to document file types (Not the data scan native regex patterns, bloom filter-based detection). For more information on supported classifications, see [Supported classifications in the Microsoft Purview Data Map](supported-classifications.md).