
Commit 84ff098

Merge pull request #202327 from whhender/language-updates
Language updates
2 parents 81fe5b3 + fae9b4f commit 84ff098

4 files changed, 67 additions and 67 deletions
articles/purview/azure-purview-connector-overview.md

Lines changed: 15 additions & 15 deletions
@@ -1,6 +1,6 @@
 ---
-title: Microsoft Purview supported data sources and file types
-description: This article provides details about supported data sources, file types, and functionalities in Microsoft Purview.
+title: Microsoft Purview Data Map supported data sources and file types
+description: This article provides details about supported data sources, file types, and functionalities in the Microsoft Purview Data Map.
 author: linda33wj
 ms.author: jingwang
 ms.service: purview
@@ -12,9 +12,9 @@ ms.custom: ignite-fall-2021
 
 # Supported data sources and file types
 
-This article discusses currently supported data sources, file types, and scanning concepts in Microsoft Purview.
+This article discusses currently supported data sources, file types, and scanning concepts in the Microsoft Purview Data Map.
 
-## Microsoft Purview data sources
+## Microsoft Purview Data Map available data sources
 
 The table below shows the supported capabilities for each data source. Select the data source, or the feature, to learn more.
 
@@ -60,12 +60,12 @@ The table below shows the supported capabilities for each data source. Select th
 \* Besides the lineage on assets within the data source, lineage is also supported if dataset is used as a source/sink in [Data Factory](how-to-link-azure-data-factory.md) or [Synapse pipeline](how-to-lineage-azure-synapse-analytics.md).
 
 > [!NOTE]
-> Currently, Microsoft Purview can't scan an asset that has `/`, `\`, or `#` in its name. To scope your scan and avoid scanning assets that have those characters in the asset name, use the example in [Register and scan an Azure SQL Database](register-scan-azure-sql-database.md#creating-the-scan).
+> Currently, the Microsoft Purview Data Map can't scan an asset that has `/`, `\`, or `#` in its name. To scope your scan and avoid scanning assets that have those characters in the asset name, use the example in [Register and scan an Azure SQL Database](register-scan-azure-sql-database.md#creating-the-scan).
 
 ## Scan regions
-The following is a list of all the Azure data source (data center) regions where the Microsoft Purview scanner runs. If your Azure data source is in a region outside of this list, the scanner will run in the region of your Microsoft Purview instance.
+The following is a list of all the Azure data source (data center) regions where the Microsoft Purview Data Map scanner runs. If your Azure data source is in a region outside of this list, the scanner will run in the region of your Microsoft Purview instance.
 
-### Microsoft Purview scanner regions
+### Microsoft Purview Data Map scanner regions
 
 - Australia East
 - Australia Southeast
@@ -97,14 +97,14 @@ The following file types are supported for scanning, for schema extraction, and
 
 - Structured file formats supported by extension: AVRO, ORC, PARQUET, CSV, JSON, PSV, SSV, TSV, TXT, XML, GZIP
 > [!Note]
-> * Microsoft Purview scanner only supports schema extraction for the structured file types listed above.
-> * For AVRO, ORC, and PARQUET file types, Microsoft Purview scanner does not support schema extraction for files that contain complex data types (for example, MAP, LIST, STRUCT).
-> * Microsoft Purview scanner supports scanning snappy compressed PARQUET types for schema extraction and classification.
+> * The Microsoft Purview Data Map scanner only supports schema extraction for the structured file types listed above.
+> * For AVRO, ORC, and PARQUET file types, the scanner does not support schema extraction for files that contain complex data types (for example, MAP, LIST, STRUCT).
+> * The scanner supports scanning snappy compressed PARQUET types for schema extraction and classification.
 > * For GZIP file types, the GZIP must be mapped to a single csv file within.
 > Gzip files are subject to System and Custom Classification rules. We currently don't support scanning a gzip file mapped to multiple files within, or any file type other than csv.
 > * For delimited file types (CSV, PSV, SSV, TSV, TXT), we do not support data type detection. The data type will be listed as "string" for all columns.
 - Document file formats supported by extension: DOC, DOCM, DOCX, DOT, ODP, ODS, ODT, PDF, POT, PPS, PPSX, PPT, PPTM, PPTX, XLC, XLS, XLSB, XLSM, XLSX, XLT
-- Microsoft Purview also supports custom file extensions and custom parsers.
+- The Microsoft Purview Data Map also supports custom file extensions and custom parsers.
 
 ## Nested data
 
@@ -116,12 +116,12 @@ Nested data, or nested schema parsing, isn't supported in SQL. A column with nes
 
 ## Sampling within a file
 
-In Microsoft Purview terminology,
+In Microsoft Purview Data Map terminology,
 - L1 scan: Extracts basic information and meta data like file name, size and fully qualified name
 - L2 scan: Extracts schema for structured file types and database tables
 - L3 scan: Extracts schema where applicable and subjects the sampled file to system and custom classification rules
 
-For all structured file formats, Microsoft Purview scanner samples files in the following way:
+For all structured file formats, the Microsoft Purview Data Map scanner samples files in the following way:
 
 - For structured file types, it samples the top 128 rows in each column or the first 1 MB, whichever is lower.
 - For document file formats, it samples the first 20 MB of each file.
@@ -131,7 +131,7 @@ For all structured file formats, Microsoft Purview scanner samples files in the
 
 ## Resource set file sampling
 
-A folder or group of partition files is detected as a *resource set* in Microsoft Purview, if it matches with a system resource set policy or a customer defined resource set policy. If a resource set is detected, then Microsoft Purview will sample each folder that it contains. Learn more about resource sets [here](concept-resource-sets.md).
+A folder or group of partition files is detected as a *resource set* in the Microsoft Purview Data Map if it matches with a system resource set policy or a customer defined resource set policy. If a resource set is detected, then the scanner will sample each folder that it contains. Learn more about resource sets [here](concept-resource-sets.md).
 
 File sampling for resource sets by file types:
 
@@ -143,7 +143,7 @@ File sampling for resource sets by file types:
 
 ## Classification
 
-All 208 system classification rules apply to structured file formats. Only the MCE classification rules apply to document file types (Not the data scan native regex patterns, bloom filter-based detection). For more information on supported classifications, see [Supported classifications in Microsoft Purview](supported-classifications.md).
+All 208 system classification rules apply to structured file formats. Only the MCE classification rules apply to document file types (Not the data scan native regex patterns, bloom filter-based detection). For more information on supported classifications, see [Supported classifications in the Microsoft Purview Data Map](supported-classifications.md).
 
 ## Next steps
 