Commit 211d0cb

Merge pull request #228988 from whhender/limitation-update
Adding limitation
2 parents 64561f9 + 2a67f4f commit 211d0cb

File tree

1 file changed: +14 −8 lines changed

articles/purview/microsoft-purview-connector-overview.md

Lines changed: 14 additions & 8 deletions
@@ -105,17 +105,23 @@ The following is a list of all the Azure data source (data center) regions where
 The following file types are supported for scanning, for schema extraction, and classification where applicable:
 
 - Structured file formats supported by extension: AVRO, ORC, PARQUET, CSV, JSON, PSV, SSV, TSV, TXT, XML, GZIP
-> [!Note]
-> * The Microsoft Purview Data Map scanner only supports schema extraction for the structured file types listed above.
-> * For AVRO, ORC, and PARQUET file types, the scanner doesn't support schema extraction for files that contain complex data types (for example, MAP, LIST, STRUCT).
-> * The scanner supports scanning snappy-compressed PARQUET types for schema extraction and classification.
-> * For GZIP file types, the GZIP must be mapped to a single CSV file within.
->   GZIP files are subject to system and custom classification rules. We currently don't support scanning a GZIP file that maps to multiple files within, or to any file type other than CSV.
-> * For delimited file types (CSV, PSV, SSV, TSV, TXT), we don't support data type detection; the data type is listed as "string" for all columns. Only comma (`,`), semicolon (`;`), vertical bar (`|`), and tab (`\t`) are supported as delimiters. A row is judged an error row if a field isn't quoted at both ends, is a single quote character, or contains quotes within it, or if the row has a different number of columns than the header row. (Number of error rows / number of rows sampled) must be less than 0.1.
-> * For Parquet files, if you're using a self-hosted integration runtime, you need to install the **64-bit JRE 8 (Java Runtime Environment) or OpenJDK** on your IR machine. See the [Java Runtime Environment section at the bottom of the page](manage-integration-runtimes.md#java-runtime-environment-installation) for an installation guide.
 - Document file formats supported by extension: DOC, DOCM, DOCX, DOT, ODP, ODS, ODT, PDF, POT, PPS, PPSX, PPT, PPTM, PPTX, XLC, XLS, XLSB, XLSM, XLSX, XLT
 - The Microsoft Purview Data Map also supports [custom file extensions and custom parsers](create-a-scan-rule-set.md#create-a-custom-file-type).
 
+> [!Note]
+> **Known limitations:**
+> * The Microsoft Purview Data Map scanner only supports schema extraction for the structured file types listed above.
+> * For AVRO, ORC, and PARQUET file types, the scanner doesn't support schema extraction for files that contain complex data types (for example, MAP, LIST, STRUCT).
+> * The scanner supports scanning snappy-compressed PARQUET types for schema extraction and classification.
+> * For GZIP file types, the GZIP must be mapped to a single CSV file within.
+>   GZIP files are subject to system and custom classification rules. We currently don't support scanning a GZIP file that maps to multiple files within, or to any file type other than CSV.
+> * **For delimited file types (CSV, PSV, SSV, TSV, TXT):**
+>   * We don't support data type detection; the data type is listed as "string" for all columns.
+>   * Only comma (`,`), semicolon (`;`), vertical bar (`|`), and tab (`\t`) are supported as delimiters.
+>   * Delimited files with fewer than three rows can't be identified as CSV files when they use a custom delimiter. For example, a file with a `~` delimiter and fewer than three rows can't be identified as a CSV file.
+>   * A row is judged an error row if a field isn't quoted at both ends, is a single quote character, or contains quotes within it, or if the row has a different number of columns than the header row. (Number of error rows / number of rows sampled) must be less than 0.1.
+> * For Parquet files, if you're using a self-hosted integration runtime, you need to install the **64-bit JRE 8 (Java Runtime Environment) or OpenJDK** on your IR machine. See the [Java Runtime Environment section at the bottom of the page](manage-integration-runtimes.md#java-runtime-environment-installation) for an installation guide.
+
 ## Schema extraction
 
 Currently, the maximum number of columns supported in the asset schema tab is 800 for Azure sources, Power BI, and SQL Server.
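The delimited-file limitations in this change (four supported delimiters, string-only typing, the header-width column check, and the rule that error rows must stay below 10% of sampled rows) can be illustrated with a minimal sketch. This is not the scanner's actual implementation; the function name `sample_is_scannable`, the simplified quote handling, and the sampling logic are hypothetical, and only the criteria stated in the documentation are encoded.

```python
import csv
import io

# Hypothetical sketch of the documented error-row rule for delimited files.
# Documented criteria only: ',', ';', '|', '\t' are the accepted delimiters,
# every value is treated as a string, a row whose column count differs from
# the header is an error row, and (error rows / rows sampled) must be < 0.1.
SUPPORTED_DELIMITERS = [",", ";", "|", "\t"]
MAX_ERROR_RATIO = 0.1

def sample_is_scannable(text: str, delimiter: str) -> bool:
    """Return True if a sampled delimited-text payload passes the 0.1 rule."""
    if delimiter not in SUPPORTED_DELIMITERS:
        # Custom delimiters such as '~' aren't supported by the scanner.
        return False
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return False
    header_width = len(rows[0])
    # A row with a different number of columns than the header is an error row.
    error_rows = sum(1 for row in rows[1:] if len(row) != header_width)
    sampled = len(rows) - 1 or 1  # avoid division by zero on header-only files
    return error_rows / sampled < MAX_ERROR_RATIO

good = "id,name\n1,alpha\n2,beta\n"
bad = "id,name\n1,alpha,extra\n2,beta,extra\n"
print(sample_is_scannable(good, ","))   # True: no error rows
print(sample_is_scannable(bad, ","))    # False: every sampled row is an error row
print(sample_is_scannable(good, "~"))   # False: '~' isn't a supported delimiter
```

The sketch omits the quote-validation part of the rule (unbalanced or embedded quotes also mark a row as an error row), which would need character-level parsing beyond what `csv.reader` exposes.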

0 commit comments