Commit 211d0cb

Merge pull request #228988 from whhender/limitation-update
Adding limitation
2 parents 64561f9 + 2a67f4f commit 211d0cb

File tree

1 file changed: +14 −8 lines changed

articles/purview/microsoft-purview-connector-overview.md

Lines changed: 14 additions & 8 deletions
@@ -105,17 +105,23 @@ The following is a list of all the Azure data source (data center) regions where
 The following file types are supported for scanning, for schema extraction, and classification where applicable:
 
 - Structured file formats supported by extension: AVRO, ORC, PARQUET, CSV, JSON, PSV, SSV, TSV, TXT, XML, GZIP
-> [!Note]
-> * The Microsoft Purview Data Map scanner only supports schema extraction for the structured file types listed above.
-> * For AVRO, ORC, and PARQUET file types, the scanner doesn't support schema extraction for files that contain complex data types (for example, MAP, LIST, STRUCT).
-> * The scanner supports scanning snappy-compressed PARQUET types for schema extraction and classification.
-> * For GZIP file types, the GZIP must be mapped to a single CSV file within.
->   GZIP files are subject to system and custom classification rules. We currently don't support scanning a GZIP file that maps to multiple files within, or to any file type other than CSV.
-> * For delimited file types (CSV, PSV, SSV, TSV, TXT), we don't support data type detection; the data type is listed as "string" for all columns. Only comma (`,`), semicolon (`;`), vertical bar (`|`), and tab (`\t`) are supported as delimiters. A row is judged an error row if a field isn't quoted at both ends, is a single quote character, or contains quotes within it, or if the row has a different number of columns than the header row. (Number of error rows / number of rows sampled) must be less than 0.1.
-> * For Parquet files, if you're using a self-hosted integration runtime, you need to install the **64-bit JRE 8 (Java Runtime Environment) or OpenJDK** on your IR machine. See the [Java Runtime Environment section at the bottom of the page](manage-integration-runtimes.md#java-runtime-environment-installation) for an installation guide.
 - Document file formats supported by extension: DOC, DOCM, DOCX, DOT, ODP, ODS, ODT, PDF, POT, PPS, PPSX, PPT, PPTM, PPTX, XLC, XLS, XLSB, XLSM, XLSX, XLT
 - The Microsoft Purview Data Map also supports [custom file extensions and custom parsers](create-a-scan-rule-set.md#create-a-custom-file-type).
 
+> [!Note]
+> **Known limitations:**
+> * The Microsoft Purview Data Map scanner only supports schema extraction for the structured file types listed above.
+> * For AVRO, ORC, and PARQUET file types, the scanner doesn't support schema extraction for files that contain complex data types (for example, MAP, LIST, STRUCT).
+> * The scanner supports scanning snappy-compressed PARQUET types for schema extraction and classification.
+> * For GZIP file types, the GZIP must be mapped to a single CSV file within.
+>   GZIP files are subject to system and custom classification rules. We currently don't support scanning a GZIP file that maps to multiple files within, or to any file type other than CSV.
+> * **For delimited file types (CSV, PSV, SSV, TSV, TXT):**
+>   * We don't support data type detection; the data type is listed as "string" for all columns.
+>   * Only comma (`,`), semicolon (`;`), vertical bar (`|`), and tab (`\t`) are supported as delimiters.
+>   * Delimited files with fewer than three rows can't be identified as CSV files when they use a custom delimiter. For example, a file with a `~` delimiter and fewer than three rows can't be identified as a CSV file.
+>   * A row is judged an error row if a field isn't quoted at both ends, is a single quote character, or contains quotes within it, or if the row has a different number of columns than the header row. (Number of error rows / number of rows sampled) must be less than 0.1.
+> * For Parquet files, if you're using a self-hosted integration runtime, you need to install the **64-bit JRE 8 (Java Runtime Environment) or OpenJDK** on your IR machine. See the [Java Runtime Environment section at the bottom of the page](manage-integration-runtimes.md#java-runtime-environment-installation) for an installation guide.
+
 ## Schema extraction
 
 Currently, the maximum number of columns supported in the asset schema tab is 800 for Azure sources, Power BI, and SQL Server.
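The delimited-file limitations in this change (four supported delimiters, string-only typing, the header-width column check, and the rule that error rows must stay below 10% of sampled rows) can be illustrated with a minimal sketch. This is not the scanner's actual implementation; the function name `sample_is_scannable`, the simplified quote handling, and the sampling logic are hypothetical, and only the criteria stated in the documentation are encoded.

```python
import csv
import io

# Hypothetical sketch of the documented error-row rule for delimited files.
# Documented criteria only: ',', ';', '|', '\t' are the accepted delimiters,
# every value is treated as a string, a row whose column count differs from
# the header is an error row, and (error rows / rows sampled) must be < 0.1.
SUPPORTED_DELIMITERS = [",", ";", "|", "\t"]
MAX_ERROR_RATIO = 0.1

def sample_is_scannable(text: str, delimiter: str) -> bool:
    """Return True if a sampled delimited-text payload passes the 0.1 rule."""
    if delimiter not in SUPPORTED_DELIMITERS:
        # Custom delimiters such as '~' aren't supported by the scanner.
        return False
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return False
    header_width = len(rows[0])
    # A row with a different number of columns than the header is an error row.
    error_rows = sum(1 for row in rows[1:] if len(row) != header_width)
    sampled = len(rows) - 1 or 1  # avoid division by zero on header-only files
    return error_rows / sampled < MAX_ERROR_RATIO

good = "id,name\n1,alpha\n2,beta\n"
bad = "id,name\n1,alpha,extra\n2,beta,extra\n"
print(sample_is_scannable(good, ","))   # True: no error rows
print(sample_is_scannable(bad, ","))    # False: every sampled row is an error row
print(sample_is_scannable(good, "~"))   # False: '~' isn't a supported delimiter
```

The sketch omits the quote-validation part of the rule (unbalanced or embedded quotes also mark a row as an error row), which would need character-level parsing beyond what `csv.reader` exposes.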

0 commit comments