Commit 7c95a67

Freshness and formatting
1 parent 116acd2 commit 7c95a67

File tree

1 file changed (+6 -5 lines changed)


articles/synapse-analytics/sql/query-parquet-files.md

Lines changed: 6 additions & 5 deletions
@@ -6,7 +6,7 @@ author: azaricstefan
 ms.service: azure-synapse-analytics
 ms.topic: how-to
 ms.subservice: sql
-ms.date: 02/15/2023
+ms.date: 12/10/2024
 ms.author: stefanazaric
 ms.reviewer: whhender
 ---
@@ -39,7 +39,7 @@ Make sure that you can access this file. If your file is protected with SAS key
 > `ALTER DATABASE CURRENT COLLATE Latin1_General_100_BIN2_UTF8;`
 > For more information on collations, see [Collation types supported for Synapse SQL](reference-collation-types.md).
 
-If you use the `Latin1_General_100_BIN2_UTF8` collation you will get an additional performance boost compared to the other collations. The `Latin1_General_100_BIN2_UTF8` collation is compatible with parquet string sorting rules. The SQL pool is able to eliminate some parts of the parquet files that will not contain data needed in the queries (file/column-segment pruning). If you use other collations, all data from the parquet files will be loaded into Synapse SQL and the filtering is happening within the SQL process. The `Latin1_General_100_BIN2_UTF8` collation has additional performance optimization that works only for parquet and Cosmos DB. The downside is that you lose fine-grained comparison rules like case insensitivity.
+If you use the `Latin1_General_100_BIN2_UTF8` collation you'll get an extra performance boost compared to the other collations. The `Latin1_General_100_BIN2_UTF8` collation is compatible with parquet string sorting rules. The SQL pool is able to eliminate some parts of the parquet files that won't contain data needed in the queries (file/column-segment pruning). If you use other collations, all data from the parquet files will be loaded into Synapse SQL, and the filtering is happening within the SQL process. The `Latin1_General_100_BIN2_UTF8` collation has another performance optimization that works only for parquet and Cosmos DB. The downside is that you lose fine-grained comparison rules like case insensitivity.
 
 ### Data source usage
 
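For context, a minimal sketch of the workflow the collation paragraph describes: set the UTF-8 BIN2 collation on the database, then query Parquet files so that string predicates can be pruned at the file/column-segment level. The storage URL and column name below are placeholders, not values from the article.

```sql
-- Minimal sketch (placeholder path and column name): apply the UTF-8 BIN2 collation
-- to the current database, then query Parquet files. With this collation, string
-- filters can be matched against Parquet metadata, so files and column segments
-- that cannot contain matching data are skipped (file/column-segment pruning).
ALTER DATABASE CURRENT COLLATE Latin1_General_100_BIN2_UTF8;

SELECT TOP 10 *
FROM OPENROWSET(
        BULK 'https://<storage-account>.dfs.core.windows.net/<container>/year=*/month=*/*.parquet',
        FORMAT = 'PARQUET'
    ) AS r
WHERE r.vendor_id = 'CMT';   -- string filter that benefits from pruning
```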
@@ -121,7 +121,7 @@ ORDER BY
 
 You don't need to use the OPENROWSET WITH clause when reading Parquet files. Column names and data types are automatically read from Parquet files.
 
-Have in mind that if you are reading number of files at once, the schema, column names and data types will be inferred from the first file service gets from the storage. This can mean that some of the columns expected are omitted, all because the file used by the service to define the schema did not contain these columns. To explicitly specify the schema, please use OPENROWSET WITH clause.
+Have in mind that if you're reading number of files at once, the schema, column names, and data types will be inferred from the first file service gets from the storage. This can mean that some of the columns expected are omitted, all because the file used by the service to define the schema didn't contain these columns. To explicitly specify the schema, use OPENROWSET WITH clause.
 
 The following sample shows the automatic schema inference capabilities for Parquet files. It returns the number of rows in September 2018 without specifying a schema.
 
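To make the explicit-schema option concrete, here is a minimal sketch of an OPENROWSET query with a WITH clause. The storage URL and column definitions are placeholders; match them to your own files.

```sql
-- Minimal sketch (placeholder path and columns): pin the schema with a WITH clause
-- so column names and types do not depend on whichever Parquet file the service
-- happens to read first when a wildcard matches many files.
SELECT COUNT_BIG(*) AS september_2018_rows
FROM OPENROWSET(
        BULK 'https://<storage-account>.dfs.core.windows.net/<container>/year=*/month=*/*.parquet',
        FORMAT = 'PARQUET'
    )
    WITH (
        pickup_datetime datetime2,
        passenger_count int,
        fare_amount     float
    ) AS r
WHERE r.pickup_datetime >= '2018-09-01'
  AND r.pickup_datetime <  '2018-10-01';
```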
@@ -172,6 +172,7 @@ ORDER BY
 
 For Parquet type mapping to SQL native type check [type mapping for Parquet](develop-openrowset.md#type-mapping-for-parquet).
 
-## Next steps
+## Next step
 
-Advance to the next article to learn how to [Query Parquet nested types](query-parquet-nested-types.md).
+> [!div class="nextstepaction"]
+> [How to query Parquet nested types](query-parquet-nested-types.md)
