Commit 7cae97f

authored
Merge pull request #171855 from jovanpop-msft/patch-220
Update query-parquet-files.md
2 parents ac02d53 + a4bce2d commit 7cae97f

1 file changed

+1
-1
lines changed

articles/synapse-analytics/sql/query-parquet-files.md

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@ Make sure that you can access this file. If your file is protected with SAS key
 > You can easily change the default collation of the current database using the following T-SQL statement:
 > `alter database current collate Latin1_General_100_BIN2_UTF8`'
 
-If you use a _BIN2 collation you get an additional performance boost. BIN2 collation is compatible with parquet string sorting rules so we a some parts of the parquet files that will not contain data needed in the queries (file/column-segment pruning) can be eliminated. If you use a non-BIN2 collation all data from the parquet fill will be loaded into Synapse SQL with the filtering happening within the SQL process which might be much slower than with file elimination of the unneeded data. BIN2 collation has additional performance optimization that works only for parquet and CosmosDB. The downside is that you lose fine-grained comparison rules like case insensitivity.
+If you use the `Latin1_General_100_BIN2_UTF8` collation you will get an additional performance boost compared to the other collations. The `Latin1_General_100_BIN2_UTF8` collation is compatible with parquet string sorting rules. The SQL pool is able to eliminate some parts of the parquet files that will not contain data needed in the queries (file/column-segment pruning). If you use other collations, all data from the parquet files will be loaded into Synapse SQL and the filtering is happening within the SQL process. The `Latin1_General_100_BIN2_UTF8` collation has additional performance optimization that works only for parquet and CosmosDB. The downside is that you lose fine-grained comparison rules like case insensitivity.
 
 ### Data source usage
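
Note on the changed paragraph: besides changing the database default with `alter database`, the collation can presumably also be set per column when querying a file. The following is a minimal sketch, not text from the article. It assumes a serverless SQL pool, and the storage URL and the `country_code` column are hypothetical placeholders. A filter on a string column declared with `Latin1_General_100_BIN2_UTF8` is the kind of query the paragraph says can benefit from file/column-segment pruning.

```sql
-- Hypothetical storage URL and column name; substitute your own.
SELECT TOP 10 *
FROM OPENROWSET(
        BULK 'https://<storage-account>.dfs.core.windows.net/<container>/<folder>/*.parquet',
        FORMAT = 'PARQUET'
    )
    WITH (
        -- Explicit collation on the string column used in the filter
        country_code VARCHAR(8) COLLATE Latin1_General_100_BIN2_UTF8
    ) AS [rows]
-- BIN2 comparison is case-sensitive: 'US' does not match 'us'
WHERE country_code = 'US';
```

Because the comparison is binary, values must match exactly, which is the loss of case insensitivity that the paragraph mentions.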
