Commit 11a8004

20230215 1017
1 parent bb6a7f2 commit 11a8004

File tree

1 file changed: +5 −5 lines changed


articles/synapse-analytics/sql/query-parquet-files.md

Lines changed: 5 additions & 5 deletions
@@ -30,7 +30,7 @@ from openrowset(
     format = 'parquet') as rows
 ```
 
-Make sure that you can access this file. If your file is protected with SAS key or custom Azure identity, you would need to setup [server level credential for sql login](develop-storage-files-storage-access-control.md?tabs=shared-access-signature#server-scoped-credential).
+Make sure that you can access this file. If your file is protected with SAS key or custom Azure identity, you would need to set up [server level credential for sql login](develop-storage-files-storage-access-control.md?tabs=shared-access-signature#server-scoped-credential).
 
 > [!IMPORTANT]
 > Ensure you are using a UTF-8 database collation (for example `Latin1_General_100_BIN2_UTF8`) because string values in PARQUET files are encoded using UTF-8 encoding.
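The server-level credential referenced in the hunk above is created with `CREATE CREDENTIAL`; a minimal sketch, not part of this commit, with a placeholder storage URL and secret:

```sql
-- Server-scoped credential for a SQL login (placeholder URL and SAS token).
-- The credential name must match the storage path used in OPENROWSET.
CREATE CREDENTIAL [https://mystorageaccount.blob.core.windows.net/mycontainer]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<SAS token, without the leading ?>';
```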
@@ -39,7 +39,7 @@ Make sure that you can access this file. If your file is protected with SAS key
 > `ALTER DATABASE CURRENT COLLATE Latin1_General_100_BIN2_UTF8;`
 > For more information on collations, see [Collation types supported for Synapse SQL](reference-collation-types.md).
 
-If you use the `Latin1_General_100_BIN2_UTF8` collation you will get an additional performance boost compared to the other collations. The `Latin1_General_100_BIN2_UTF8` collation is compatible with parquet string sorting rules. The SQL pool is able to eliminate some parts of the parquet files that will not contain data needed in the queries (file/column-segment pruning). If you use other collations, all data from the parquet files will be loaded into Synapse SQL and the filtering is happening within the SQL process. The `Latin1_General_100_BIN2_UTF8` collation has additional performance optimization that works only for parquet and CosmosDB. The downside is that you lose fine-grained comparison rules like case insensitivity.
+If you use the `Latin1_General_100_BIN2_UTF8` collation, you get an additional performance boost compared to the other collations. The `Latin1_General_100_BIN2_UTF8` collation is compatible with parquet string sorting rules, so the SQL pool can eliminate parts of the parquet files that don't contain data needed by the queries (file/column-segment pruning). With other collations, all data from the parquet files is loaded into Synapse SQL and filtering happens within the SQL process. The `Latin1_General_100_BIN2_UTF8` collation has an additional performance optimization that works only for parquet and Cosmos DB. The downside is that you lose fine-grained comparison rules like case insensitivity.
 
 ### Data source usage
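As an illustrative sketch of the pruning the changed paragraph describes (placeholder storage path and column name, not part of this commit), a string predicate under a `_BIN2_UTF8` collation lets the pool skip parquet files and column segments with no matching values:

```sql
-- With a UTF-8 BIN2 database collation, this string filter can be evaluated
-- against parquet metadata so non-matching files/column segments are skipped.
SELECT COUNT_BIG(*) AS matching_rows
FROM OPENROWSET(
        BULK 'https://mystorageaccount.blob.core.windows.net/mycontainer/data/*.parquet',
        FORMAT = 'PARQUET') AS rows
WHERE rows.country_code = 'US';  -- hypothetical string column
```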

@@ -57,7 +57,7 @@ from openrowset(
 ) as rows
 ```
 
-If a data source is protected with SAS key or custom identity you can configure [data source with database scoped credential](develop-storage-files-storage-access-control.md?tabs=shared-access-signature#database-scoped-credential).
+If a data source is protected with SAS key or custom identity, you can configure [data source with database scoped credential](develop-storage-files-storage-access-control.md?tabs=shared-access-signature#database-scoped-credential).
 
 ### Explicitly specify schema
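The database scoped credential mentioned in the hunk above pairs with an external data source; a minimal sketch with placeholder names and URL, not part of this commit:

```sql
-- Placeholder names/URL; a database master key must already exist.
CREATE DATABASE SCOPED CREDENTIAL sqlondemand
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<SAS token>';

CREATE EXTERNAL DATA SOURCE mysample
WITH ( LOCATION = 'https://mystorageaccount.blob.core.windows.net/mycontainer',
       CREDENTIAL = sqlondemand );

-- The BULK path is now relative to the data source location.
SELECT TOP 10 *
FROM OPENROWSET(
        BULK 'data/*.parquet',
        DATA_SOURCE = 'mysample',
        FORMAT = 'PARQUET') AS rows;
```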

@@ -81,7 +81,7 @@ from openrowset(
 > `geo_id varchar(6) collate Latin1_General_100_BIN2_UTF8`
 > For more information on collations, see [Collation types supported for Synapse SQL](../sql/reference-collation-types.md).
 
-In the following sections you can see how to query various types of PARQUET files.
+In the following sections, you can see how to query various types of PARQUET files.
 
 ## Prerequisites
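For context on the `geo_id` collation tip in the hunk above, an explicit schema is supplied through the OPENROWSET WITH clause; a sketch with a placeholder path and columns, not part of this commit:

```sql
-- Placeholder path and columns; the per-column COLLATE clause applies the
-- UTF-8 collation recommended in the tip above.
SELECT TOP 10 *
FROM OPENROWSET(
        BULK 'https://mystorageaccount.blob.core.windows.net/mycontainer/data/*.parquet',
        FORMAT = 'PARQUET')
WITH (
    geo_id     varchar(6) COLLATE Latin1_General_100_BIN2_UTF8,
    population bigint
) AS rows;
```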

@@ -121,7 +121,7 @@ ORDER BY
 
 You don't need to use the OPENROWSET WITH clause when reading Parquet files. Column names and data types are automatically read from Parquet files.
 
-The sample below shows the automatic schema inference capabilities for Parquet files. It returns the number of rows in September 2018 without specifying a schema.
+The following sample shows the automatic schema inference capabilities for Parquet files. It returns the number of rows in September 2018 without specifying a schema.
 
 > [!NOTE]
 > You don't have to specify columns in the OPENROWSET WITH clause when reading Parquet files. In that case, serverless SQL pool query service will utilize metadata in the Parquet file and bind columns by name.
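The September 2018 sample referenced above is outside this diff; a hedged sketch of the pattern, assuming files partitioned into year/month folders under a placeholder path:

```sql
-- No WITH clause: column names and types are inferred from parquet metadata.
-- filepath(1) and filepath(2) return the values matched by the first and
-- second wildcard in the BULK path.
SELECT COUNT_BIG(*) AS [rows]
FROM OPENROWSET(
        BULK 'https://mystorageaccount.blob.core.windows.net/mycontainer/data/year=*/month=*/*.parquet',
        FORMAT = 'PARQUET') AS nyc
WHERE nyc.filepath(1) = '2018'
  AND nyc.filepath(2) = '9';
```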
