articles/synapse-analytics/sql/query-parquet-files.md
format ='parquet') as rows
```

Make sure that you can access this file. If your file is protected with a SAS key or custom Azure identity, you need to set up a [server-level credential for SQL login](develop-storage-files-storage-access-control.md?tabs=shared-access-signature#server-scoped-credential).

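As an illustrative sketch (the storage account, container, and SAS token below are placeholders, not values from this article), a server-scoped credential for SAS-protected storage is named after the storage URL it covers:

```sql
-- Sketch: server-scoped credential whose name matches the protected storage URL.
-- <storage-account>, <container>, and the SECRET value are placeholders.
CREATE CREDENTIAL [https://<storage-account>.blob.core.windows.net/<container>]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sv=...';  -- paste the SAS token without the leading '?'
```

Serverless SQL pool looks up the credential by matching the queried path against the credential name.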
> [!IMPORTANT]
> Ensure you are using a UTF-8 database collation (for example, `Latin1_General_100_BIN2_UTF8`) because string values in Parquet files are encoded using UTF-8 encoding.
> `ALTER DATABASE CURRENT COLLATE Latin1_General_100_BIN2_UTF8;`
> For more information on collations, see [Collation types supported for Synapse SQL](reference-collation-types.md).
If you use the `Latin1_General_100_BIN2_UTF8` collation, you get an additional performance boost compared to other collations. The `Latin1_General_100_BIN2_UTF8` collation is compatible with Parquet string sorting rules, so the SQL pool can eliminate the parts of the Parquet files that don't contain data needed by the query (file/column-segment pruning). If you use other collations, all data from the Parquet files is loaded into Synapse SQL and filtering happens within the SQL process. The `Latin1_General_100_BIN2_UTF8` collation has an additional performance optimization that works only for Parquet and Azure Cosmos DB. The downside is that you lose fine-grained comparison rules like case insensitivity.

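Before relying on this pruning, you can confirm which collation the current database uses with standard T-SQL metadata functions; a minimal check:

```sql
-- Check the collation of the current database; the pruning optimization
-- applies when this returns Latin1_General_100_BIN2_UTF8.
SELECT DB_NAME() AS database_name,
       DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS collation_name;
```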
### Data source usage
) as rows
```

If a data source is protected with a SAS key or custom identity, you can configure a [data source with a database-scoped credential](develop-storage-files-storage-access-control.md?tabs=shared-access-signature#database-scoped-credential).
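For illustration, a database-scoped credential and a data source that uses it might be set up as follows. The credential name, data source name, storage account, container, and SAS token are all placeholders, and a database master key must already exist in the database:

```sql
-- Sketch: database-scoped credential plus an external data source that references it.
-- MyStorageCredential, MyDataSource, <storage-account>, and <container> are placeholders.
CREATE DATABASE SCOPED CREDENTIAL MyStorageCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sv=...';  -- paste the SAS token without the leading '?'

CREATE EXTERNAL DATA SOURCE MyDataSource
WITH (
    LOCATION = 'https://<storage-account>.blob.core.windows.net/<container>',
    CREDENTIAL = MyStorageCredential
);
```

Queries can then reference the data source by name in `OPENROWSET` instead of repeating the full storage URL.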
> For more information on collations, see [Collation types supported for Synapse SQL](../sql/reference-collation-types.md).
In the following sections, you can see how to query various types of Parquet files.

## Prerequisites
You don't need to use the OPENROWSET WITH clause when reading Parquet files. Column names and data types are automatically read from Parquet files.
The following sample shows the automatic schema inference capabilities for Parquet files. It returns the number of rows in September 2018 without specifying a schema.
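A sketch of such a query is shown below. The storage path and the `puYear`/`puMonth` partition-folder names are placeholders for a partitioned Parquet folder, not values confirmed by this excerpt:

```sql
-- Sketch: count rows without a WITH clause; column names and types
-- are inferred from the Parquet file metadata.
SELECT COUNT_BIG(*) AS row_count
FROM OPENROWSET(
    BULK 'https://<storage-account>.blob.core.windows.net/<container>/puYear=2018/puMonth=9/*.parquet',
    FORMAT = 'parquet'
) AS rows;
```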
> [!NOTE]
> You don't have to specify columns in the OPENROWSET WITH clause when reading Parquet files. In that case, the serverless SQL pool query service utilizes the metadata in the Parquet file and binds columns by name.
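When you do want explicit types rather than inferred ones, a WITH clause can still be supplied; columns are then bound by name. A hypothetical sketch (path and column names are placeholders):

```sql
-- Sketch: explicitly typed columns, bound by name against the Parquet metadata.
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://<storage-account>.blob.core.windows.net/<container>/*.parquet',
    FORMAT = 'parquet'
) WITH (
    vendor_id VARCHAR(4),      -- placeholder column
    trip_distance FLOAT        -- placeholder column
) AS rows;
```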