articles/synapse-analytics/sql/develop-tables-external-tables.md
2 additions & 2 deletions
@@ -29,7 +29,7 @@ The key differences between Hadoop and native external tables are presented in t
|[File elimination](#file-elimination) (predicate pushdown) | No | Yes in serverless SQL pool. For the string pushdown, you need to use `Latin1_General_100_BIN2_UTF8` collation on the `VARCHAR` columns to enable pushdown. For more information on collations, refer to [Collation types supported for Synapse SQL](reference-collation-types.md).|
| Custom format for location | No | Yes, using wildcards like `/year=*/month=*/day=*` for Parquet or CSV formats. Custom folder paths are not available in Delta Lake. In the serverless SQL pool you can also use recursive wildcards `/logs/**` to reference Parquet or CSV files in any sub-folder beneath the referenced folder. |
| Recursive folder scan | Yes | Yes. In serverless SQL pools, `/**` must be specified at the end of the location path. In dedicated SQL pools, folders are always scanned recursively. |
| Storage authentication | Storage Access Key (SAK), Azure Active Directory passthrough, Managed identity, custom application Azure Active Directory identity | [Shared Access Signature (SAS)](develop-storage-files-storage-access-control.md?tabs=shared-access-signature), [Azure Active Directory passthrough](develop-storage-files-storage-access-control.md?tabs=user-identity), [Managed identity](develop-storage-files-storage-access-control.md?tabs=managed-identity), [Custom application Azure AD identity](develop-storage-files-storage-access-control.md?tabs=service-principal). |
| Column mapping | Ordinal - the columns in the external table definition are mapped to the columns in the underlying Parquet files by position. | Serverless pool: by name. The columns in the external table definition are mapped to the columns in the underlying Parquet files by column name matching. <br/> Dedicated pool: ordinal matching. The columns in the external table definition are mapped to the columns in the underlying Parquet files by position.|
| CETAS (exporting/transformation) | Yes | CETAS with the native tables as a target works only in the serverless SQL pool. You cannot use the dedicated SQL pools to export data using native tables. |
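To illustrate the collation and wildcard options called out in the table above, here's a minimal sketch of a native external table in a serverless SQL pool. The table name, column names, data source, and file format are hypothetical placeholders, not values from this change.

```sql
-- Sketch only: my_data_source and my_parquet_format are assumed to already exist.
CREATE EXTERNAL TABLE logs (
    event_date DATE,
    -- BIN2_UTF8 collation on VARCHAR enables string predicate pushdown
    message VARCHAR(1000) COLLATE Latin1_General_100_BIN2_UTF8
)
WITH (
    LOCATION = '/year=*/month=*/day=*',  -- wildcard folder path (Parquet/CSV only)
    DATA_SOURCE = my_data_source,
    FILE_FORMAT = my_parquet_format
);
```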
@@ -269,7 +269,7 @@ If you're retrieving data from the text file, store each missing value by using
- 0 if the column is defined as a numeric column. Decimal columns aren't supported and will cause an error.
- Empty string ("") if the column is a string column.
-- 1900-01-01 if the column is a date column.
+- "1900-01-01" if the column is a date column.
FALSE - Store all missing values as NULL. Any NULL values that are stored by using the word NULL in the delimited text file are imported as the string 'NULL'.
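The TRUE/FALSE behavior described above is controlled by the `USE_TYPE_DEFAULT` option of `CREATE EXTERNAL FILE FORMAT`. A minimal sketch, with a hypothetical format name:

```sql
CREATE EXTERNAL FILE FORMAT csv_file_format
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ',',
        STRING_DELIMITER = '"',
        USE_TYPE_DEFAULT = FALSE  -- store missing values as NULL rather than type defaults
    )
);
```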
articles/synapse-analytics/sql/query-delta-lake-format.md
6 additions & 3 deletions
@@ -5,7 +5,7 @@ services: synapse analytics
ms.service: synapse-analytics
ms.topic: how-to
ms.subservice: sql
-ms.date: 12/06/2022
+ms.date: 02/15/2023
author: jovanpop-msft
ms.author: jovanpop
ms.reviewer: sngun, wiassaf
@@ -24,7 +24,7 @@ A serverless SQL pool can read Delta Lake files that are created using Apache Sp
Apache Spark pools in Azure Synapse enable data engineers to modify Delta Lake files using Scala, PySpark, and .NET. Serverless SQL pools help data analysts to create reports on Delta Lake files created by data engineers.
> [!IMPORTANT]
-> Querying Delta Lake format using the serverless SQL pool is **Generally available** functionality. However, querying Spark Delta tables is still in public preview and not production ready. There are known issues that might happen if you query Delta tables created using the Spark pools. See the known issues in the [self-help page](resources-self-help-sql-on-demand.md#delta-lake).
+> Querying Delta Lake format using the serverless SQL pool is **Generally available** functionality. However, querying Spark Delta tables is still in public preview and not production ready. There are known issues that might happen if you query Delta tables created using the Spark pools. See the known issues in [Serverless SQL pool self-help](resources-self-help-sql-on-demand.md#delta-lake).
## Quickstart example
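As a minimal sketch of such a quickstart (the storage URL is a placeholder, not the article's dataset), a serverless SQL pool reads a Delta Lake folder with `OPENROWSET` and `FORMAT = 'DELTA'`:

```sql
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://<storage-account>.dfs.core.windows.net/<container>/<delta-folder>/',
    FORMAT = 'DELTA'  -- point BULK at the root Delta folder, not individual files
) AS result;
```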
@@ -68,7 +68,8 @@ Make sure you can access your file. If your file is protected with SAS key or cu
> Ensure you are using a UTF-8 database collation (for example `Latin1_General_100_BIN2_UTF8`) because string values in Delta Lake files are encoded using UTF-8 encoding.
> A mismatch between the text encoding in the Delta Lake file and the collation may cause unexpected conversion errors.
> You can easily change the default collation of the current database using the following T-SQL statement:
-> `alter database current collate Latin1_General_100_BIN2_UTF8`
+> `ALTER DATABASE CURRENT COLLATE Latin1_General_100_BIN2_UTF8;`
+> For more information on collations, see [Collation types supported for Synapse SQL](reference-collation-types.md).
### Data source usage
@@ -132,7 +133,9 @@ With the explicit specification of the result set schema, you can minimize the t
### Query partitioned data
+
The data set provided in this sample is divided (partitioned) into separate subfolders.
+
Unlike [Parquet](query-parquet-files.md), you don't need to target specific partitions using the `FILEPATH` function. `OPENROWSET` identifies partitioning columns in your Delta Lake folder structure and enables you to directly query data using these columns. This example shows fare amounts by year, month, and payment_type for the first three months of 2017.
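Here's a hedged sketch of that query; the folder URL and the `fare_amount` column name are assumptions (only `year`, `month`, and `payment_type` are named above).

```sql
SELECT
    nyc.year,
    nyc.month,
    nyc.payment_type,
    SUM(nyc.fare_amount) AS total_fare_amount
FROM OPENROWSET(
    BULK 'https://<storage-account>.dfs.core.windows.net/<container>/<delta-folder>/',
    FORMAT = 'DELTA'
) AS nyc
-- Partitioning columns are filtered directly; no FILEPATH function needed.
WHERE nyc.year = 2017
  AND nyc.month IN (1, 2, 3)
GROUP BY nyc.year, nyc.month, nyc.payment_type
ORDER BY nyc.year, nyc.month, nyc.payment_type;
```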
articles/synapse-analytics/sql/query-parquet-files.md
6 additions & 4 deletions
@@ -6,7 +6,7 @@ author: azaricstefan
ms.service: synapse-analytics
ms.topic: how-to
ms.subservice: sql
-ms.date: 05/20/2020
+ms.date: 02/15/2023
ms.author: stefanazaric
ms.reviewer: sngun
---
@@ -36,7 +36,8 @@ Make sure that you can access this file. If your file is protected with SAS key
> Ensure you are using a UTF-8 database collation (for example `Latin1_General_100_BIN2_UTF8`) because string values in PARQUET files are encoded using UTF-8 encoding.
> A mismatch between the text encoding in the PARQUET file and the collation may cause unexpected conversion errors.
> You can easily change the default collation of the current database using the following T-SQL statement:
-> `alter database current collate Latin1_General_100_BIN2_UTF8`'
+> `ALTER DATABASE CURRENT COLLATE Latin1_General_100_BIN2_UTF8;`
+> For more information on collations, see [Collation types supported for Synapse SQL](reference-collation-types.md).
If you use the `Latin1_General_100_BIN2_UTF8` collation, you get an additional performance boost compared to the other collations. The `Latin1_General_100_BIN2_UTF8` collation is compatible with Parquet string sorting rules. The SQL pool is able to eliminate parts of the Parquet files that don't contain data needed in the queries (file/column-segment pruning). If you use other collations, all data from the Parquet files is loaded into Synapse SQL and the filtering happens within the SQL process. The `Latin1_General_100_BIN2_UTF8` collation has an additional performance optimization that works only for Parquet and Azure Cosmos DB. The downside is that you lose fine-grained comparison rules like case insensitivity.
@@ -75,9 +76,10 @@ from openrowset(
> Make sure that you are explicitly specifying a UTF-8 collation (for example `Latin1_General_100_BIN2_UTF8`) for all string columns in the `WITH` clause, or set a UTF-8 collation at the database level.
> A mismatch between the text encoding in the file and the string column collation might cause unexpected conversion errors.
> You can easily change the default collation of the current database using the following T-SQL statement:
-> `alter database current collate Latin1_General_100_BIN2_UTF8`
-> You can easily set collation on the colum types using the following definition:
+> `ALTER DATABASE CURRENT COLLATE Latin1_General_100_BIN2_UTF8;`
+> You can easily set collation on the column types, for example:
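A sketch of what that column-level collation might look like; the file URL and column names are placeholders, not values from this change.

```sql
SELECT *
FROM OPENROWSET(
    BULK 'https://<storage-account>.dfs.core.windows.net/<container>/*.parquet',
    FORMAT = 'PARQUET'
)
WITH (
    -- UTF-8 collation set directly on the string column
    vendor_id VARCHAR(100) COLLATE Latin1_General_100_BIN2_UTF8,
    fare_amount FLOAT
) AS rows;
```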