---
title: Use External Tables with Synapse SQL
description: Reading or writing data files with external tables in Synapse SQL.
author: WilliamDAssafMSFT
ms.author: wiassaf
ms.reviewer: jovanpop, periclesrocha
ms.date: 02/19/2025
ms.service: azure-synapse-analytics
ms.subservice: sql
ms.topic: concept-article
---
# Use external tables with Synapse SQL
An external table points to data located in Hadoop, Azure Storage blob, or Azure Data Lake Storage (ADLS).

You can use external tables to read data from files or write data to files in Azure Storage. With Azure Synapse SQL, you can use external tables to read external data using dedicated SQL pool or serverless SQL pool.

Depending on the type of the external data source, you can use two types of external tables:
- **Hadoop external tables** that you can use to read and export data in various data formats such as CSV, Parquet, and ORC. Hadoop external tables are available in dedicated SQL pools, but they aren't available in serverless SQL pools.
- **Native external tables** that you can use to read and export data in various data formats such as CSV and Parquet. Native external tables are available in serverless SQL pools and in dedicated SQL pools. Writing/exporting data using CETAS and native external tables is available only in the serverless SQL pool, not in dedicated SQL pools.

The key differences between Hadoop and native external tables:

| External table type | Hadoop | Native |
| --- | --- | --- |
| Dedicated SQL pool | Available | Parquet only |
| Serverless SQL pool | Not available | Available |
| [Folder partition elimination](#folder-partition-elimination) | No | Partition elimination is available only in partitioned tables created on Parquet or CSV formats that are synchronized from Apache Spark pools. You might create external tables on Parquet partitioned folders, but the partitioning columns are inaccessible and ignored, and partition elimination won't be applied. Don't create [external tables on Delta Lake folders](create-use-external-tables.md#delta-tables-on-partitioned-folders) because they aren't supported. Use [Delta partitioned views](create-use-views.md#delta-lake-partitioned-views) if you need to query partitioned Delta Lake data. |
| [File elimination](#file-elimination) (predicate pushdown) | No | Yes, in serverless SQL pool. For string pushdown, you need to use the `Latin1_General_100_BIN2_UTF8` collation on `VARCHAR` columns to enable pushdown. For more information on collations, see [Database collation support for Synapse SQL in Azure Synapse Analytics](reference-collation-types.md). |
| Custom format for location | No | Yes, using wildcards like `/year=*/month=*/day=*` for Parquet or CSV formats. Custom folder paths aren't available in Delta Lake. In the serverless SQL pool, you can also use recursive wildcards `/logs/**` to reference Parquet or CSV files in any subfolder beneath the referenced folder. |
| Recursive folder scan | Yes | Yes. In serverless SQL pools, `/**` must be specified at the end of the location path. In dedicated pools, folders are always scanned recursively. |
You can use external tables to:

- Query Azure Blob Storage and ADLS Gen2 with Transact-SQL statements.
- Store query results to files in Azure Blob Storage or Azure Data Lake Storage using [CETAS with Synapse SQL](develop-tables-cetas.md).
- Import data from Azure Blob Storage and Azure Data Lake Storage and store it in a dedicated SQL pool (only Hadoop tables in dedicated pool).
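For example, storing query results with CETAS might look like the following sketch, where the data source `MyDataLake`, file format `ParquetFormat`, and source table `YellowTaxiTrips` are illustrative names assumed to exist already:

```sql
-- Persist an aggregation as Parquet files in Azure Storage, then query it as an external table.
-- MyDataLake must be a writable (credentialed) data source; ParquetFormat is a Parquet file format.
CREATE EXTERNAL TABLE YellowTaxiSummary
WITH (
    LOCATION = 'taxi-summary/',      -- folder created under the data source location
    DATA_SOURCE = MyDataLake,
    FILE_FORMAT = ParquetFormat
)
AS
SELECT passenger_count, COUNT(*) AS trip_count
FROM YellowTaxiTrips
GROUP BY passenger_count;
```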
> [!NOTE]
### File elimination
Some data formats such as Parquet and Delta contain file statistics for each column (for example, min/max values for each column). The queries that filter data won't read the files where the required column values don't exist. The query will first explore min/max values for the columns used in the query predicate to find the files that don't contain the required data. These files are ignored and eliminated from the query plan.
This technique is also known as filter predicate pushdown and it can improve the performance of your queries. Filter pushdown is available in the serverless SQL pools on Parquet and Delta formats. To apply filter pushdown for the string types, use the VARCHAR type with the `Latin1_General_100_BIN2_UTF8` collation. For more information on collations, see [Database collation support for Synapse SQL in Azure Synapse Analytics](reference-collation-types.md).
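For instance, declaring string columns with that collation in the external table definition lets the engine push string filters down to the file scan. A minimal sketch, where `TaxiDataSource` and `ParquetFormat` are illustrative, previously created objects:

```sql
-- vendor_id uses the BIN2 UTF8 collation, so a filter such as vendor_id = 'CMT'
-- can be pushed down and eliminate files based on Parquet min/max statistics.
CREATE EXTERNAL TABLE TaxiTrips (
    vendor_id VARCHAR(10) COLLATE Latin1_General_100_BIN2_UTF8,
    trip_distance FLOAT
)
WITH (
    LOCATION = 'yellow/**',
    DATA_SOURCE = TaxiDataSource,
    FILE_FORMAT = ParquetFormat
);

-- Files whose statistics exclude 'CMT' are skipped during this scan.
SELECT COUNT(*) FROM TaxiTrips WHERE vendor_id = 'CMT';
```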
### Security
A user must have `SELECT` permission on an external table to read the data.
External tables access the underlying Azure storage using the database scoped credential defined in the data source, according to the following rules:
- A data source without a credential enables external tables to access publicly available files on Azure storage.
- A data source can have a credential that enables external tables to access only the files on Azure storage using a SAS token or a workspace managed identity. For examples, see the [Develop storage files storage access control](develop-storage-files-storage-access-control.md#examples) article.
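For example, granting read access to an external table might look like the following, where the table and user names are illustrative:

```sql
-- Allow a specific database user to read from an external table
GRANT SELECT ON OBJECT::dbo.TaxiTrips TO [reporting_user];
```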
### Example for CREATE EXTERNAL DATA SOURCE
#### [Hadoop](#tab/hadoop)
The following example creates a Hadoop external data source in dedicated SQL pool for ADLS Gen2 pointing to the public New York data set:
```sql
CREATE DATABASE SCOPED CREDENTIAL [ADLS_credential]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
-- Placeholder: supply your own SAS token
SECRET = '<your SAS token>' ;

CREATE EXTERNAL DATA SOURCE AzureDataLakeStore
WITH
  ( LOCATION = 'abfss://<container>@<storage-account>.dfs.core.windows.net' , -- placeholder path
    CREDENTIAL = ADLS_credential ,
    TYPE = HADOOP
  ) ;
```
The following example creates an external data source for ADLS Gen2 pointing to the publicly available New York data set:
```sql
CREATE EXTERNAL DATA SOURCE YellowTaxi
WITH ( LOCATION = 'https://azureopendatastorage.blob.core.windows.net/nyctlc/yellow/',
       TYPE = HADOOP)
```
#### [Native](#tab/native)
The following example creates an external data source in serverless or dedicated SQL pool for ADLS Gen2 that can be accessed using a SAS credential:
```sql
CREATE DATABASE SCOPED CREDENTIAL [sqlondemand]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
-- Placeholder: supply your own SAS token
SECRET = '<your SAS token>' ;
GO

CREATE EXTERNAL DATA SOURCE SqlOnDemandDemo WITH (
    LOCATION = 'https://<storage-account>.blob.core.windows.net', -- placeholder storage account
    CREDENTIAL = sqlondemand
);
```
> [!NOTE]
> SQL users need proper permissions on database scoped credentials to access the data source in the Azure Synapse Analytics serverless SQL pool. For more information, see [Access external storage using serverless SQL pool in Azure Synapse Analytics](./develop-storage-files-overview.md?tabs=impersonation#permissions).
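For example, a user can be granted use of a database scoped credential as follows (the credential name matches the earlier example; the user name is illustrative):

```sql
-- Allow a database user to reference the database scoped credential
GRANT REFERENCES ON DATABASE SCOPED CREDENTIAL::[sqlondemand] TO [reporting_user];
```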
The following example creates an external data source for ADLS Gen2 pointing to the publicly available New York data set:
```sql
CREATE EXTERNAL DATA SOURCE YellowTaxi
WITH ( LOCATION = 'https://azureopendatastorage.blob.core.windows.net/nyctlc/yellow/')
```
---
### Example for CREATE EXTERNAL FILE FORMAT
The following example creates an external file format for census files:
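A minimal sketch, assuming the census files are Snappy-compressed Parquet (the format name is illustrative):

```sql
CREATE EXTERNAL FILE FORMAT census_file_format
WITH (
    FORMAT_TYPE = PARQUET,
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);
```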
- You must have at least [permissions to create an external table](/sql/t-sql/statements/create-external-table-transact-sql?view=azure-sqldw-latest&preserve-view=true#permissions-2) and query external tables on the Synapse SQL pool (dedicated or serverless).
From the Data panel, select the file from which you would like to create the external table:
:::image type="content" source="media/develop-tables-external-tables/external-table.png" alt-text="Screenshot from the Azure portal of the Azure Synapse Analytics create external table experience." lightbox="media/develop-tables-external-tables/external-table.png":::
A dialog window opens. Select dedicated SQL pool or serverless SQL pool, give the table a name, and select **Open script**:
:::image type="content" source="media/develop-tables-external-tables/external-table-dialog.png" alt-text="Screenshot from the Azure portal of the Azure Synapse Analytics of the create external table dialog.":::
The SQL script is autogenerated, inferring the schema from the file:
:::image type="content" source="media/develop-tables-external-tables/external-table-t-sql.png" alt-text="Screenshot from the Azure portal of a T-SQL script that creates an external table." lightbox="media/develop-tables-external-tables/external-table-t-sql.png":::
Run the script. The script will automatically run a `SELECT TOP 100 *`:
:::image type="content" source="media/develop-tables-external-tables/external-table-resultset.png" alt-text="Screenshot from the Azure portal of a T-SQL script's result set that shows the external table." lightbox="media/develop-tables-external-tables/external-table-resultset.png":::
The external table is now created. For future exploration of its content, the user can query it directly from the Data pane:
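For example, assuming the dialog created a table named `census_external_table` (the name is illustrative):

```sql
SELECT TOP 100 * FROM dbo.census_external_table;
```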