
Commit fb4e1cc

Merge pull request #294930 from WilliamDAssafMSFT/20250219-periclesrocha-patch-6

20250219 periclesrocha patch 6

2 parents 76553d2 + dbe9f6a commit fb4e1cc

File tree

6 files changed (+34 -33 lines changed)

articles/synapse-analytics/sql/develop-tables-external-tables.md

Lines changed: 34 additions & 33 deletions
@@ -1,32 +1,35 @@
 ---
-title: Use external tables with Synapse SQL
-description: Reading or writing data files with external tables in Synapse SQL
-author: jovanpop-msft
-ms.author: jovanpop
+title: Use External Tables with Synapse SQL
+description: Reading or writing data files with external tables in Synapse SQL.
+author: WilliamDAssafMSFT
+ms.author: wiassaf
+ms.reviewer: jovanpop, periclesrocha
+ms.date: 02/19/2025
 ms.service: azure-synapse-analytics
-ms.topic: concept-article
 ms.subservice: sql
-ms.date: 01/08/2025
-ms.reviewer: wiassaf
+ms.topic: concept-article
 ---

 # Use external tables with Synapse SQL

-An external table points to data located in Hadoop, Azure Storage blob, or Azure Data Lake Storage. You can use external tables to read data from files or write data to files in Azure Storage. With Synapse SQL, you can use external tables to read external data using dedicated SQL pool or serverless SQL pool.
+An external table points to data located in Hadoop, Azure Storage blob, or Azure Data Lake Storage (ADLS).
+
+You can use external tables to read data from files or write data to files in Azure Storage. With Azure Synapse SQL, you can use external tables to read external data using dedicated SQL pool or serverless SQL pool.

 Depending on the type of the external data source, you can use two types of external tables:
+
 - **Hadoop external tables** that you can use to read and export data in various data formats such as CSV, Parquet, and ORC. Hadoop external tables are available in dedicated SQL pools, but they aren't available in serverless SQL pools.
-- **Native external tables** that you can use to read and export data in various data formats such as CSV and Parquet. Native external tables are available in serverless SQL pools, and they are in **public preview** in dedicated SQL pools. Writing/exporting data using CETAS and the native external tables is available only in the serverless SQL pool, but not in the dedicated SQL pools.
+- **Native external tables** that you can use to read and export data in various data formats such as CSV and Parquet. Native external tables are available in serverless SQL pools and in dedicated SQL pools. Writing/exporting data using CETAS and the native external tables is available only in the serverless SQL pool, but not in the dedicated SQL pools.

 The key differences between Hadoop and native external tables:

 | External table type | Hadoop | Native |
 | --- | --- | --- |
-| Dedicated SQL pool | Available | Only Parquet tables are available in **public preview**. |
+| Dedicated SQL pool | Available | Parquet only |
 | Serverless SQL pool | Not available | Available |
-| Supported formats | Delimited/CSV, Parquet, ORC, Hive RC, and RC | Serverless SQL pool: Delimited/CSV, Parquet, and [Delta Lake](query-delta-lake-format.md)<br/>Dedicated SQL pool: Parquet (preview) |
+| Supported formats | Delimited/CSV, Parquet, ORC, Hive RC, and RC | Serverless SQL pool: Delimited/CSV, Parquet, and [Delta Lake](query-delta-lake-format.md)<br/>Dedicated SQL pool: Parquet |
 | [Folder partition elimination](#folder-partition-elimination) | No | Partition elimination is available only in the partitioned tables created on Parquet or CSV formats that are synchronized from Apache Spark pools. You might create external tables on Parquet partitioned folders, but the partitioning columns are inaccessible and ignored, while the partition elimination won't be applied. Don't create [external tables on Delta Lake folders](create-use-external-tables.md#delta-tables-on-partitioned-folders) because they aren't supported. Use [Delta partitioned views](create-use-views.md#delta-lake-partitioned-views) if you need to query partitioned Delta Lake data. |
-| [File elimination](#file-elimination) (predicate pushdown) | No | Yes in serverless SQL pool. For the string pushdown, you need to use `Latin1_General_100_BIN2_UTF8` collation on the `VARCHAR` columns to enable pushdown. For more information on collations, see [Collation types supported for Synapse SQL](reference-collation-types.md).|
+| [File elimination](#file-elimination) (predicate pushdown) | No | Yes in serverless SQL pool. For the string pushdown, you need to use `Latin1_General_100_BIN2_UTF8` collation on the `VARCHAR` columns to enable pushdown. For more information on collations, see [Database collation support for Synapse SQL in Azure Synapse Analytics](reference-collation-types.md).|
 | Custom format for location | No | Yes, using wildcards like `/year=*/month=*/day=*` for Parquet or CSV formats. Custom folder paths aren't available in Delta Lake. In the serverless SQL pool, you can also use recursive wildcards `/logs/**` to reference Parquet or CSV files in any subfolder beneath the referenced folder. |
 | Recursive folder scan | Yes | Yes. In serverless SQL pools must be specified `/**` at the end of the location path. In Dedicated pool the folders are always scanned recursively. |
 | Storage authentication | Storage Access Key(SAK), Microsoft Entra passthrough, Managed identity, custom application Microsoft Entra identity | [Shared Access Signature(SAS)](develop-storage-files-storage-access-control.md?tabs=shared-access-signature), [Microsoft Entra passthrough](develop-storage-files-storage-access-control.md?tabs=user-identity), [Managed identity](develop-storage-files-storage-access-control.md?tabs=managed-identity), [Custom application Microsoft Entra identity](develop-storage-files-storage-access-control.md?tabs=service-principal). |
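To make the wildcard-location row in the table above concrete, here is a minimal sketch of a native external table in a serverless SQL pool. The table, data source, file format, and folder names are hypothetical illustrations, not part of the article:

```sql
-- Hypothetical names throughout; assumes a data source and a Parquet file
-- format were already created in the serverless SQL pool database.
CREATE EXTERNAL TABLE logs_partitioned
(
    event_id   BIGINT,
    event_type VARCHAR(50)
)
WITH (
    -- Wildcards match the year/month/day folder layout. Appending /** to a
    -- location instead scans every subfolder recursively. Note that the
    -- partitioning columns themselves are not exposed, and partition
    -- elimination is not applied to tables created this way.
    LOCATION = '/logs/year=*/month=*/day=*/*.parquet',
    DATA_SOURCE = MyDataSource,
    FILE_FORMAT = MyParquetFormat
);
```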
@@ -40,8 +43,8 @@ The key differences between Hadoop and native external tables:

 You can use external tables to:

-- Query Azure Blob Storage and Azure Data Lake Gen2 with Transact-SQL statements.
-- Store query results to files in Azure Blob Storage or Azure Data Lake Storage using [CETAS](develop-tables-cetas.md).
+- Query Azure Blob Storage and ADLS Gen2 with Transact-SQL statements.
+- Store query results to files in Azure Blob Storage or Azure Data Lake Storage using [CETAS with Synapse SQL](develop-tables-cetas.md).
 - Import data from Azure Blob Storage and Azure Data Lake Storage and store it in a dedicated SQL pool (only Hadoop tables in dedicated pool).

 > [!NOTE]
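The CETAS bullet above can be made concrete with a minimal sketch for a serverless SQL pool; every object name here is hypothetical:

```sql
-- CETAS creates an external table and exports the query result to storage
-- in a single step. All names are hypothetical.
CREATE EXTERNAL TABLE aggregated_rides
WITH (
    LOCATION = 'aggregated/rides/',    -- output folder under the data source
    DATA_SOURCE = MyOutputDataSource,  -- must reference a credential with write access
    FILE_FORMAT = MyParquetFormat
)
AS
SELECT passenger_count, COUNT(*) AS ride_count
FROM existing_external_table
GROUP BY passenger_count;
```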
@@ -66,20 +69,20 @@ The folder partition elimination is available in the native external tables that
 ### File elimination

 Some data formats such as Parquet and Delta contain file statistics for each column (for example, min/max values for each column). The queries that filter data won't read the files where the required column values don't exist. The query will first explore min/max values for the columns used in the query predicate to find the files that don't contain the required data. These files are ignored and eliminated from the query plan.
-This technique is also known as filter predicate pushdown and it can improve the performance of your queries. Filter pushdown is available in the serverless SQL pools on Parquet and Delta formats. To apply filter pushdown for the string types, use the VARCHAR type with the `Latin1_General_100_BIN2_UTF8` collation. For more information on collations, see [Collation types supported for Synapse SQL](reference-collation-types.md).
+This technique is also known as filter predicate pushdown and it can improve the performance of your queries. Filter pushdown is available in the serverless SQL pools on Parquet and Delta formats. To apply filter pushdown for the string types, use the VARCHAR type with the `Latin1_General_100_BIN2_UTF8` collation. For more information on collations, see [Database collation support for Synapse SQL in Azure Synapse Analytics](reference-collation-types.md).
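A minimal sketch of the collation detail described above, with hypothetical table, column, data source, and file format names; only the `COLLATE` clause is the point:

```sql
-- String predicate pushdown in a serverless SQL pool requires the
-- Latin1_General_100_BIN2_UTF8 collation on the VARCHAR column.
CREATE EXTERNAL TABLE taxi_trips
(
    vendor_id     VARCHAR(10) COLLATE Latin1_General_100_BIN2_UTF8,
    trip_distance FLOAT
)
WITH (
    LOCATION = 'trips/*.parquet',
    DATA_SOURCE = MyDataSource,
    FILE_FORMAT = MyParquetFormat
);

-- With that collation, the string filter can be pushed down, so Parquet files
-- whose min/max statistics exclude 'CMT' are skipped entirely.
SELECT COUNT(*) FROM taxi_trips WHERE vendor_id = 'CMT';
```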

 ### Security

 User must have `SELECT` permission on an external table to read the data.
 External tables access underlying Azure storage using the database scoped credential defined in data source using the following rules:
 - Data source without credential enables external tables to access publicly available files on Azure storage.
 - Data source can have a credential that enables external tables to access only the files on Azure storage using SAS token or workspace Managed Identity - For examples, see [the Develop storage files storage access control](develop-storage-files-storage-access-control.md#examples) article.
-
+
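A short sketch of the permissions described in the rules above; the principal and object names are hypothetical:

```sql
-- SELECT on the external table is required to read its data.
GRANT SELECT ON OBJECT::dbo.taxi_trips TO [reporting_user];

-- When the data source uses a database scoped credential, the user may also
-- need REFERENCES on that credential to use the data source.
GRANT REFERENCES ON DATABASE SCOPED CREDENTIAL::[sqlondemand] TO [reporting_user];
```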
 ### Example for CREATE EXTERNAL DATA SOURCE

 #### [Hadoop](#tab/hadoop)

-The following example creates a Hadoop external data source in dedicated SQL pool for Azure Data Lake Gen2 pointing to the New York data set:
+The following example creates a Hadoop external data source in dedicated SQL pool for ADLS Gen2 pointing to the public New York data set:

 ```sql
 CREATE DATABASE SCOPED CREDENTIAL [ADLS_credential]
@@ -95,7 +98,7 @@ WITH
 ) ;
 ```

-The following example creates an external data source for Azure Data Lake Gen2 pointing to the publicly available New York data set:
+The following example creates an external data source for ADLS Gen2 pointing to the publicly available New York data set:

 ```sql
 CREATE EXTERNAL DATA SOURCE YellowTaxi
@@ -105,7 +108,7 @@ WITH ( LOCATION = 'https://azureopendatastorage.blob.core.windows.net/nyctlc/yel

 #### [Native](#tab/native)

-The following example creates an external data source in serverless or dedicated SQL pool for Azure Data Lake Gen2 that can be accessed using SAS credential:
+The following example creates an external data source in serverless or dedicated SQL pool for ADLS Gen2 that can be accessed using SAS credential:

 ```sql
 CREATE DATABASE SCOPED CREDENTIAL [sqlondemand]
@@ -117,17 +120,18 @@ CREATE EXTERNAL DATA SOURCE SqlOnDemandDemo WITH (
 CREDENTIAL = sqlondemand
 );
 ```
+
 > [!NOTE]
 > The SQL users need to have proper permissions on database scoped credentials to access the data source in Azure Synapse Analytics Serverless SQL Pool. [Access external storage using serverless SQL pool in Azure Synapse Analytics](./develop-storage-files-overview.md?tabs=impersonation#permissions).
-The following example creates an external data source for Azure Data Lake Gen2 pointing to the publicly available New York data set:
+
+The following example creates an external data source for ADLS Gen2 pointing to the publicly available New York data set:

 ```sql
 CREATE EXTERNAL DATA SOURCE YellowTaxi
 WITH ( LOCATION = 'https://azureopendatastorage.blob.core.windows.net/nyctlc/yellow/')
 ```
 ---

-
 ### Example for CREATE EXTERNAL FILE FORMAT

 The following example creates an external file format for census files:
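The census file format itself is unchanged in this commit and therefore elided from the diff. For orientation only, a minimal illustrative sketch of an external file format, not the article's original code, might look like:

```sql
-- Illustrative sketch with a hypothetical name; Parquet formats need no
-- field terminator or delimiter options.
CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (
    FORMAT_TYPE = PARQUET,
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);
```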
@@ -178,25 +182,22 @@ Using Data Lake exploration capabilities of Synapse Studio you can now create an
 - You must have at least [permissions to create an external table](/sql/t-sql/statements/create-external-table-transact-sql?view=azure-sqldw-latest&preserve-view=true#permissions-2) and query external tables on the Synapse SQL pool (dedicated or serverless).

 From the Data panel, select the file that you would like to create the external table from:
-> [!div class="mx-imgBorder"]
->![externaltable1](./media/develop-tables-external-tables/external-table-1.png)
+
+:::image type="content" source="media/develop-tables-external-tables/external-table.png" alt-text="Screenshot from the Azure portal of the Azure Synapse Analytics create external table experience." lightbox="media/develop-tables-external-tables/external-table.png":::

 A dialog window will open. Select dedicated SQL pool or serverless SQL pool, give a name to the table and select open script:

-> [!div class="mx-imgBorder"]
->![externaltable2](./media/develop-tables-external-tables/external-table-2.png)
+:::image type="content" source="media/develop-tables-external-tables/external-table-dialog.png" alt-text="Screenshot from the Azure portal of the Azure Synapse Analytics of the create external table dialog.":::

 The SQL Script is autogenerated inferring the schema from the file:
-> [!div class="mx-imgBorder"]
->![externaltable3](./media/develop-tables-external-tables/external-table-3.png)

-Run the script. The script will automatically run a Select Top 100 *.:
-> [!div class="mx-imgBorder"]
->![externaltable4](./media/develop-tables-external-tables/external-table-4.png)
+:::image type="content" source="media/develop-tables-external-tables/external-table-t-sql.png" alt-text="Screenshot from the Azure portal of a T-SQL script that creates an external table." lightbox="media/develop-tables-external-tables/external-table-t-sql.png":::
+
+Run the script. The script will automatically run a `SELECT TOP 100 *`:
+
+:::image type="content" source="media/develop-tables-external-tables/external-table-resultset.png" alt-text="Screenshot from the Azure portal of a T-SQL script's result set that shows the external table." lightbox="media/develop-tables-external-tables/external-table-resultset.png":::
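For readers without portal access, a hedged sketch of the kind of script Synapse Studio generates; the table name, column names, inferred types, and storage path are hypothetical:

```sql
-- Sketch of an autogenerated script; Studio infers the schema from the file.
IF NOT EXISTS (SELECT * FROM sys.external_file_formats WHERE name = 'SynapseParquetFormat')
    CREATE EXTERNAL FILE FORMAT [SynapseParquetFormat]
    WITH ( FORMAT_TYPE = PARQUET );

CREATE EXTERNAL TABLE dbo.population
(
    [country_code] VARCHAR(5),
    [population]   BIGINT
)
WITH (
    LOCATION = 'csv/population/*.parquet',
    DATA_SOURCE = [MyDataSource],
    FILE_FORMAT = [SynapseParquetFormat]
);

-- The generated script ends by previewing the new table:
SELECT TOP 100 * FROM dbo.population;
```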

-The external table is now created, for future exploration of the content of this external table the user can query it directly from the Data pane:
-> [!div class="mx-imgBorder"]
->![externaltable5](./media/develop-tables-external-tables/external-table-5.png)
+The external table is now created. You can now query the external table directly from the Data pane.

 ## Related content
