Commit 1253fc3

Merge pull request #292428 from WilliamDAssafMSFT/20241227-external-tables-tutorials
20241227 external tables tutorials
2 parents d0684b0 + 4be9790 commit 1253fc3

7 files changed

+445
-55
lines changed

articles/synapse-analytics/sql-data-warehouse/design-elt-data-loading.md

Lines changed: 64 additions & 53 deletions
Large diffs are not rendered by default.

articles/synapse-analytics/sql-data-warehouse/sql-data-warehouse-load-from-azure-blob-storage-with-polybase.md

Lines changed: 6 additions & 1 deletion
@@ -346,4 +346,9 @@ GROUP BY p.[BrandName]
346346
## Next steps
347347

348348
To load the full data set, run the example [load the full Contoso retail data warehouse](https://github.com/Microsoft/sql-server-samples/tree/master/samples/databases/contoso-data-warehouse/readme.md) from the Microsoft SQL Server samples repository.
349-
For more development tips, see [Design decisions and coding techniques for data warehouses](sql-data-warehouse-overview-develop.md).
349+
350+
## Related content
351+
352+
- [Design decisions and coding techniques for data warehouses](sql-data-warehouse-overview-develop.md)
353+
- [Tutorial: Load external data using Microsoft Entra ID](../sql/tutorial-load-data-using-entra-id.md)
354+
- [Tutorial: Load external data using a managed identity](../sql/tutorial-external-tables-using-managed-identity.md)

articles/synapse-analytics/sql/load-data-overview.md

Lines changed: 2 additions & 0 deletions
@@ -111,6 +111,8 @@ It is best practice to load data into a staging table. Staging tables allow you
111111

112112
To load data with PolyBase, you can use any of these loading options:
113113

114+
- [Load external data using Microsoft Entra ID](../sql/tutorial-load-data-using-entra-id.md)
115+
- [Load external data using a managed identity](../sql/tutorial-external-tables-using-managed-identity.md)
114116
- [PolyBase with T-SQL](../sql-data-warehouse/sql-data-warehouse-load-from-azure-blob-storage-with-polybase.md?bc=%2fazure%2fsynapse-analytics%2fbreadcrumb%2ftoc.json&toc=%2fazure%2fsynapse-analytics%2ftoc.json) works well when your data is in Azure Blob storage or Azure Data Lake Store. It gives you the most control over the loading process, but also requires you to define external data objects. The other methods define these objects behind the scenes as you map source tables to destination tables. To orchestrate T-SQL loads, you can use Azure Data Factory, SSIS, or Azure Functions.
115117
- [PolyBase with SSIS](/sql/integration-services/load-data-to-sql-data-warehouse?view=azure-sqldw-latest&preserve-view=true) works well when your source data is in SQL Server. SSIS defines the source to destination table mappings, and also orchestrates the load. If you already have SSIS packages, you can modify the packages to work with the new data warehouse destination.
116118
- [PolyBase with Azure Data Factory (ADF)](../../data-factory/load-azure-sql-data-warehouse.md) is another orchestration tool. It defines a pipeline and schedules jobs.

articles/synapse-analytics/sql/tutorial-external-tables-using-managed-identity.md

Lines changed: 202 additions & 0 deletions
@@ -0,0 +1,202 @@
1+
---
2+
title: "Tutorial: Load External Data Using a Managed Identity"
3+
description: This tutorial shows how to connect to external data for queries or ingestion using a managed identity.
4+
author: WilliamDAssafMSFT
5+
ms.author: wiassaf
6+
ms.reviewer: periclesrocha
7+
ms.date: 01/04/2025
8+
ms.service: azure-synapse-analytics
9+
ms.subservice: sql
10+
ms.topic: tutorial
11+
---
12+
13+
# Tutorial: Load external data using a managed identity
14+
15+
This article explains how to create external tables or ingest data from Azure Data Lake Storage (ADLS) Gen2 accounts using a managed identity.
16+
17+
## Prerequisites
18+
19+
The following resources are required to complete this tutorial:
20+
21+
- An Azure Data Lake Storage (ADLS) Gen2 account
22+
- An Azure Synapse Analytics workspace and a dedicated SQL pool
23+
24+
## Give the workspace identity access to the storage account
25+
26+
Each Azure Synapse Analytics workspace automatically creates a managed identity that helps you configure secure access to external data from your workspace. To learn more about managed identities for Azure Synapse Analytics, visit [Managed service identity for Azure Synapse Analytics](../synapse-service-identity.md).
27+
28+
To enable your managed identity to access data on ADLS Gen2 accounts, you need to give your identity access to the source account. To grant proper permissions, follow these steps:
29+
30+
1. In the Azure portal, find your storage account.
31+
1. Select **Data storage -> Containers**, and navigate to the folder that contains the source data for the external table.
32+
1. Select **Access control (IAM)**.
33+
1. Select **Add -> Add role assignment**.
34+
1. In the list of job function roles, select **Storage Blob Data Contributor** and select **Next**.
35+
1. In the **Add role assignment** page, select **+ Select members**. The **Select members** pane opens.
36+
1. Type the name of your workspace identity. The workspace identity is the same as your workspace name. When displayed, pick your workspace identity, then **Select**.
37+
1. In the **Add role assignment** page, make sure the list of members includes your workspace identity. Once verified, select **Review + assign**.
38+
1. In the confirmation page, review the changes and select **Review + assign**.
39+
40+
Your workspace identity is now a member of the Storage Blob Data Contributor role and has access to the source folder.
41+
42+
> [!NOTE]
43+
> These steps also apply to secure ADLS Gen2 accounts that are configured to restrict public access. To learn more about securing your ADLS Gen2 account, see [Configure Azure Storage firewalls and virtual networks](/azure/storage/common/storage-network-security).
44+
45+
## Ingest data using COPY INTO
46+
47+
The T-SQL `COPY INTO` statement provides flexible, high-throughput data ingestion into your tables, and is the primary strategy to ingest data into your dedicated SQL pool tables. `COPY INTO` allows users to ingest data from external locations without having to create any of the extra database objects that are required for external tables.
48+
49+
To run the `COPY INTO` statement using a workspace managed identity for authentication, use the following T-SQL command:
50+
51+
```sql
52+
COPY INTO <TableName>
53+
FROM 'https://<AccountName>.dfs.core.windows.net/<Container>/<Folder>/'
54+
WITH
55+
(
56+
CREDENTIAL = (IDENTITY = 'Managed Identity'),
57+
[<CopyIntoOptions>]
58+
);
59+
```
60+
61+
Where:
62+
63+
- `<TableName>` is the name of the table to ingest data into
64+
- `<AccountName>` is your ADLS Gen2 account name
65+
- `<Container>` is the name of the container within your storage account where the source data is stored
66+
- `<Folder>` is the folder (or path with subfolders) where the source data is stored within your container. You can also provide a file name if pointing directly to a single file.
67+
- `<CopyIntoOptions>` is the list of any other options you wish to provide to the COPY INTO statement.
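
As a concrete illustration of the placeholders above, here is a minimal sketch of a filled-in statement for comma-delimited files with a header row. The table `dbo.StagingSales`, the storage account `contosolake`, the container `sales`, and the folder `2024` are hypothetical names, and the options shown are only one possible choice for `<CopyIntoOptions>`.

```sql
-- Hypothetical names: dbo.StagingSales, contosolake, sales, 2024.
-- Assumes the target table already exists and the files are CSV with a header row.
COPY INTO dbo.StagingSales
FROM 'https://contosolake.dfs.core.windows.net/sales/2024/'
WITH
(
    FILE_TYPE = 'CSV',                            -- source files are delimited text
    CREDENTIAL = (IDENTITY = 'Managed Identity'), -- authenticate as the workspace identity
    FIELDTERMINATOR = ',',
    FIELDQUOTE = '"',
    FIRSTROW = 2                                  -- skip the header row
);
```

Adjust the file type and delimiters to match your source files.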
68+
69+
To learn more and explore the full syntax of COPY INTO, see [COPY INTO (Transact-SQL)](/sql/t-sql/statements/copy-into-transact-sql?view=azure-sqldw-latest&preserve-view=true).
70+
71+
## Query data on ADLS Gen2 using external tables
72+
73+
External tables allow users to query data from Azure Data Lake Storage (ADLS) Gen2 accounts without the need to ingest data first. Users can create an external table which points to files on an ADLS Gen2 container, and query it just like a regular user table.
74+
75+
The following steps describe the process to create a new external table pointing to data on ADLS Gen2, using a managed identity for authentication.
76+
77+
### Create the required database objects
78+
79+
External tables require the following objects to be created:
80+
81+
1. A database master key that encrypts the database scoped credential's secret
82+
1. A database scoped credential that uses your workspace identity
83+
1. An external data source that points to the source folder
84+
1. An external file format that defines the format of the source files
85+
1. An external table definition that is used for queries
86+
87+
To follow these steps, use the SQL editor in the Azure Synapse workspace, or your preferred SQL client connected to your dedicated SQL pool. Let's look at these steps in detail.
88+
89+
#### Create the database master key
90+
91+
The database master key is a symmetric key used to protect the private keys of certificates and asymmetric keys in the database, as well as the secrets in database scoped credentials. If there's already a master key in the database, you don't need to create a new one. Replace `<Secure Password>` with a secure password. This password is used to encrypt the master key in the database.
92+
93+
To create a master key, use the following T-SQL command:
94+
95+
```sql
96+
-- Replace <Secure Password> with a secure password
97+
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<Secure Password>';
98+
```
99+
100+
To learn more about the database master key, see [CREATE MASTER KEY (Transact-SQL)](/sql/t-sql/statements/create-master-key-transact-sql?view=azure-sqldw-latest&preserve-view=true).
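
Because an existing master key doesn't need to be recreated, it can help to check for one before running `CREATE MASTER KEY`. The following is a minimal sketch. It assumes that the `sys.symmetric_keys` catalog view is available in your pool and that the database master key is listed there under the name `##MS_DatabaseMasterKey##`.

```sql
-- Create the master key only if the database doesn't already have one.
-- Assumption: the master key is listed in sys.symmetric_keys as ##MS_DatabaseMasterKey##.
-- Replace <Secure Password> with a secure password.
IF NOT EXISTS (SELECT 1 FROM sys.symmetric_keys WHERE name = '##MS_DatabaseMasterKey##')
BEGIN
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<Secure Password>';
END
```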
101+
102+
#### Create the database scoped credential
103+
104+
A database scoped credential uses your workspace identity and is needed whenever the external table accesses the source data in the external location.
105+
106+
To create the database scoped credential, use the following command. Replace `<CredentialName>` with the name you would like to use for your database scoped credential.
107+
108+
```sql
109+
CREATE DATABASE SCOPED CREDENTIAL <CredentialName> WITH IDENTITY = 'Managed Service Identity';
110+
```
111+
112+
To learn more about database scoped credentials, see [CREATE DATABASE SCOPED CREDENTIAL (Transact-SQL)](/sql/t-sql/statements/create-database-scoped-credential-transact-sql?view=azure-sqldw-latest&preserve-view=true).
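
To confirm that the credential exists, you can query the catalog. This sketch assumes the `sys.database_scoped_credentials` catalog view is available in your pool. `<CredentialName>` stands for whatever name you chose for the credential.

```sql
-- List the database scoped credential and the identity it uses.
-- Replace <CredentialName> with the name of your credential.
SELECT name, credential_identity, create_date
FROM sys.database_scoped_credentials
WHERE name = '<CredentialName>';
```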
113+
114+
#### Create the external data source
115+
116+
The next step is to create an external data source that specifies where the source data used by the external table resides.
117+
118+
To create the external data source, use the following T-SQL command:
119+
120+
```sql
121+
CREATE EXTERNAL DATA SOURCE <ExternalDataSourceName>
122+
WITH (
123+
TYPE = HADOOP,
124+
LOCATION = 'abfss://<Container>@<AccountName>.dfs.core.windows.net/<Folder>/',
125+
CREDENTIAL = <CredentialName>
126+
);
127+
```
128+
129+
Where:
130+
131+
- `<ExternalDataSourceName>` is the name you want to use for your external data source.
132+
- `<AccountName>` is your ADLS Gen2 account name.
133+
- `<Container>` is the name of the container within your storage account where the source data is stored.
134+
- `<Folder>` is the folder (or path with subfolders) where the source data is stored within your container. You can also provide a file name if pointing directly to a single file.
135+
- `<CredentialName>` is the name of [the database scoped credential you created earlier](#create-the-database-scoped-credential).
136+
137+
To learn more about external data sources, see [CREATE EXTERNAL DATA SOURCE (Transact-SQL)](/sql/t-sql/statements/create-external-data-source-transact-sql?view=azure-sqldw-latest&preserve-view=true&tabs=dedicated).
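
For illustration, here is a sketch of the statement with the placeholders filled in, followed by a catalog query to confirm the result. The names `SalesLake`, `contosolake`, `sales`, and `2024` are hypothetical, and the sketch assumes a database scoped credential named `WorkspaceIdentity` already exists.

```sql
-- Hypothetical names: SalesLake, contosolake, sales, 2024, WorkspaceIdentity.
CREATE EXTERNAL DATA SOURCE SalesLake
WITH (
    TYPE = HADOOP,
    LOCATION = 'abfss://sales@contosolake.dfs.core.windows.net/2024/',
    CREDENTIAL = WorkspaceIdentity
);

-- Confirm that the data source was created and points to the expected location.
SELECT name, location, type_desc
FROM sys.external_data_sources
WHERE name = 'SalesLake';
```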
138+
139+
#### Create the external file format
140+
141+
The next step is to create the external file format. It specifies the actual layout of the data referenced by the external table.
142+
143+
To create the external file format, use the following T-SQL command. Replace `<FileFormatName>` with the name you want to use for your external file format.
144+
145+
```sql
146+
CREATE EXTERNAL FILE FORMAT <FileFormatName>
147+
WITH (
148+
FORMAT_TYPE = DELIMITEDTEXT,
149+
FORMAT_OPTIONS (
150+
FIELD_TERMINATOR = ',',
151+
STRING_DELIMITER = '"',
152+
FIRST_ROW = 2,
153+
USE_TYPE_DEFAULT = True
154+
)
155+
);
156+
```
157+
158+
In this example, adjust parameters such as `FIELD_TERMINATOR`, `STRING_DELIMITER`, `FIRST_ROW`, and others as needed in accordance with your source data. For more formatting options and to learn more about `EXTERNAL FILE FORMAT`, see [CREATE EXTERNAL FILE FORMAT](/sql/t-sql/statements/create-external-file-format-transact-sql?view=azure-sqldw-latest&preserve-view=true).
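
If your source files aren't delimited text, the file format changes accordingly. As one possible variation, the following sketch defines a format for Snappy-compressed Parquet files. The name `ParquetSnappyFormat` is hypothetical, and the compression setting is an assumption about the source files.

```sql
-- Hypothetical name: ParquetSnappyFormat.
-- Assumes the source files are Parquet files compressed with Snappy.
CREATE EXTERNAL FILE FORMAT ParquetSnappyFormat
WITH (
    FORMAT_TYPE = PARQUET,
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);
```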
159+
160+
#### Create the external table
161+
162+
Now that all the necessary objects that hold the metadata to securely access external data are created, it's time to create the external table. To create the external table, use the following T-SQL command:
163+
164+
```sql
165+
-- Adjust the table name and columns to your desired name and external table schema
166+
CREATE EXTERNAL TABLE <ExternalTableName> (
167+
Col1 INT,
168+
Col2 NVARCHAR(100),
169+
Col3 INT
170+
)
171+
WITH
172+
(
173+
LOCATION = '<Path>',
174+
DATA_SOURCE = <ExternalDataSourceName>,
175+
FILE_FORMAT = <FileFormatName>
176+
);
177+
```
178+
179+
Where:
180+
181+
- `<ExternalTableName>` is the name you want to use for your external table.
182+
- `<Path>` is the path of the source data, relative to the [location specified in the external data source](#create-the-external-data-source).
183+
- `<ExternalDataSourceName>` is the name of [the external data source you created previously](#create-the-external-data-source).
184+
- `<FileFormatName>` is the name of [the external file format you created previously](#create-the-external-file-format).
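
To show how the placeholders fit together, here is a sketch with every value filled in. All names are hypothetical: it assumes an external data source named `SalesLake`, an external file format named `CsvWithHeader`, and source files in an `orders` subfolder whose three columns match the schema shown.

```sql
-- Hypothetical names: dbo.SalesExternal, SalesLake, CsvWithHeader, /orders/.
-- The column list must match the layout of the source files.
CREATE EXTERNAL TABLE dbo.SalesExternal (
    SaleId INT,
    ProductName NVARCHAR(100),
    Quantity INT
)
WITH
(
    LOCATION = '/orders/',        -- relative to the external data source location
    DATA_SOURCE = SalesLake,
    FILE_FORMAT = CsvWithHeader
);
```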
185+
186+
Make sure to adjust the table name and schema to the desired name and the schema of the data in your source files.
187+
188+
At this point, all the metadata required to access the external table is created. To validate your work, test the external table with a query such as the following T-SQL sample:
189+
190+
```sql
191+
SELECT TOP 10 Col1, Col2 FROM <ExternalTableName>;
192+
```
193+
194+
If everything is configured properly, the query returns rows from your source data.
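
Once the external table returns data, one way to ingest that data into the dedicated SQL pool is a `CREATE TABLE AS SELECT` (CTAS) statement. The following is a minimal sketch, not a tuned load: the target name `dbo.SalesStaging` is hypothetical, and the round-robin heap layout is an assumption that suits a staging table.

```sql
-- Hypothetical name: dbo.SalesStaging.
-- Copies rows from the external table into a new staging table in the pool.
CREATE TABLE dbo.SalesStaging
WITH
(
    DISTRIBUTION = ROUND_ROBIN,   -- assumption: even spread, suitable for staging
    HEAP
)
AS
SELECT Col1, Col2
FROM <ExternalTableName>;
```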
195+
196+
To learn more and explore the full syntax of `CREATE EXTERNAL TABLE`, see [CREATE EXTERNAL TABLE (Transact-SQL)](/sql/t-sql/statements/create-external-table-transact-sql?view=azure-sqldw-latest&preserve-view=true&tabs=dedicated).
197+
198+
## Related content
199+
200+
- [Tutorial: Load external data using Microsoft Entra ID](tutorial-load-data-using-entra-id.md)
201+
- [Load Contoso retail data into dedicated SQL pools in Azure Synapse Analytics](../sql-data-warehouse/sql-data-warehouse-load-from-azure-blob-storage-with-polybase.md)
202+
- [Managed identities for Azure Synapse Analytics](../synapse-service-identity.md)
