Commit 0efe15d

Polybase tutorials
New tutorials to explain how to use external tables and COPY INTO with Entra ID and managed identity
1 parent 55e8d87 commit 0efe15d

2 files changed

+348
-0
lines changed

Lines changed: 210 additions & 0 deletions
@@ -0,0 +1,210 @@
1+
---
2+
title: 'Tutorial: create external tables or ingest data from ADLS Gen2 using a managed identity'
3+
description: This tutorial shows how to connect to external data for queries or ingestion using a managed identity.
4+
author: periclesrocha
5+
ms.service: azure-synapse-analytics
6+
ms.topic: tutorial
7+
ms.subservice: sql
8+
ms.date: 01/04/2025
9+
ms.custom:
10+
ms.author: periclesrocha
11+
ms.reviewer: WilliamDAssafMSFT
12+
---
13+
14+
# Tutorial: create external tables or ingest data from ADLS Gen2 using a managed identity
15+
16+
Applies to: Azure Synapse Analytics
17+
18+
This article explains how to create external tables or ingest data from Azure Data Lake Storage Gen2 accounts using a managed identity.
19+
20+
## Prerequisites:
21+
22+
This tutorial requires the following resources to be in place:
23+
24+
* An Azure Data Lake Storage Gen2 (ADLS Gen2) account
25+
* An Azure Synapse Analytics workspace and a dedicated SQL Pool
26+
27+
## Give the workspace identity access to the storage account
28+
29+
Each Azure Synapse Analytics workspace automatically creates a managed identity that helps you configure secure access to external data from your workspace. To learn more about managed identities for Azure Synapse Analytics, visit [Managed service identity for Azure Synapse Analytics - Azure Synapse | Microsoft Learn](https://learn.microsoft.com/en-us/azure/synapse-analytics/synapse-service-identity).
30+
31+
To enable your managed identity to access data on ADLS Gen2 accounts, you need to give your identity access to the source account. To grant proper permissions, follow these steps:
32+
33+
1. In the Azure Portal, find your storage account.
34+
2. Select **Data storage -> Containers**, and navigate to the folder that contains the source data the external table needs to access.
35+
3. Select **Access control (IAM)**.
36+
4. Select **Add -> Add role assignment**.
37+
5. In the list of job function roles, select **Storage Blob Data Contributor** and select **Next**.
38+
6. On the Add role assignment page, select **+ Select members**. The Select members pane opens on the right-hand side.
39+
7. Type the name of your workspace identity until it is displayed. The workspace identity has the same name as your workspace. Pick your workspace identity and choose **Select**.
40+
8. Back on the Add role assignment page, make sure the list of Members includes your workspace identity. Once verified, select **Review + assign**.
41+
9. In the confirmation page, review the changes and select **Review + assign**.
42+
43+
Your workspace identity is now a member of the Storage Blob Data Contributor role and has access to the source folder.
44+
45+
Note: these steps also apply to secure ADLS Gen2 accounts that are configured to restrict public access. To learn more about securing your ADLS Gen2 account, visit [Configure Azure Storage firewalls and virtual networks | Microsoft Learn](https://learn.microsoft.com/en-us/azure/storage/common/storage-network-security?tabs=azure-portal).
46+
47+
## Ingest data using COPY INTO
48+
49+
The COPY INTO statement provides flexible, high-throughput data ingestion into your tables, and is the primary strategy to ingest data into your dedicated SQL Pool tables. It allows users to ingest data from external locations without having to create any of the additional database objects that are required for external tables.
50+
51+
To run the COPY INTO statement using a workspace managed identity for authentication, use the following command:
52+
53+
```sql
-- Load data from ADLS Gen2, authenticating with the workspace managed identity
COPY INTO <TableName>
FROM 'https://<AccountName>.dfs.core.windows.net/<Container>/<Folder>/'
WITH
(
    CREDENTIAL = (IDENTITY = 'Managed Identity'),
    [<CopyIntoOptions>]
)
```
62+
63+
Where:
64+
65+
* \<TableName> is the name of the table you will ingest data into
66+
* \<AccountName> is your ADLS Gen2 account name
67+
* \<Container> is the name of the container within your storage account where the source data is stored
68+
* \<Folder> is the folder (or path with subfolders) where the source data is stored within your container. You can also provide a file name if pointing directly to a single file.
69+
* \<CopyIntoOptions> is the list of any other options you wish to provide to the COPY INTO statement.
70+
71+
To learn more and explore the full syntax of COPY INTO, refer to <https://learn.microsoft.com/en-us/sql/t-sql/statements/copy-into-transact-sql?view=azure-sqldw-latest>.
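
As an illustration of the template above, here is a minimal sketch that loads comma-separated files with a header row. The table name `dbo.Sales`, the storage account `mystorageaccount`, the container `mycontainer`, and the folder `sales` are hypothetical placeholders, and the options shown are only one possible combination; adjust them to match your own source files.

```sql
-- Hypothetical names: dbo.Sales, mystorageaccount, mycontainer, sales
COPY INTO dbo.Sales
FROM 'https://mystorageaccount.dfs.core.windows.net/mycontainer/sales/'
WITH
(
    FILE_TYPE = 'CSV',                            -- source files are delimited text
    CREDENTIAL = (IDENTITY = 'Managed Identity'), -- authenticate with the workspace identity
    FIELDTERMINATOR = ',',                        -- column separator
    FIELDQUOTE = '"',                             -- character that wraps quoted strings
    FIRSTROW = 2                                  -- skip the header row
)
```

The target table must already exist with columns compatible with the source files.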
72+
73+
## Query data on ADLS Gen2 using external tables
74+
75+
External tables allow users to query data from ADLS Gen2 accounts without the need to ingest data first. Users can create an external table which points to files on an ADLS Gen2 container, and query it just like a regular user table.
76+
77+
The following steps describe the process to create a new external table pointing to data on ADLS Gen2, using a managed identity for authentication.
78+
79+
### Create the required database objects
80+
81+
External tables require the following objects to be created:
82+
83+
1. A database master key that encrypts the database scoped credential’s secret
84+
2. A database scoped credential that uses your workspace identity.
85+
3. An external data source that points to the source folder.
86+
4. An external file format that defines the format of the source files.
87+
5. An external table definition that is used for queries.
88+
89+
To follow these steps, you will need to use the SQL editor in the Azure Synapse Workspace, or your preferred SQL client connected to your dedicated SQL Pool. Let’s look at these steps in detail.
90+
91+
#### Create the database master key
92+
93+
The database master key is a symmetric key used to protect the private keys of certificates and asymmetric keys that are present in the database and secrets in database scoped credentials. If there is already a master key in the database, you do not need to create a new one.
94+
95+
To create a master key, use the following command:
96+
97+
```sql
-- Replace <Secure Password Here> with a secure password
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<Secure Password Here>'
```
101+
102+
Where:
103+
104+
* \<Secure Password Here> should be replaced with a strong password. This password is used to encrypt the master key in the database.
105+
106+
To learn more about the database master key, refer to <https://learn.microsoft.com/en-us/sql/t-sql/statements/create-master-key-transact-sql?view=azure-sqldw-latest>.
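
To find out whether the database already has a master key before creating one, you can query the `sys.symmetric_keys` catalog view. This is a sketch that assumes the view is available in your dedicated SQL pool and that the master key uses its standard name, `##MS_DatabaseMasterKey##`.

```sql
-- Returns one row if a database master key already exists, and no rows otherwise
SELECT name, create_date
FROM sys.symmetric_keys
WHERE name = '##MS_DatabaseMasterKey##'
```

If the query returns a row, skip the CREATE MASTER KEY statement.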
107+
108+
#### Create the database scoped credential
109+
110+
A database scoped credential uses your workspace identity and is needed to access the external location whenever the external table reads the source data.
111+
112+
To create the database scoped credential, use the following command:
113+
114+
```sql
CREATE DATABASE SCOPED CREDENTIAL <CredentialName> WITH IDENTITY = 'Managed Service Identity'
```
117+
118+
Where:
119+
120+
* \<CredentialName> should be replaced with the name you would like to use for your database scoped credential
121+
122+
To learn more about database scoped credentials, refer to <https://learn.microsoft.com/en-us/sql/t-sql/statements/create-database-scoped-credential-transact-sql?view=azure-sqldw-latest>.
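
To confirm that the credential exists, you can list the database scoped credentials in the current database. This sketch assumes the `sys.database_scoped_credentials` catalog view is available in your dedicated SQL pool.

```sql
-- Lists every database scoped credential and the identity it uses
SELECT name, credential_identity, create_date
FROM sys.database_scoped_credentials
```

The credential you created should appear with `Managed Service Identity` in the `credential_identity` column.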
123+
124+
#### Create the external data source
125+
126+
The next step is to create an external data source that specifies where the source data used by the external table resides.
127+
128+
To create the external data source, use the following command:
129+
130+
```sql
CREATE EXTERNAL DATA SOURCE <ExternalDataSourceName>
WITH (
    TYPE = hadoop,
    LOCATION = 'abfss://<Container>@<AccountName>.dfs.core.windows.net/<Folder>/',
    CREDENTIAL = <CredentialName>
)
```
138+
139+
Where:
140+
141+
* \<ExternalDataSourceName> is the name you want to use for your external data source
142+
* \<AccountName> is your ADLS Gen2 account name
143+
* \<Container> is the name of the container within your storage account where the source data is stored
144+
* \<Folder> is the folder (or path with subfolders) where the source data is stored within your container. You can also provide a file name if pointing directly to a single file.
145+
* \<CredentialName> is the name of the database scoped credential you created in the previous step
146+
147+
To learn more about external data sources, refer to <https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=azure-sqldw-latest&tabs=dedicated>.
148+
149+
#### Create the external file format
150+
151+
The next step is to create the external file format. It specifies the actual layout of the data referenced by the external table.
152+
153+
To create the external file format, use the following command:
154+
155+
```sql
CREATE EXTERNAL FILE FORMAT <FileFormatName>
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ',',
        STRING_DELIMITER = '"',
        FIRST_ROW = 2,
        USE_TYPE_DEFAULT = True
    )
)
```
167+
168+
Where:
169+
170+
* \<FileFormatName> is the name you want to use for your external file format
171+
172+
In the example above, adjust parameters such as FIELD\_TERMINATOR, STRING\_DELIMITER, FIRST\_ROW and others as needed in accordance with your source data. For more formatting options and to learn more about EXTERNAL FILE FORMAT, visit <https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-file-format-transact-sql?view=azure-sqldw-latest&tabs=delimited>.
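
As an example of such an adjustment, the following sketch defines a format for pipe-delimited files that have no header row and no quoted strings. The name `PipeDelimitedNoHeader` is a hypothetical placeholder.

```sql
-- Hypothetical format name; assumes '|' separated files without a header row
CREATE EXTERNAL FILE FORMAT PipeDelimitedNoHeader
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = '|',   -- column separator
        USE_TYPE_DEFAULT = FALSE  -- store missing values as NULL instead of type defaults
    )
)
```

FIRST\_ROW is omitted here so that reading starts at the first line of each file.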
173+
174+
#### Create the external table
175+
176+
Now that we’ve created all the necessary objects that hold the metadata to securely access external data, it is time to create the external table. To create the external table, use the following command:
177+
178+
```sql
-- Adjust the table name and columns to your desired name and external table schema
CREATE EXTERNAL TABLE <ExternalTableName> (
    Col1 INT,
    Col2 NVARCHAR(100),
    Col3 INT
)
WITH
(
    LOCATION = '<Path>',
    DATA_SOURCE = <ExternalDataSourceName>,
    FILE_FORMAT = <FileFormatName>
)
```
192+
193+
Where:
194+
195+
* \<ExternalTableName> is the name you want to use for your external table
196+
* \<Path> is the path of the source data, relative to the location specified in the external data source
197+
* \<ExternalDataSourceName> is the name of the external data source you created earlier
198+
* \<FileFormatName> is the name of the external file format you created earlier
199+
200+
Make sure to adjust the table name and schema to the desired name and the schema of the data in your source files.
201+
202+
At this point, all the metadata required to access the external table has been created. To test your external table, use a simple query such as the one below:
203+
204+
```sql
SELECT TOP 10 Col1, Col2 FROM <ExternalTableName>
```
207+
208+
If everything was configured properly, this query returns rows from your source data.
209+
210+
To learn more and explore the full syntax of EXTERNAL TABLE, refer to <https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql?view=azure-sqldw-latest&tabs=dedicated>.
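
If the test query fails, it can help to review how the objects are wired together. The following sketch joins the external table, data source, and file format catalog views; it assumes these views are available in your dedicated SQL pool.

```sql
-- Shows each external table with the data source and file format it references
SELECT
    t.name       AS external_table,
    t.location   AS table_location,
    ds.name      AS data_source,
    ds.location  AS data_source_location,
    ff.name      AS file_format,
    ff.format_type
FROM sys.external_tables AS t
JOIN sys.external_data_sources AS ds
    ON t.data_source_id = ds.data_source_id
JOIN sys.external_file_formats AS ff
    ON t.file_format_id = ff.file_format_id
```

The data source location followed by the table location should resolve to the folder that holds your source files.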
Lines changed: 138 additions & 0 deletions
@@ -0,0 +1,138 @@
1+
---
2+
title: 'Tutorial: load data using Entra ID'
3+
description: This tutorial shows how to connect to external data for queries or ingestion using Entra ID passthrough
4+
author: periclesrocha
5+
ms.service: azure-synapse-analytics
6+
ms.topic: tutorial
7+
ms.subservice: sql
8+
ms.date: 01/04/2025
9+
ms.custom:
10+
ms.author: periclesrocha
11+
ms.reviewer: WilliamDAssafMSFT
12+
---
13+
14+
# Tutorial: load data using Entra ID
15+
16+
Applies to: Azure Synapse Analytics
17+
18+
This article explains how to create external tables using Entra ID passthrough.
19+
20+
## Prerequisites:
21+
22+
This tutorial requires the following resources to be in place:
23+
24+
* An Azure Synapse Analytics workspace and a dedicated SQL Pool
25+
26+
## Give the Entra ID account access to the storage account
27+
28+
This example uses an Entra ID account (or group) to authenticate to the source data.
29+
30+
To enable access to data on ADLS Gen2 accounts, you need to give your Entra ID account (or group) access to the source account. To grant the proper permissions, follow these steps:
31+
32+
1. In the Azure Portal, find your storage account.
33+
2. Select **Data storage -> Containers**, and navigate to the folder that contains the source data the external table needs to access.
34+
3. Select **Access control (IAM)**.
35+
4. Select **Add -> Add role assignment**.
36+
5. In the list of job function roles, select **Storage Blob Data Reader** and select **Next**.
37+
6. On the Add role assignment page, select **+ Select members**. The Select members pane opens on the right-hand side.
38+
7. Type the name of the desired Entra ID account or group until it is displayed. Pick your desired Entra ID account and choose **Select**.
39+
8. Back on the Add role assignment page, make sure the list of Members includes your desired Entra ID account. Once verified, select **Review + assign**.
40+
9. In the confirmation page, review the changes and select **Review + assign**.
41+
42+
The Entra ID account or group is now a member of the Storage Blob Data Reader role and has access to the source folder.
43+
44+
## Create the required database objects
45+
46+
External tables require the following objects to be created:
47+
48+
1. An external data source that points to the source folder.
49+
2. An external file format that defines the format of the source files.
50+
3. An external table definition that is used for queries.
51+
52+
To follow these steps, you will need to use the SQL editor in the Azure Synapse Workspace, or your preferred SQL client connected to your dedicated SQL Pool. Let’s look at these steps in detail.
53+
54+
### Create the external data source
55+
56+
The first step is to create an external data source that specifies where the source data used by the external table resides.
57+
58+
To create the external data source, use the following command:
59+
60+
```sql
CREATE EXTERNAL DATA SOURCE <ExternalDataSourceName>
WITH (
    TYPE = hadoop,
    LOCATION = 'abfss://<Container>@<AccountName>.dfs.core.windows.net/<Folder>/'
)
```
67+
68+
Where:
69+
70+
* \<ExternalDataSourceName> is the name you want to use for your external data source
71+
* \<AccountName> is your ADLS Gen2 account name
72+
* \<Container> is the name of the container within your storage account where the source data is stored
73+
* \<Folder> is the folder (or path with subfolders) where the source data is stored within your container
74+
75+
To learn more about external data sources, refer to <https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=azure-sqldw-latest&tabs=dedicated>.
76+
77+
### Create the external file format
78+
79+
The next step is to create the external file format. It specifies the actual layout of the data referenced by the external table.
80+
81+
To create the external file format, use the following command:
82+
83+
```sql
CREATE EXTERNAL FILE FORMAT <FileFormatName>
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ',',
        STRING_DELIMITER = '"',
        FIRST_ROW = 2,
        USE_TYPE_DEFAULT = True
    )
)
```
95+
96+
Where:
97+
98+
* \<FileFormatName> is the name you want to use for your external file format
99+
100+
In the example above, adjust parameters such as FIELD\_TERMINATOR, STRING\_DELIMITER, FIRST\_ROW and others as needed in accordance with your source data. For more formatting options and to learn more about EXTERNAL FILE FORMAT, visit <https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-file-format-transact-sql?view=azure-sqldw-latest&tabs=delimited>.
101+
102+
### Create the external table
103+
104+
Now that we’ve created all the necessary objects that hold the metadata to securely access external data, it is time to create the external table. To create the external table, use the following command:
105+
106+
```sql
-- Adjust the table name and columns to your desired name and external table schema
CREATE EXTERNAL TABLE <ExternalTableName> (
    Col1 INT,
    Col2 NVARCHAR(100),
    Col3 INT
)
WITH
(
    LOCATION = '<Path>',
    DATA_SOURCE = <ExternalDataSourceName>,
    FILE_FORMAT = <FileFormatName>
)
```
120+
121+
Where:
122+
123+
* \<ExternalTableName> is the name you want to use for your external table
124+
* \<Path> is the path of the source data, relative to the location specified in the external data source
125+
* \<ExternalDataSourceName> is the name of the external data source you created earlier
126+
* \<FileFormatName> is the name of the external file format you created earlier
127+
128+
Make sure to adjust the table name and schema to the desired name and the schema of the data in your source files.
129+
130+
At this point, all the metadata required to access the external table has been created. To test your external table, use a simple query such as the one below:
131+
132+
```sql
SELECT TOP 10 Col1, Col2 FROM <ExternalTableName>
```
135+
136+
If everything was configured properly, this query returns rows from your source data.
137+
138+
To learn more and explore the full syntax of EXTERNAL TABLE, refer to <https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql?view=azure-sqldw-latest&tabs=dedicated>.
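
To see how the pieces fit together, here is a sketch of the three statements with the placeholders filled in. Every name in it (`SalesDataSource`, `CsvWithHeader`, `dbo.SalesExternal`, `mystorageaccount`, `mycontainer`, the `sales` and `2024` folders, and the column names) is hypothetical; replace them with your own. Running each statement separately makes it easier to see which one fails if something is misconfigured.

```sql
-- 1. External data source: no credential, so the caller's Entra ID identity is used
CREATE EXTERNAL DATA SOURCE SalesDataSource
WITH (
    TYPE = hadoop,
    LOCATION = 'abfss://mycontainer@mystorageaccount.dfs.core.windows.net/sales/'
);

-- 2. External file format: comma-separated text with a header row
CREATE EXTERNAL FILE FORMAT CsvWithHeader
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ',',
        STRING_DELIMITER = '"',
        FIRST_ROW = 2,
        USE_TYPE_DEFAULT = True
    )
);

-- 3. External table: LOCATION is relative to the data source location
CREATE EXTERNAL TABLE dbo.SalesExternal (
    SaleId INT,
    ProductName NVARCHAR(100),
    Quantity INT
)
WITH
(
    LOCATION = '/2024/',
    DATA_SOURCE = SalesDataSource,
    FILE_FORMAT = CsvWithHeader
);

-- 4. Test query
SELECT TOP 10 SaleId, ProductName FROM dbo.SalesExternal;
```

The account that runs the test query needs the Storage Blob Data Reader role granted earlier in this tutorial.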
