| 1 | +--- |
| 2 | +title: Authentication mechanisms with the COPY statement |
| 3 | +description: Outlines the authentication mechanisms to bulk load data |
| 4 | +services: synapse-analytics |
| 5 | +author: kevinvngo |
| 6 | +ms.service: synapse-analytics |
| 7 | +ms.topic: overview |
| 8 | +ms.subservice: |
| 9 | +ms.date: 05/06/2020 |
| 10 | +ms.author: kevin |
| 11 | +ms.reviewer: jrasnick |
| 12 | +--- |
| 13 | + |
| 14 | +# Securely load data using Synapse SQL |
| 15 | + |
| 16 | +This article highlights and provides examples of the secure authentication mechanisms for the [COPY statement](https://docs.microsoft.com/sql/t-sql/statements/copy-into-transact-sql?view=azure-sqldw-latest). The COPY statement is the most flexible and secure way to bulk load data in Synapse SQL. |
| 17 | +## Supported authentication mechanisms |
| 18 | + |
| 19 | +The following matrix lists the authentication methods supported for each file type and storage account type. It applies to both the source storage location and the error file location. |
| 20 | + |
| 21 | +| | CSV | Parquet | ORC | |
| 22 | +| :------------------: | :-------------------------------: | :-------------------------------: | :-------------------------------: | |
| 23 | +| Azure Blob storage | SAS/MSI/SERVICE PRINCIPAL/KEY/AAD | SAS/KEY | SAS/KEY | |
| 24 | +| Azure Data Lake Gen2 | SAS/MSI/SERVICE PRINCIPAL/KEY/AAD | SAS/MSI/SERVICE PRINCIPAL/KEY/AAD | SAS/MSI/SERVICE PRINCIPAL/KEY/AAD | |
| 25 | + |
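Because the matrix also covers the error file location, a short sketch may help. The statement below loads CSV files with a SAS token and writes rejected rows to a separate location that has its own credential. The account and container names, the `rejectedrows` folder, the `MAXERRORS` value, and the placeholder tokens are illustrative assumptions, not values taken from this article.

```sql
-- Sketch only: all names and tokens below are placeholders.
COPY INTO dbo.target_table
FROM 'https://myaccount.blob.core.windows.net/myblobcontainer/folder1/'
WITH (
    FILE_TYPE = 'CSV'
    -- Credential used to read the source files
    ,CREDENTIAL = (IDENTITY = 'Shared Access Signature', SECRET = '<Your_SAS_Token>')
    -- Tolerate up to 10 rejected rows before the load fails (assumed threshold)
    ,MAXERRORS = 10
    -- Full URL of the error file location (hypothetical container and folder)
    ,ERRORFILE = 'https://myaccount.blob.core.windows.net/myerrorcontainer/rejectedrows/'
    -- Credential used to write rejected rows to the error file location
    ,ERRORFILE_CREDENTIAL = (IDENTITY = 'Shared Access Signature', SECRET = '<Your_Error_Location_SAS_Token>')
)
```

According to the COPY statement reference, `ERRORFILE_CREDENTIAL` is used when `ERRORFILE` is a full storage URL, so the token for that location needs to allow writes as well as reads.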
| 26 | +## A. Storage account key with LF as the row terminator |
| 27 | + |
| 28 | + |
| 29 | +```sql |
| 30 | +--Note: when specifying the column list, input field numbers start from 1 |
| 31 | +COPY INTO target_table (Col_one default 'myStringDefault' 1, Col_two default 1 3) |
| 32 | +FROM 'https://adlsgen2account.dfs.core.windows.net/myblobcontainer/folder1/' |
| 33 | +WITH ( |
| 34 | + FILE_TYPE = 'CSV' |
| 35 | + ,CREDENTIAL=(IDENTITY= 'Storage Account Key', SECRET='<Your_Account_Key>') |
| 36 | + --CREDENTIAL should look something like this: |
| 37 | + --,CREDENTIAL=(IDENTITY= 'Storage Account Key', SECRET='x6RWv4It5F2msnjelv3H4DA80n0QW0daPdw43jM0nyetx4c6CpDkdj3986DX5AHFMIf/YN4y6kkCnU8lb+Wx0Pj+6MDw==') |
| 38 | + ,ROWTERMINATOR='0x0A' --0x0A specifies to use the Line Feed character (Unix based systems) |
| 39 | +) |
| 40 | +``` |
| 41 | + |
| 42 | +## B. Shared Access Signatures (SAS) with CRLF as the row terminator |
| 43 | +```sql |
| 44 | +COPY INTO target_table |
| 45 | +FROM 'https://adlsgen2account.dfs.core.windows.net/myblobcontainer/folder1/' |
| 46 | +WITH ( |
| 47 | + FILE_TYPE = 'CSV' |
| 48 | + ,CREDENTIAL=(IDENTITY= 'Shared Access Signature', SECRET='<Your_SAS_Token>') |
| 49 | + --CREDENTIAL should look something like this: |
| 50 | + --,CREDENTIAL=(IDENTITY= 'Shared Access Signature', SECRET='?sv=2018-03-28&ss=bfqt&srt=sco&sp=rl&st=2016-10-17T20%3A14%3A55Z&se=2021-10-18T20%3A19%3A00Z&sig=IEoOdmeYnE9%2FKiJDSFSYsz4AkNa%2F%2BTx61FuQ%2FfKHefqoBE%3D') |
| 51 | + ,ROWTERMINATOR='\n' --The COPY command automatically prefixes \r when \n (newline) is specified, so this matches carriage return + newline (\r\n) as used on Windows-based systems |
| 52 | +) |
| 53 | +``` |
| 54 | + |
| 55 | +> [!IMPORTANT] |
| 56 | +> |
| 57 | +> - A ROWTERMINATOR of '\r\n' is interpreted as '\r\r\n', which causes parsing issues. Specify '\n' to match CRLF line endings. |
| 58 | +
|
| 59 | +## C. Managed Identity |
| 60 | + |
| 61 | +Managed Identity authentication is required when your storage account is attached to a VNet. |
| 62 | + |
| 63 | +### Prerequisites |
| 64 | + |
| 65 | +1. Install Azure PowerShell using this [guide](/powershell/azure/install-az-ps?toc=/azure/synapse-analytics/sql-data-warehouse/toc.json&bc=/azure/synapse-analytics/sql-data-warehouse/breadcrumb/toc.json). |
| 66 | +2. If you have a general-purpose v1 or blob storage account, you must first upgrade to general-purpose v2 using this [guide](../../storage/common/storage-account-upgrade.md?toc=/azure/synapse-analytics/sql-data-warehouse/toc.json&bc=/azure/synapse-analytics/sql-data-warehouse/breadcrumb/toc.json). |
| 67 | +3. You must have **Allow trusted Microsoft services to access this storage account** turned on under Azure Storage account **Firewalls and Virtual networks** settings menu. Refer to this [guide](../../storage/common/storage-network-security.md?toc=/azure/synapse-analytics/sql-data-warehouse/toc.json&bc=/azure/synapse-analytics/sql-data-warehouse/breadcrumb/toc.json#exceptions) for more information. |
| 68 | +#### Steps |
| 69 | + |
| 70 | +1. In PowerShell, **register your SQL server** with Azure Active Directory (AAD): |
| 71 | + |
| 72 | + ```powershell |
| 73 | + Connect-AzAccount |
| 74 | + Select-AzSubscription -SubscriptionId your-subscriptionId |
| 75 | + Set-AzSqlServer -ResourceGroupName your-database-server-resourceGroup -ServerName your-database-servername -AssignIdentity |
| 76 | + ``` |
| 77 | + |
| 78 | +2. Create a **general-purpose v2 Storage Account** using this [guide](../../storage/common/storage-account-create.md?toc=/azure/synapse-analytics/sql-data-warehouse/toc.json&bc=/azure/synapse-analytics/sql-data-warehouse/breadcrumb/toc.json). |
| 79 | + |
| 80 | + > [!NOTE] |
| 81 | + > If you have a general-purpose v1 or blob storage account, you must **first upgrade to v2** using this [guide](../../storage/common/storage-account-upgrade.md?toc=/azure/synapse-analytics/sql-data-warehouse/toc.json&bc=/azure/synapse-analytics/sql-data-warehouse/breadcrumb/toc.json). |
| 82 | +
|
| 83 | +3. Under your storage account, navigate to **Access Control (IAM)**, and select **Add role assignment**. Assign the **Storage Blob Data Owner, Contributor, or Reader** RBAC role to your SQL server. |
| 84 | + |
| 85 | + > [!NOTE] |
| 86 | + > Only members with Owner privilege can perform this step. For various built-in roles for Azure resources, refer to this [guide](../../role-based-access-control/built-in-roles.md?toc=/azure/synapse-analytics/sql-data-warehouse/toc.json&bc=/azure/synapse-analytics/sql-data-warehouse/breadcrumb/toc.json). |
| 87 | + |
| 88 | +4. You can now run the COPY statement specifying "Managed Identity": |
| 89 | + |
| 90 | + ```sql |
| 91 | + COPY INTO dbo.target_table |
| 92 | + FROM 'https://myaccount.blob.core.windows.net/myblobcontainer/folder1/*.txt' |
| 93 | + WITH ( |
| 94 | + FILE_TYPE = 'CSV', |
| 95 | + CREDENTIAL = (IDENTITY = 'Managed Identity'), |
| 96 | + ) |
| 97 | + ``` |
| 98 | + |
| 99 | +> [!IMPORTANT] |
| 100 | +> |
| 101 | +> - Specify the **Storage Blob Data** Owner, Contributor, or Reader RBAC role. These roles are different from the Azure built-in roles of Owner, Contributor, and Reader. |
| 102 | +
|
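The authentication matrix lists Managed Identity as supported for Parquet files on Azure Data Lake Gen2. As a minimal sketch under that assumption, the same credential clause can be used with `FILE_TYPE = 'PARQUET'`. The account, container, and folder names are placeholders reused from the earlier examples, and the `*.parquet` wildcard is an illustrative choice.

```sql
-- Sketch only: assumes the SQL server's managed identity already holds a
-- Storage Blob Data role on the (placeholder) storage account.
COPY INTO dbo.target_table
FROM 'https://adlsgen2account.dfs.core.windows.net/myblobcontainer/folder1/*.parquet'
WITH (
    FILE_TYPE = 'PARQUET'
    -- No SECRET is supplied; the identity registered in step 1 is used
    ,CREDENTIAL = (IDENTITY = 'Managed Identity')
)
```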
| 103 | +## D. Azure Active Directory Authentication (AAD) |
| 104 | +#### Steps |
| 105 | + |
| 106 | +1. Under your storage account, navigate to **Access Control (IAM)**, and select **Add role assignment**. Assign the **Storage Blob Data Owner, Contributor, or Reader** RBAC role to your AAD user. |
| 107 | + |
| 108 | +2. Configure Azure AD authentication by going through the following [documentation](https://docs.microsoft.com/azure/sql-database/sql-database-aad-authentication-configure?tabs=azure-powershell#create-an-azure-ad-administrator-for-azure-sql-server). |
| 109 | + |
| 110 | +3. Connect to your SQL pool using Active Directory authentication. You can now run the COPY statement without specifying any credentials: |
| 111 | + |
| 112 | + ```sql |
| 113 | + COPY INTO dbo.target_table |
| 114 | + FROM 'https://myaccount.blob.core.windows.net/myblobcontainer/folder1/*.txt' |
| 115 | + WITH ( |
| 116 | + FILE_TYPE = 'CSV' |
| 117 | + ) |
| 118 | + ``` |
| 119 | + |
| 120 | +> [!IMPORTANT] |
| 121 | +> |
| 122 | +> - Specify the **Storage Blob Data** Owner, Contributor, or Reader RBAC role. These roles are different from the Azure built-in roles of Owner, Contributor, and Reader. |
| 123 | +
|
| 124 | +## E. Service Principal Authentication |
| 125 | +#### Steps |
| 126 | + |
| 127 | +1. [Create an Azure Active Directory (AAD) application](https://docs.microsoft.com/azure/active-directory/develop/howto-create-service-principal-portal#create-an-azure-active-directory-application) |
| 128 | +2. [Get application ID](https://docs.microsoft.com/azure/active-directory/develop/howto-create-service-principal-portal#get-values-for-signing-in) |
| 129 | +3. [Get the authentication key](https://docs.microsoft.com/azure/active-directory/develop/howto-create-service-principal-portal#create-a-new-application-secret) |
| 130 | +4. [Get the V1 OAuth 2.0 token endpoint](https://docs.microsoft.com/azure/data-lake-store/data-lake-store-service-to-service-authenticate-using-active-directory?toc=/azure/synapse-analytics/sql-data-warehouse/toc.json&bc=/azure/synapse-analytics/sql-data-warehouse/breadcrumb/toc.json#step-4-get-the-oauth-20-token-endpoint-only-for-java-based-applications) |
| 131 | +5. [Assign read, write, and execution permissions to your AAD application](https://docs.microsoft.com/azure/data-lake-store/data-lake-store-service-to-service-authenticate-using-active-directory?toc=/azure/synapse-analytics/sql-data-warehouse/toc.json&bc=/azure/synapse-analytics/sql-data-warehouse/breadcrumb/toc.json#step-3-assign-the-azure-ad-application-to-the-azure-data-lake-storage-gen1-account-file-or-folder) on your storage account |
| 132 | +6. You can now run the COPY statement: |
| 133 | + |
| 134 | + ```sql |
| 135 | + COPY INTO dbo.target_table |
| 136 | + FROM 'https://myaccount.blob.core.windows.net/myblobcontainer/folder0/*.txt' |
| 137 | + WITH ( |
| 138 | + FILE_TYPE = 'CSV' |
| 139 | + ,CREDENTIAL=(IDENTITY= '<application_ID>@<OAuth_2.0_Token_EndPoint>' , SECRET= '<authentication_key>') |
| 140 | + --CREDENTIAL should look something like this: |
| 141 | + --,CREDENTIAL=(IDENTITY= '92761aac-12a9-4ec3-89b8-7149aef4c35b@https://login.microsoftonline.com/72f714bf-86f1-41af-91ab-2d7cd011db47/oauth2/token', SECRET='juXi12sZ6gse]woKQNgqwSywYv]7A.M') |
| 142 | + ) |
| 143 | + ``` |
| 144 | + |
| 145 | +> [!IMPORTANT] |
| 146 | +> |
| 147 | +> - Use the **V1** version of the OAuth 2.0 token endpoint |
| 148 | +
|
| 149 | +## Next steps |
| 150 | + |
| 151 | +- Check the [COPY statement](https://docs.microsoft.com/sql/t-sql/statements/copy-into-transact-sql?view=azure-sqldw-latest#syntax) article for the detailed syntax |
| 152 | +- Check the [data loading overview](https://docs.microsoft.com/azure/synapse-analytics/sql-data-warehouse/design-elt-data-loading#what-is-elt) article for loading best practices |