---
title: 'Tutorial - Perform ETL operations using Azure Databricks'
description: In this tutorial, learn how to extract data from Data Lake Storage Gen2 into Azure Databricks, transform the data, and then load the data into Azure Synapse Analytics.
author: mamccrea
ms.author: mamccrea
ms.reviewer: jasonh
ms.date: 01/29/2020
---
# Tutorial: Extract, transform, and load data by using Azure Databricks
In this tutorial, you perform an ETL (extract, transform, and load) operation by using Azure Databricks. You extract data from Azure Data Lake Storage Gen2 into Azure Databricks, run transformations on the data in Azure Databricks, and load the transformed data into Azure Synapse Analytics.
The steps in this tutorial use the Azure Synapse connector for Azure Databricks to transfer data to Azure Synapse. This connector, in turn, uses Azure Blob Storage as temporary storage for the data being transferred between an Azure Databricks cluster and Azure Synapse.
The following illustration shows the application flow:

This tutorial covers the following tasks:
23
23
@@ -29,9 +29,9 @@ This tutorial covers the following tasks:
29
29
> * Create a service principal.
> * Extract data from the Azure Data Lake Storage Gen2 account.
> * Transform data in Azure Databricks.
> * Load data into Azure Synapse.
If you don't have an Azure subscription, create a [free account](https://azure.microsoft.com/free/?WT.mc_id=A261C142F) before you begin.
> [!NOTE]
> This tutorial cannot be carried out using an **Azure Free Trial Subscription**.
## Prerequisites
Complete these tasks before you begin this tutorial:
* Create a Synapse SQL pool, create a server-level firewall rule, and connect to the server as a server admin. See [Quickstart: Create and query a Synapse SQL pool using the Azure portal](../synapse-analytics/sql-data-warehouse/create-data-warehouse-portal.md).
* Create a master key for the Azure Synapse SQL pool. See [Create a database master key](https://docs.microsoft.com/sql/relational-databases/security/encryption/create-a-database-master-key).
* Create an Azure Blob storage account and a container within it. Also, retrieve the access key for the storage account. See [Quickstart: Upload, download, and list blobs with the Azure portal](../storage/blobs/storage-quickstart-blobs-portal.md).
If you'd prefer to use an access control list (ACL) to associate the service principal with a specific file or directory, reference [Access control in Azure Data Lake Storage Gen2](../storage/blobs/data-lake-storage-access-control.md).
* When performing the steps in the [Get values for signing in](https://docs.microsoft.com/azure/active-directory/develop/howto-create-service-principal-portal#get-values-for-signing-in) section of the article, paste the tenant ID, app ID, and secret values into a text file.
* Sign in to the [Azure portal](https://portal.azure.com/).
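The tenant ID, app ID, and secret you saved are what Azure Databricks later uses to authenticate to Data Lake Storage Gen2. As a rough sketch of how those values are wired in (the placeholder names here are illustrative, not from this tutorial's snippets; the configuration keys are the standard OAuth settings of the ABFS driver):

```scala
// Illustrative placeholders -- substitute the values you saved to the text file.
val appID = "<application-id>"
val secret = "<client-secret>"
val tenantID = "<tenant-id>"

// Standard OAuth configuration keys for the ABFS (Data Lake Storage Gen2) driver.
spark.conf.set("fs.azure.account.auth.type", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type",
  "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id", appID)
spark.conf.set("fs.azure.account.oauth2.client.secret", secret)
spark.conf.set("fs.azure.account.oauth2.client.endpoint",
  "https://login.microsoftonline.com/" + tenantID + "/oauth2/token")
```

This sketch assumes a running Databricks notebook, where `spark` is the preconfigured `SparkSession`.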
Make sure that you complete the prerequisites of this tutorial.
Before you begin, you should have these items of information:
:heavy_check_mark: The database name, database server name, user name, and password of your Azure Synapse SQL pool.
:heavy_check_mark: The access key of your blob storage account.
## Load data into Azure Synapse
In this section, you upload the transformed data into Azure Synapse. You use the Azure Synapse connector for Azure Databricks to directly upload a dataframe as a table in Azure Synapse.
As mentioned earlier, the Azure Synapse connector uses Azure Blob storage as temporary storage to upload data between Azure Databricks and Azure Synapse. So, you start by providing the configuration to connect to the storage account. You must have already created the account as part of the prerequisites for this article.
1. Provide the configuration to access the Azure Storage account from Azure Databricks.
   ```scala
   val blobStorage = "<blob-storage-account-name>.blob.core.windows.net"
   val blobContainer = "<blob-container-name>"
   val blobAccessKey = "<access-key>"
   ```
2. Specify a temporary folder to use while moving data between Azure Databricks and Azure Synapse.
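   The temporary folder is just a path inside the blob container you configured in step 1. A minimal sketch of this step, reusing the placeholder names from step 1 (the folder name `tempDirs` is illustrative):

   ```scala
   // Placeholders matching the storage configuration in step 1.
   val blobStorage = "<blob-storage-account-name>.blob.core.windows.net"
   val blobContainer = "<blob-container-name>"

   // A wasbs:// URI for a scratch folder; the Azure Synapse connector stages
   // data here while it moves between Azure Databricks and Azure Synapse.
   val tempDir = "wasbs://" + blobContainer + "@" + blobStorage + "/tempDirs"
   ```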
4. Provide the values to connect to the Azure Synapse instance. You must have created a Synapse SQL pool as a prerequisite. Use the fully qualified server name for **dwServer**. For example, `<servername>.database.windows.net`.
   ```scala
   //Azure Synapse related settings
   val dwDatabase = "<database-name>"
   val dwServer = "<database-server-name>"
   val dwUser = "<user-name>"
   val dwPass = "<password>"
   ```
5. Run the following snippet to load the transformed dataframe, **renamedColumnsDF**, as a table in Azure Synapse. This snippet creates a table called **SampleTable** in the SQL database.
```scala
   spark.conf.set(
     "spark.sql.parquet.writeLegacyFormat",
     "true")

   // tempDir is the temporary folder you specified earlier; the dw* values
   // come from the previous snippet.
   renamedColumnsDF.write
     .format("com.databricks.spark.sqldw")
     .option("url", "jdbc:sqlserver://" + dwServer + ":1433;database=" + dwDatabase
       + ";user=" + dwUser + ";password=" + dwPass)
     .option("forward_spark_azure_storage_credentials", "true")
     .option("dbTable", "SampleTable")
     .option("tempDir", tempDir)
     .mode("overwrite")
     .save()
   ```
> [!NOTE]
> This sample uses the `forward_spark_azure_storage_credentials` flag, which causes Azure Synapse to access data from blob storage using an access key. This is the only supported method of authentication.
>
> If your Azure Blob Storage is restricted to select virtual networks, Azure Synapse requires [Managed Service Identity instead of Access Keys](../sql-database/sql-database-vnet-service-endpoint-rule-overview.md#impact-of-using-vnet-service-endpoints-with-azure-storage). Using an access key in that case causes the error "This request is not authorized to perform this operation."
6. Connect to the SQL database and verify that you see a table named **SampleTable**.
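   If you'd rather verify from the notebook itself, one option is to read the table back through the same connector. This is a sketch, not part of the original tutorial; it assumes the `dw*` and `tempDir` values defined in the earlier snippets:

   ```scala
   // Read SampleTable back from Azure Synapse to confirm the load succeeded.
   val verifyDF = spark.read
     .format("com.databricks.spark.sqldw")
     .option("url", "jdbc:sqlserver://" + dwServer + ":1433;database=" + dwDatabase
       + ";user=" + dwUser + ";password=" + dwPass)
     .option("tempDir", tempDir)
     .option("forward_spark_azure_storage_credentials", "true")
     .option("dbTable", "SampleTable")
     .load()

   // Display the first few rows of the round-tripped data.
   verifyDF.show(10)
   ```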
## Next steps

In this tutorial, you learned how to:
> * Create a notebook in Azure Databricks
> * Extract data from a Data Lake Storage Gen2 account
> * Transform data in Azure Databricks
> * Load data into Azure Synapse
Advance to the next tutorial to learn about streaming real-time data into Azure Databricks using Azure Event Hubs.