`articles/azure-databricks/databricks-extract-load-sql-data-warehouse.md`
# Tutorial: Extract, transform, and load data using Azure Databricks
In this tutorial, you perform an ETL (extract, transform, and load) operation using Azure Databricks. You extract data from Azure Data Lake Store into Azure Databricks, run transformations on the data in Azure Databricks, and then load the transformed data into Azure SQL Data Warehouse.
The steps in this tutorial use the SQL Data Warehouse connector for Azure Databricks to transfer data to Azure Databricks. This connector, in turn, uses Azure Blob Storage as temporary storage for the data being transferred between an Azure Databricks cluster and Azure SQL Data Warehouse.
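As a rough sketch of what that staging configuration can look like from a notebook, the snippet below stores the account key and builds the temporary `wasbs://` URI that the load step uses later. The storage account name, container, and access key are placeholders, and `spark` is the session that Databricks notebooks predefine.

```python
# Sketch: configure the Blob Storage staging area that the SQL Data Warehouse
# connector uses. The account name, container, and key are placeholders.
blob_account = "<blob-storage-account-name>"
blob_container = "<container-name>"
blob_access_key = "<storage-account-access-key>"

# Make the storage account key available to the cluster; `spark` is the
# SparkSession predefined in Databricks notebooks.
spark.conf.set(
    "fs.azure.account.key.{}.blob.core.windows.net".format(blob_account),
    blob_access_key)

# The connector stages data under this wasbs:// URI while transferring it
# between the Databricks cluster and Azure SQL Data Warehouse.
temp_dir = "wasbs://{}@{}.blob.core.windows.net/tempDirs".format(
    blob_container, blob_account)
```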
The following illustration shows the application flow:
*(Illustration: application flow from Azure Data Lake Store through Azure Databricks to Azure SQL Data Warehouse)*
This tutorial covers the following tasks:
> [!div class="checklist"]
> * Create an Azure Databricks workspace
## Create an Azure Databricks workspace

In this section, you create an Azure Databricks workspace using the Azure portal.
Provide the following values:
|Property |Description |
|---------|---------|
## Create a Spark cluster in Azure Databricks
Accept all defaults other than the following values:
* Enter a name for the cluster.
* For this article, create a cluster with the **4.0** runtime.
* Make sure you select the **Terminate after \_\_ minutes of inactivity** checkbox. Provide a duration (in minutes) after which the cluster is terminated if it is not being used.
Select **Create cluster**. Once the cluster is running, you can attach notebooks to the cluster and run Spark jobs.
## Create an Azure Data Lake Store account
In this section, you create an Azure Data Lake Store account and associate an Azure Active Directory service principal with it. Later in this tutorial, you use this service principal in Azure Databricks to access Azure Data Lake Store.
1. From the [Azure portal](https://portal.azure.com), select **Create a resource** > **Storage** > **Data Lake Store**.
2. In the **New Data Lake Store** blade, provide the values as shown in the following screenshot:
When logging in programmatically, you need to pass the tenant ID with your authentication request.
1. Copy the **Directory ID**. This value is your tenant ID.
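As a sketch of where these values end up, a Databricks notebook typically hands the service principal's application ID, authentication key, and this tenant ID to Spark as Data Lake Store OAuth settings along the following lines (all three values are placeholders):

```python
# Sketch: let the cluster authenticate to Azure Data Lake Store with the
# Azure AD service principal. All three values below are placeholders.
spark.conf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
spark.conf.set("dfs.adls.oauth2.client.id", "<application-id>")
spark.conf.set("dfs.adls.oauth2.credential", "<authentication-key>")
spark.conf.set("dfs.adls.oauth2.refresh.url",
               "https://login.microsoftonline.com/<tenant-id>/oauth2/token")
```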
You have now extracted the data from Azure Data Lake Store into Azure Databricks.
## Transform data in Azure Databricks
The raw sample data **small_radio_json.json** captures the audience for a radio station and has a variety of columns. In this section, you transform the data to retrieve only specific columns from the dataset.
1. Start by retrieving only the columns *firstName*, *lastName*, *gender*, *location*, and *level* from the dataframe you already created.
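   A minimal sketch of this step in PySpark, assuming the raw sample file was read into a dataframe named `df` (the `adl://` path is a placeholder for wherever you uploaded **small_radio_json.json**):

   ```python
   # Sketch: read the raw sample file from Data Lake Store (placeholder path)
   # and keep only the columns the tutorial works with.
   df = spark.read.json(
       "adl://<your-adls-account>.azuredatalakestore.net/small_radio_json.json")

   specific_columns_df = df.select(
       "firstName", "lastName", "gender", "location", "level")
   specific_columns_df.show()
   ```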
5. Run the following snippet to load the transformed dataframe, **renamedColumnsDf**, as a table in SQL Data Warehouse. This snippet creates a table called **SampleTable** in the SQL database. Note that Azure SQL Data Warehouse requires a database master key; you can create one by running the `CREATE MASTER KEY;` command in SQL Server Management Studio.
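   A sketch of that load, assuming the transformed data from the previous steps is bound to `renamedColumnsDf` and `temp_dir` is the `wasbs://` staging URI configured earlier; the JDBC URL is a placeholder:

   ```python
   # Sketch: write the transformed dataframe to SQL Data Warehouse through the
   # Databricks SQL DW connector. The JDBC URL is a placeholder, and the target
   # database must already have a master key (run CREATE MASTER KEY; once).
   sql_dw_url = ("jdbc:sqlserver://<server>.database.windows.net:1433;"
                 "database=<database>;user=<user>@<server>;password=<password>")

   (renamedColumnsDf.write
       .format("com.databricks.spark.sqldw")
       .option("url", sql_dw_url)
       .option("forwardSparkAzureStorageCredentials", "true")
       .option("dbTable", "SampleTable")
       .option("tempDir", temp_dir)
       .save())
   ```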
## Clean up resources

After you have finished running the tutorial, you can terminate the cluster.
If you do not manually terminate the cluster, it stops automatically, provided you selected the **Terminate after \_\_ minutes of inactivity** checkbox when you created the cluster. In that case, the cluster stops once it has been inactive for the specified time.