`articles/storage/blobs/data-lake-storage-use-databricks-spark.md`
author: normesta
ms.subservice: data-lake-storage-gen2
ms.service: storage
ms.topic: tutorial
ms.date: 02/01/2023
ms.author: normesta
ms.reviewer: dineshm
ms.custom: devx-track-python, py-fresh-zinc
If you don't have an Azure subscription, create a [free account](https://azure.microsoft.com/free/) before you begin.
See [Tutorial: Connect to Azure Data Lake Storage Gen2](/azure/databricks/getting-started/connect-to-azure-storage) (Steps 1 through 3). After completing these steps, make sure to paste the tenant ID, app ID, and client secret values into a text file. You'll need those soon.
- An Azure Databricks workspace. See [Create an Azure Databricks workspace](/databricks/getting-started/#--create-an-azure-databricks-workspace).
- An Azure Databricks cluster. See [Create a cluster](/databricks/getting-started/quick-start#step-1-create-a-cluster).
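
The tenant ID, app ID, and client secret you saved are the values a notebook uses to configure OAuth access to the storage account. As an illustrative sketch (the helper function and placeholder names are this article's editorial invention, but the `fs.azure.*` keys are the standard Hadoop ABFS OAuth settings), the session configuration can be assembled like this:

```python
def adls_oauth_configs(storage_account, tenant_id, app_id, client_secret):
    """Build the standard Hadoop ABFS settings for OAuth access to
    Azure Data Lake Storage Gen2 from a service principal's credentials."""
    key = "fs.azure.account.{}." + storage_account + ".dfs.core.windows.net"
    return {
        key.format("auth.type"): "OAuth",
        key.format("oauth.provider.type"):
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        key.format("oauth2.client.id"): app_id,
        key.format("oauth2.client.secret"): client_secret,
        key.format("oauth2.client.endpoint"):
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }

# In a Databricks notebook you would then apply each setting, for example:
# for k, v in adls_oauth_configs("<storage-account>", tenant_id,
#                                app_id, client_secret).items():
#     spark.conf.set(k, v)
```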
### Download the flight data
This tutorial uses flight data from the Bureau of Transportation Statistics to demonstrate how to perform an ETL operation. You must download this data to complete the tutorial.
In this section, you create an Azure Databricks service by using the Azure portal.
4. Select **Pin to dashboard** and then select **Create**.
## Ingest data
### Copy source data into the storage account
In this section, you'll create a container and a folder in your storage account.
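
Notebook code later refers to that container and folder through an `abfss://` URI. As a small sketch (the helper function and the sample names are illustrative placeholders, not part of the tutorial), such a URI is composed like this:

```python
def abfss_uri(container, storage_account, path=""):
    """Compose an ABFS URI, abfss://<container>@<account>.dfs.core.windows.net/<path>,
    the scheme Spark uses to address files in Data Lake Storage Gen2."""
    base = f"abfss://{container}@{storage_account}.dfs.core.windows.net"
    return f"{base}/{path.lstrip('/')}" if path else base

# In a notebook cell you might then read the flight data CSV, for example:
# df = spark.read.csv(abfss_uri("flight-data-container", "mystorageacct",
#                               "flight-data/On_Time.csv"), header=True)
```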
1. In the [Azure portal](https://portal.azure.com), go to the Azure Databricks service that you created, and select **Launch Workspace**.
2. In the sidebar, select **Workspace**.
3. In the Workspace folder, select **Create > Notebook**.
> [!div class="mx-imgBorder"]
> 
4. In the **Create Notebook** dialog, enter a name and then select **Python** in the **Default Language** drop-down list. This selection determines the default language of the notebook.
5. In the **Cluster** drop-down list, make sure that the cluster you created earlier is selected.
6. Click **Create**. The notebook opens with an empty cell at the top.
7. Copy and paste the following code block into the first cell, but don't run this code yet.