Commit 55bcd7e

Merge pull request #225981 from normesta/gen2
Fixing outdated screenshots
2 parents cc225ac + 91bb36a commit 55bcd7e

3 files changed: 16 additions and 52 deletions.

articles/storage/blobs/data-lake-storage-use-databricks-spark.md

Lines changed: 16 additions & 52 deletions
@@ -7,7 +7,7 @@ author: normesta
 ms.subservice: data-lake-storage-gen2
 ms.service: storage
 ms.topic: tutorial
-ms.date: 11/19/2019
+ms.date: 02/01/2023
 ms.author: normesta
 ms.reviewer: dineshm
 ms.custom: devx-track-python, py-fresh-zinc
@@ -21,7 +21,6 @@ This tutorial shows you how to connect your Azure Databricks cluster to data sto
 In this tutorial, you will:
 
 > [!div class="checklist"]
-> - Create a Databricks cluster
 > - Ingest unstructured data into a storage account
 > - Run analytics on your data in Blob storage
 
@@ -41,58 +40,18 @@ If you don't have an Azure subscription, create a [free account](https://azure.m
 
 See [Tutorial: Connect to Azure Data Lake Storage Gen2](/azure/databricks/getting-started/connect-to-azure-storage) (Steps 1 through 3). After completing these steps, make sure to paste the tenant ID, app ID, and client secret values into a text file. You'll need those soon.
 
-### Download the flight data
+- An Azure Databricks workspace. See [Create an Azure Databricks workspace](/azure/databricks/getting-started/#--create-an-azure-databricks-workspace).
+
+- An Azure Databricks cluster. See [Create a cluster](/azure/databricks/getting-started/quick-start#step-1-create-a-cluster).
+
+## Download the flight data
 
 This tutorial uses flight data from the Bureau of Transportation Statistics to demonstrate how to perform an ETL operation. You must download this data to complete the tutorial.
 
 1. Download the [On_Time_Reporting_Carrier_On_Time_Performance_1987_present_2016_1.zip](https://github.com/Azure-Samples/AzureStorageSnippets/blob/master/blobs/tutorials/On_Time_Reporting_Carrier_On_Time_Performance_1987_present_2016_1.zip) file. This file contains the flight data.
 
 2. Unzip the contents of the zipped file and make a note of the file name and the path of the file. You need this information in a later step.
 
-## Create an Azure Databricks service
-
-In this section, you create an Azure Databricks service by using the Azure portal.
-
-1. In the Azure portal, select **Create a resource** > **Analytics** > **Azure Databricks**.
-
-![Databricks on Azure portal](./media/data-lake-storage-use-databricks-spark/azure-databricks-on-portal.png "Databricks on Azure portal")
-
-2. Under **Azure Databricks Service**, provide the following values to create a Databricks service:
-
-|Property |Description |
-|---------|---------|
-|**Workspace name** | Provide a name for your Databricks workspace. |
-|**Subscription** | From the drop-down, select your Azure subscription. |
-|**Resource group** | Specify whether you want to create a new resource group or use an existing one. A resource group is a container that holds related resources for an Azure solution. For more information, see [Azure Resource Group overview](../../azure-resource-manager/management/overview.md). |
-|**Location** | Select **West US 2**. For other available regions, see [Azure services available by region](https://azure.microsoft.com/regions/services/). |
-|**Pricing Tier** | Select **Standard**. |
-
-![Create an Azure Databricks workspace](./media/data-lake-storage-use-databricks-spark/create-databricks-workspace.png "Create an Azure Databricks service")
-
-3. The account creation takes a few minutes. To monitor the operation status, view the progress bar at the top.
-
-4. Select **Pin to dashboard** and then select **Create**.
-
-## Create a Spark cluster in Azure Databricks
-
-1. In the Azure portal, go to the Databricks service that you created, and select **Launch Workspace**.
-
-2. You're redirected to the Azure Databricks portal. From the portal, select **Cluster**.
-
-![Databricks on Azure](./media/data-lake-storage-use-databricks-spark/databricks-on-azure.png "Databricks on Azure")
-
-3. In the **New cluster** page, provide the values to create a cluster.
-
-![Create Databricks Spark cluster on Azure](./media/data-lake-storage-use-databricks-spark/create-databricks-spark-cluster.png "Create Databricks Spark cluster on Azure")
-
-Fill in values for the following fields, and accept the default values for the other fields:
-
-- Enter a name for the cluster.
-
-- Make sure you select the **Terminate after 120 minutes of inactivity** checkbox. Provide a duration (in minutes) to terminate the cluster, if the cluster is not being used.
-
-4. Select **Create cluster**. After the cluster is running, you can attach notebooks to the cluster and run Spark jobs.
-
 ## Ingest data
 
 ### Copy source data into the storage account
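
Reviewer note on step 2 of "Download the flight data" in the hunk above: the tutorial asks readers to unzip the archive and note the file name and path by hand. That step could also be scripted. The following is only a minimal sketch using Python's standard `zipfile` module; the helper name `extract_flight_data` and the `flight-data` output folder are hypothetical and do not appear in the tutorial or in this commit.

```python
import zipfile
from pathlib import Path


def extract_flight_data(zip_path, dest_dir):
    """Extract the archive and return the paths of the CSV files it contained."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        names = archive.namelist()
    # Keep only CSV entries; the archive may also hold other files.
    return [dest / name for name in names if name.lower().endswith(".csv")]


if __name__ == "__main__":
    # Hypothetical local paths; adjust to wherever the download was saved.
    zip_file = Path("On_Time_Reporting_Carrier_On_Time_Performance_1987_present_2016_1.zip")
    if zip_file.exists():
        for csv_path in extract_flight_data(zip_file, "flight-data"):
            # Print the absolute path so it can be noted for the later step.
            print(csv_path.resolve())
```

Running the script next to the downloaded archive prints the absolute path of each extracted CSV file, which is the information the tutorial says is needed in a later step.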
@@ -125,15 +84,20 @@ In this section, you'll create a container and a folder in your storage account.
 
 1. In the [Azure portal](https://portal.azure.com), go to the Azure Databricks service that you created, and select **Launch Workspace**.
 
-2. On the left, select **Workspace**. From the **Workspace** drop-down, select **Create** > **Notebook**.
+2. In the sidebar, select **Workspace**.
+
+3. In the Workspace folder, select **Create > Notebook**.
+
+> [!div class="mx-imgBorder"]
+> ![Screenshot of create notebook option.](./media/data-lake-storage-use-databricks-spark/create-notebook.png)
 
-![Create a notebook in Databricks](./media/data-lake-storage-use-databricks-spark/databricks-create-notebook.png "Create notebook in Databricks")
+4. In the **Create Notebook** dialog, enter a name and then select **Python** in the **Default Language** drop-down list. This selection determines the default language of the notebook.
 
-3. In the **Create Notebook** dialog box, enter a name for the notebook. Select **Python** as the language, and then select the Spark cluster that you created earlier.
+5. In the **Cluster** drop-down list, make sure that the cluster you created earlier is selected.
 
-4. Select **Create**.
+6. Click **Create**. The notebook opens with an empty cell at the top.
 
-5. Copy and paste the following code block into the first cell, but don't run this code yet.
+7. Copy and paste the following code block into the first cell, but don't run this code yet.
 
 ```python
 configs = {"fs.azure.account.auth.type": "OAuth",