---
ms.topic: tutorial
ms.author: abnarain
author: nabhishek
ms.custom: seo-lt-2019
ms.date: 09/08/2021
---

# Run a Databricks notebook with the Databricks Notebook Activity in Azure Data Factory

In this section, you author a Databricks linked service. This linked service contains the connection information to the Databricks cluster:

1. For **Access Token**, generate it from the Azure Databricks workspace. You can find the steps [here](https://docs.databricks.com/api/latest/authentication.html#generate-token).
1. For **Cluster version**, select the version you want to use.
1. For **Cluster node type**, select **Standard\_D3\_v2** under the **General Purpose (HDD)** category for this tutorial.
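The choices in the steps above end up in the linked service's JSON definition. Below is a minimal sketch of that definition rendered as a Python dict, assuming the `AzureDatabricks` linked-service type with `domain`, `accessToken`, `newClusterVersion`, `newClusterNodeType`, and `newClusterNumOfWorker` properties; all values shown are placeholders, not output from this tutorial:

```python
# Hypothetical sketch of the Databricks linked service that Data Factory
# stores for the choices above. All names and values are placeholders.
linked_service = {
    "name": "AzureDatabricks1",  # assumed linked service name
    "properties": {
        "type": "AzureDatabricks",
        "typeProperties": {
            # Workspace URL (placeholder):
            "domain": "https://<workspace-url>.azuredatabricks.net",
            # The access token generated in the workspace (placeholder):
            "accessToken": {"type": "SecureString", "value": "<access-token>"},
            "newClusterVersion": "<cluster-version>",   # the Cluster version selected
            "newClusterNodeType": "Standard_D3_v2",     # the Cluster node type selected
            "newClusterNumOfWorker": "1",               # assumed worker count
        },
    },
}

# The Notebook activity spins up a new job cluster from these settings.
print(linked_service["properties"]["typeProperties"]["newClusterNodeType"])
```

Because the activity uses a new job cluster, the cluster settings live here in the linked service rather than in the pipeline itself.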
1. Create a **New Folder** in the workspace and name it **adftutorial**.

   :::image type="content" source="media/transform-data-using-databricks-notebook/databricks-notebook-activity-image13.png" alt-text="Screenshot showing how to create a new folder.":::
1. Create a [new notebook](https://docs.databricks.com/user-guide/notebooks/index.html#creating-a-notebook) (Python), name it **mynotebook** under the **adftutorial** folder, and click **Create**.

   :::image type="content" source="media/transform-data-using-databricks-notebook/databricks-notebook-activity-image14.png" alt-text="Screenshot showing how to create a new notebook.":::

   :::image type="content" source="media/transform-data-using-databricks-notebook/databricks-notebook-activity-image15.png" alt-text="Screenshot showing how to set the properties of the new notebook.":::
1. In the newly created notebook "mynotebook", add the following code:

   ```python
   # Create a text widget named "input" so the notebook can receive a
   # parameter from the pipeline, then read and print its value
   dbutils.widgets.text("input", "", "")
   y = dbutils.widgets.get("input")
   print(y)
   ```

   :::image type="content" source="media/transform-data-using-databricks-notebook/databricks-notebook-activity-image16.png" alt-text="Screenshot showing how to create widgets for parameters.":::
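There is no `dbutils` outside a Databricks runtime, but the way the notebook receives its parameter can be imitated locally with a small stand-in. This is only an illustrative sketch: the `_Widgets` and `_DBUtils` classes and the sample value are invented here; `dbutils.widgets.text` and `dbutils.widgets.get` are the real calls the notebook uses.

```python
# Minimal stand-in for dbutils.widgets, so the notebook's parameter flow
# can be tried outside Databricks (illustration only).
class _Widgets:
    def __init__(self):
        self._values = {}

    def text(self, name, default_value, label=""):
        # Registers a text widget; in Databricks, a value passed in by the
        # caller (for example, Data Factory) overrides the default.
        self._values.setdefault(name, default_value)

    def get(self, name):
        return self._values[name]

class _DBUtils:
    def __init__(self):
        self.widgets = _Widgets()

dbutils = _DBUtils()

# Simulate the pipeline passing the "input" parameter to the notebook run:
dbutils.widgets._values["input"] = "hello from ADF"

# The notebook code from the step above, unchanged:
dbutils.widgets.text("input", "", "")
y = dbutils.widgets.get("input")
print(y)  # prints: hello from ADF
```

In a real run, the value comes from the pipeline's parameter rather than from the simulated assignment above.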
1. The **Notebook Path** in this case is **/adftutorial/mynotebook**.
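For reference, the notebook path and the parameter mapping travel in the Databricks Notebook activity of the pipeline definition. The sketch below renders that JSON as a Python dict: the `DatabricksNotebook` type with `notebookPath` and `baseParameters` follows the Data Factory activity schema, while the activity name, the linked-service name, and the `@pipeline().parameters.name` expression are assumptions for illustration.

```python
# Hypothetical sketch of the Notebook activity inside the pipeline JSON.
notebook_activity = {
    "name": "mynotebookactivity",            # assumed activity name
    "type": "DatabricksNotebook",
    "linkedServiceName": {
        "referenceName": "AzureDatabricks1",  # assumed linked service name
        "type": "LinkedServiceReference",
    },
    "typeProperties": {
        "notebookPath": "/adftutorial/mynotebook",
        # Maps the pipeline's "name" parameter to the notebook's "input"
        # widget (assumed mapping for this tutorial):
        "baseParameters": {"input": "@pipeline().parameters.name"},
    },
}

print(notebook_activity["typeProperties"]["notebookPath"])
```

Each key under `baseParameters` must match a widget name in the notebook, which is how the value reaches `dbutils.widgets.get("input")`.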

The **Pipeline run** dialog box asks for the **name** parameter. Use **/path/fil…**

1. Switch to the **Monitor** tab. Confirm that you see a pipeline run. It takes approximately 5-8 minutes to create a Databricks job cluster, where the notebook is executed.

   :::image type="content" source="media/transform-data-using-databricks-notebook/databricks-notebook-activity-image-22.png" alt-text="Screenshot showing how to monitor the pipeline.":::
1. Select **Refresh** periodically to check the status of the pipeline run.
1. To see activity runs associated with the pipeline run, select the **pipeline1** link in the **Pipeline name** column.
1. On the **Activity runs** page, select **Output** in the **Activity name** column to view the output of each activity. The **Output** pane also links to Databricks logs for more detailed Spark logs.
1. You can switch back to the pipeline runs view by selecting the **All pipeline runs** link in the breadcrumb menu at the top.
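Monitoring can also be scripted. The sketch below builds, but does not send, the Data Factory REST call that queries the activity runs of one pipeline run; the `queryActivityruns` endpoint and `api-version=2018-06-01` come from the Data Factory REST API, and every identifier is a placeholder you would substitute.

```python
import urllib.request

# Placeholders - substitute real identifiers before sending.
subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
factory_name = "<data-factory-name>"
run_id = "<pipeline-run-id>"

# Activity Runs - Query By Pipeline Run (Data Factory REST API):
url = (
    f"https://management.azure.com/subscriptions/{subscription_id}"
    f"/resourceGroups/{resource_group}/providers/Microsoft.DataFactory"
    f"/factories/{factory_name}/pipelineruns/{run_id}"
    f"/queryActivityruns?api-version=2018-06-01"
)

request = urllib.request.Request(url, method="POST")
request.add_header("Authorization", "Bearer <aad-token>")  # placeholder token
request.add_header("Content-Type", "application/json")
# The real call also expects a JSON body with lastUpdatedAfter/lastUpdatedBefore
# filters; it is omitted here because the request is never sent.

# Built but not sent; urllib.request.urlopen(request) would execute it.
print(request.get_method(), request.full_url)
```

This mirrors what the **Monitor** tab does in the portal, which is convenient when you want to check runs from a script or CI job.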
## Verify the output
You can log on to the **Azure Databricks** workspace and go to **Clusters** to see the **Job** status: *pending execution*, *running*, or *terminated*.
:::image type="content" source="media/transform-data-using-databricks-notebook/databricks-notebook-activity-image24.png" alt-text="Screenshot showing how to view the job cluster and the job.":::
You can click the **Job name** and navigate to further details. After a successful run, you can validate the parameters that were passed and the output of the Python notebook.
:::image type="content" source="media/transform-data-using-databricks-notebook/databricks-output.png" alt-text="Screenshot showing how to view the run details and output.":::