`articles/data-factory/solution-template-databricks-notebook.md` (15 additions, 27 deletions)
@@ -17,39 +17,37 @@ ms.date: 03/03/2020
In this tutorial, you create an end-to-end pipeline containing **Validation**, **Copy**, and **Notebook** activities in Data Factory.
- **Validation** ensures that your source dataset is ready for downstream consumption before you trigger the copy and analytics job.
- **Copy** duplicates the source dataset to the sink storage, which is mounted as DBFS in the Databricks notebook. In this way, the dataset can be directly consumed by Spark.
- **Notebook** triggers the Databricks notebook that transforms the dataset. It also adds the dataset to a processed folder or SQL Data Warehouse. (A sketch of how these three activities chain together follows this list.)
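The following is a minimal sketch of how such a Validation → Copy → Notebook chain could be expressed with the `azure-mgmt-datafactory` Python SDK. It is not the template's actual definition (the template is authored in the Data Factory UI): the dataset, linked-service, notebook path, factory, and subscription names below are placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    ActivityDependency, BlobSink, BlobSource, CopyActivity,
    DatabricksNotebookActivity, DatasetReference, LinkedServiceReference,
    PipelineResource, ValidationActivity,
)

# Placeholder names -- substitute your own resource group, factory, datasets,
# and Databricks linked service.
rg, factory = "<resource-group>", "<data-factory-name>"

# 1. Validation: wait until the source dataset exists and is non-empty.
validate = ValidationActivity(
    name="ValidateSource",
    dataset=DatasetReference(type="DatasetReference", reference_name="SourceDataset"),
    timeout="0.00:10:00",
    sleep=15,
    minimum_size=1,
)

# 2. Copy: move the source file into the sink blob container.
copy = CopyActivity(
    name="CopyToSink",
    inputs=[DatasetReference(type="DatasetReference", reference_name="SourceDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="SinkDataset")],
    source=BlobSource(),
    sink=BlobSink(),
    depends_on=[ActivityDependency(activity="ValidateSource",
                                   dependency_conditions=["Succeeded"])],
)

# 3. Notebook: run the transformation notebook once the copy succeeds.
notebook = DatabricksNotebookActivity(
    name="TransformInDatabricks",
    notebook_path="/Shared/Transformation",
    linked_service_name=LinkedServiceReference(
        type="LinkedServiceReference", reference_name="AzureDatabricksLinkedService"),
    depends_on=[ActivityDependency(activity="CopyToSink",
                                   dependency_conditions=["Succeeded"])],
)

# Recent SDK versions accept azure.identity credentials.
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
adf_client.pipelines.create_or_update(
    rg, factory, "databricks-notebook-pipeline",
    PipelineResource(activities=[validate, copy, notebook]),
)
```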
For simplicity, the template in this tutorial doesn't create a scheduled trigger. You can add one if necessary.
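If you do want the pipeline to run on a schedule, one option is to attach a schedule trigger. The snippet below is a hedged sketch using the same `azure-mgmt-datafactory` SDK (reusing the `adf_client` from the earlier sketch); the trigger name, recurrence, and resource names are illustrative assumptions, not part of the template.

```python
from datetime import datetime, timedelta
from azure.mgmt.datafactory.models import (
    PipelineReference, ScheduleTrigger, ScheduleTriggerRecurrence,
    TriggerPipelineReference, TriggerResource,
)

# Run the pipeline once a day, starting a few minutes from now.
recurrence = ScheduleTriggerRecurrence(
    frequency="Day",
    interval=1,
    start_time=datetime.utcnow() + timedelta(minutes=5),
    time_zone="UTC",
)
trigger = ScheduleTrigger(
    recurrence=recurrence,
    pipelines=[TriggerPipelineReference(
        pipeline_reference=PipelineReference(
            type="PipelineReference",
            reference_name="databricks-notebook-pipeline"),
        parameters={},
    )],
)
adf_client.triggers.create_or_update(
    "<resource-group>", "<data-factory-name>", "DailyTrigger",
    TriggerResource(properties=trigger),
)
# Older SDK versions expose triggers.start(...) instead of begin_start(...).
adf_client.triggers.begin_start(
    "<resource-group>", "<data-factory-name>", "DailyTrigger").result()
```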
- A **blob storage account** with a container called `sinkdata` for use as the **sink**.

  Make note of the **storage account name**, **container name**, and **access key**. You'll need these values later in the template. (The sketch after this list shows one way these values are typically used.)

- An **Azure Databricks workspace**. Ensure you have one, or create a new one.
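To illustrate where those values end up, the transformation notebook commonly mounts the `sinkdata` container as DBFS so that Spark can read the copied file directly. The following is a minimal sketch of that mount-and-read pattern inside a Databricks notebook, not the template's exact notebook code; the mount point and file name are assumptions.

```python
# Runs inside a Databricks notebook, where `dbutils` and `spark` are predefined.
storage_account = "<storage-account-name>"   # from the prerequisite above
access_key = "<access-key>"                  # from the prerequisite above
container = "sinkdata"

# Mount the sink container so it appears under /mnt/sinkdata in DBFS.
dbutils.fs.mount(
    source=f"wasbs://{container}@{storage_account}.blob.core.windows.net",
    mount_point="/mnt/sinkdata",
    extra_configs={
        f"fs.azure.account.key.{storage_account}.blob.core.windows.net": access_key
    },
)

# After the Copy activity lands the file in the container, Spark can consume it directly.
df = spark.read.option("header", "true").csv("/mnt/sinkdata/<copied-file>.csv")
df.show(5)
```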
**Import the notebook for Transformation**

1. In your Azure Databricks workspace, refer to the following screenshots to import a **Transformation** notebook into the workspace. It doesn't have to be in the same location as shown, but remember the path that you choose for later. (A scripted alternative using the Databricks Workspace API is sketched after these steps.)
1. Select the **Settings** tab. For *Notebook path*, the template defines a path by default. You may need to browse to and select the correct notebook path uploaded in **Prerequisite** 2.
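If you prefer to script the notebook import instead of using the workspace UI, the Databricks Workspace API can upload notebook source directly. The following is a rough sketch using Python's `requests` library; the workspace URL, personal access token, local file name, and target path are placeholders, not values defined by the template.

```python
import base64
import requests

# Placeholders: your workspace URL, a personal access token, and the target path.
workspace_url = "https://<databricks-instance>.azuredatabricks.net"
token = "<personal-access-token>"
target_path = "/Shared/Transformation"

# Read the notebook source and base64-encode it, as the API requires.
with open("transformation_notebook.py", "rb") as f:
    content = base64.b64encode(f.read()).decode("utf-8")

resp = requests.post(
    f"{workspace_url}/api/2.0/workspace/import",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "path": target_path,
        "format": "SOURCE",      # a DBC archive would use "format": "DBC" instead
        "language": "PYTHON",
        "content": content,
        "overwrite": True,
    },
)
resp.raise_for_status()
```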