
Commit d4cce39

partial edit
1 parent daeeef0 commit d4cce39


articles/data-factory/solution-template-databricks-notebook.md

Lines changed: 15 additions & 27 deletions
@@ -17,39 +17,37 @@ ms.date: 03/03/2020
 In this tutorial, you create an end-to-end pipeline containing **Validation**, **Copy**, and **Notebook** activities in Data Factory.

-- **Validation** activity is used to ensure the source dataset is ready for downstream consumption, before triggering the copy and analytics job.
+- **Validation** ensures that your source dataset is ready for downstream consumption before you trigger the copy and analytics job.

-- **Copy** activity copies the source file/ dataset to the sink storage. The sink storage is mounted as DBFS in the Databricks notebook so that the dataset can be directly consumed by Spark.
+- **Copy** duplicates the source dataset to the sink storage which is mounted as DBFS in the Databricks notebook. In this way, the dataset can be directly consumed by Spark.

-- **Databricks Notebook** activity triggers the Databricks notebook that transforms the dataset, and adds it to a processed folder/ SQL DW.
+- **Notebook** triggers the Databricks notebook that transforms the dataset. It also adds the dataset to a processed folder or SQL Data Warehouse.

-To keep this template simple, the template doesn't create a scheduled trigger. You can add that if necessary.
+For simplicity, the template in this tutorial doesn't create a scheduled trigger. You can add one if necessary.

 ![1](media/solution-template-Databricks-notebook/pipeline-example.png)

 ## Prerequisites

-1. Create a **blob storage account** and a container called `sinkdata` to be used as **sink**. Keep a note of the **storage account name**, **container name**, and **access key**, since they are referenced later in the template.
+- A **blob storage account** with a container called `sinkdata` for use as **sink**

-2. Ensure you have an **Azure Databricks workspace** or create a new one.
+  Make note of the **storage account name**, **container name**, and **access key**. You'll need these values later in the template.

-3. **Import the notebook for Transformation**.
-   1. In your Azure Databricks, reference following screenshots for importing a **Transformation** notebook to the Databricks workspace. It does not have to be in the same location as below, but remember the path that you choose for later.
-<<<<<<< HEAD
+- An **Azure Databricks workspace**

-   ![2](media/solution-template-Databricks-notebook/Databricks-tutorial-image02.png)
+## Import a notebook for Transformation

-   1. Select "Import from: **URL**", and enter following URL in the textbox:
-=======
-
-![2](media/solution-template-Databricks-notebook/import-notebook.png)
-
-1. Select "Import from: **URL**", and enter following URL in the textbox:
+To import a **Transformation** notebook to your Databricks workspace:
+
+1. Sign in to your Azure Databricks account.
+1. In your Databricks workspace, select Import.
+   ![2](media/solution-template-Databricks-notebook/import-notebook.png)
+   Your Databricks location can be different from the one shown, but remember it for later.
+1. Select "Import from: **URL**", and enter following URL in the textbox:

 * `https://adflabstaging1.blob.core.windows.net/share/Transformations.html`

 ![3](media/solution-template-Databricks-notebook/import-from-url.png)
->>>>>>> 41a83460d90e44f5f7f328ce77e1e2f72733848a

 `https://adflabstaging1.blob.core.windows.net/share/Transformations.html`
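The **Copy** bullet in this hunk says the sink storage is mounted as DBFS so Spark can consume the dataset directly. A minimal sketch of such a mount, assuming a hypothetical storage account name (`mystorageaccount`) and mount point (`/mnt/sinkdata`); the `sinkdata` container and access key come from the prerequisites, and `dbutils` and `spark` are predefined inside a Databricks notebook:

```python
# Sketch only: mount the sink blob container as DBFS so Spark can read it directly.
# "mystorageaccount" and "/mnt/sinkdata" are hypothetical names; "sinkdata" is the
# container created in the prerequisites.
dbutils.fs.mount(
    source="wasbs://sinkdata@mystorageaccount.blob.core.windows.net",
    mount_point="/mnt/sinkdata",
    extra_configs={
        "fs.azure.account.key.mystorageaccount.blob.core.windows.net":
            "<storage-access-key>"
    },
)

# After the Copy activity lands the file, Spark reads it straight from the mount.
df = spark.read.csv("/mnt/sinkdata/<copied-folder>", header=True)
```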

@@ -129,23 +127,13 @@ In the new pipeline created, most settings have been configured automatically wi
 ![14](media/solution-template-Databricks-notebook/copy-sink-settings.png)

-<<<<<<< HEAD
-1. A Notebook activity **Transformation** is created, and the linked service created in previous step is selected.
-   ![16](media/solution-template-Databricks-notebook/Databricks-tutorial-image16.png)
-
-1. Select **Settings** tab. For *Notebook path*, the template defines a path by default. You may need to browse and select the correct notebook path uploaded in **Prerequisite** 2.
-
-   ![17](media/solution-template-Databricks-notebook/databricks-tutorial-image17.png)
-
-=======
 1. A Notebook activity **Transformation** is created, and the linked service created in previous step is selected.
    ![16](media/solution-template-Databricks-notebook/notebook-activity.png)

 1. Select **Settings** tab. For *Notebook path*, the template defines a path by default. You may need to browse and select the correct notebook path uploaded in **Prerequisite** 2.

    ![17](media/solution-template-Databricks-notebook/notebook-settings.png)

->>>>>>> 41a83460d90e44f5f7f328ce77e1e2f72733848a
 1. Check out the *Base Parameters* created as shown in the screenshot. They are to be passed to the Databricks notebook from Data Factory.

 ![Base parameters](media/solution-template-Databricks-notebook/base-parameters.png)
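The *Base Parameters* kept in this hunk are how Data Factory hands values to the notebook at run time; Databricks surfaces them as widgets. A minimal sketch of reading them inside the Transformation notebook, assuming hypothetical parameter names `input` and `output`:

```python
# Sketch only: read Base Parameters passed by the Data Factory Notebook activity.
# The names "input" and "output" are hypothetical; use the names defined in the
# activity's Base Parameters. Databricks exposes them via dbutils.widgets.
input_path = dbutils.widgets.get("input")
output_path = dbutils.widgets.get("output")

# Transform the staged dataset and write it to the processed location.
df = spark.read.csv(input_path, header=True)
df.write.mode("overwrite").parquet(output_path)
```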
