articles/data-factory/solution-template-databricks-notebook.md
# Transformation with Azure Databricks
In this tutorial, you create an end-to-end pipeline that contains the **Validation**, **Copy data**, and **Notebook** activities in Azure Data Factory.
- **Validation** ensures that your source dataset is ready for downstream consumption before you trigger the copy and analytics job.
- **Copy data** duplicates the source dataset to the sink storage, which is mounted as DBFS in the Azure Databricks notebook. In this way, the dataset can be directly consumed by Spark.
- **Notebook** triggers the Databricks notebook that transforms the dataset. It also adds the dataset to a processed folder or Azure SQL Data Warehouse. (A rough sketch of such a notebook follows this list.)
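As an illustration of that flow, the following PySpark sketch reads the copied data from a DBFS mount and writes a transformed copy to a processed folder. It assumes it runs inside a Databricks notebook, where `spark` is predefined; the mount point, folder names, and file format are placeholders, not the template's actual values.

```python
# Read the dataset that the Copy data activity landed in the sink storage,
# which is mounted into the workspace as DBFS (paths below are placeholders).
df = spark.read.option("header", "true").csv("/mnt/sinkdata/staged_sink/")

# Example transformation: drop incomplete rows.
processed_df = df.dropna()

# Write the result to a "processed" folder for downstream consumption.
processed_df.write.mode("overwrite").parquet("/mnt/sinkdata/processed/")
```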
For simplicity, the template in this tutorial doesn't create a scheduled trigger. You can add one if necessary.
*Save the access token* for later use in creating a Databricks linked service. The access token looks something like `dapi32db32cbb4w6eee18b7d87e45exxxxxx`.
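If you want to confirm that the token is valid before you use it in Data Factory, a minimal smoke test against the Databricks REST API works. The workspace URL and token below are placeholders for your own values.

```python
import requests

# Placeholders -- substitute your workspace URL and the token you just generated.
workspace_url = "https://adb-1234567890123456.7.azuredatabricks.net"
access_token = "dapi32db32cbb4w6eee18b7d87e45exxxxxx"

# Listing clusters is a harmless way to verify that the token authenticates.
response = requests.get(
    f"{workspace_url}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {access_token}"},
)
response.raise_for_status()
print(response.json())
```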
## How to use this template
- **Source Blob Connection** – to access the source data.
For this exercise, you can use the public blob storage that contains the source files. Reference the following screenshot for the configuration. Use the following **SAS URL** to connect to source storage (read-only access):

- **Azure Databricks** – to connect to the Databricks cluster.
Create a Databricks linked service by using the access token that you generated previously. You can opt to select an *interactive cluster* if you have one. This example uses the **New job cluster** option.

1. Select **Use this template**. A pipeline is created.

## Pipeline introduction and configuration
In the new pipeline, most settings are configured automatically with default values. Review the configurations of your pipeline and make any necessary changes.
1. In the **Validation** activity **Availability flag**, verify that the source **Dataset** value is set to the `SourceAvailabilityDataset` that you created earlier.
**Databricks linked service** should be pre-populated with the value from a previous step, as shown:

To check the **Notebook** settings:
1. Select the **Settings** tab. For **Notebook path**, verify that the default path is correct. You might need to browse and choose the correct notebook path.
1. Expand the **Base Parameters** selector and verify that the parameters match what is shown in the following screenshot. These parameters are passed to the Databricks notebook from Data Factory; a sketch of reading them inside the notebook follows the screenshot.

- **SourceFilesDataset** – to access the source data.

You can also verify the data file by using Azure Storage Explorer.
> [!NOTE]
> For correlating with Data Factory pipeline runs, this example appends the pipeline run ID from the data factory to the output folder. This helps keep track of the files generated by each run.
> 