articles/data-factory/solution-template-databricks-notebook.md
9 additions & 9 deletions
@@ -15,11 +15,11 @@ ms.date: 03/03/2020
# Transformation with Azure Databricks

-In this tutorial, you create an end-to-end pipeline containing **Validation**, **Copy**, and **Notebook** activities in Data Factory.
+In this tutorial, you create an end-to-end pipeline containing the **Validation**, **Copy data**, and **Notebook** activities in Data Factory.

- **Validation** ensures that your source dataset is ready for downstream consumption before you trigger the copy and analytics job.

-- **Copy** duplicates the source dataset to the sink storage, which is mounted as DBFS in the Databricks notebook. In this way, the dataset can be directly consumed by Spark.
+- **Copy data** duplicates the source dataset to the sink storage, which is mounted as DBFS in the Databricks notebook. In this way, the dataset can be directly consumed by Spark.

- **Notebook** triggers the Databricks notebook that transforms the dataset. It also adds the dataset to a processed folder or SQL Data Warehouse.
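
To make the **Copy data** to **Notebook** hand-off concrete, here is a minimal sketch of how the notebook can consume the copied dataset directly from the DBFS mount. The mount point `/mnt/Destination` and the file name are hypothetical placeholders, and `spark` is the session object that Databricks provides in every notebook:

```python
# Minimal sketch: read the dataset that the Copy data activity landed in the
# sink storage, through the DBFS mount the notebook expects.
# "/mnt/Destination" and the file name are hypothetical placeholders.
input_path = "/mnt/Destination/Input/Product.csv"

df = (
    spark.read
         .option("header", "true")       # assume the copied file has a header row
         .option("inferSchema", "true")
         .csv(input_path)
)

df.show(5)  # quick check that Spark can consume the copied data directly
```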
@@ -84,7 +84,7 @@ To import a **Transformation** notebook to your Databricks workspace:
-**Save the access token** for later use in creating a Databricks linked service. The access token looks something like 'dapi32db32cbb4w6eee18b7d87e45exxxxxx'.
+**Save the access token** for later use in creating a Databricks linked service. The access token looks something like `dapi32db32cbb4w6eee18b7d87e45exxxxxx`.

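As a quick aside, you can sanity-check the saved token before wiring it into the linked service by calling the Databricks REST API with it. This is a hedged sketch, not part of the template; the workspace URL and token values are placeholders:

```python
import requests

# Hypothetical workspace URL and token -- substitute your own values.
workspace_url = "https://adb-1234567890123456.7.azuredatabricks.net"
access_token = "dapi32db32cbb4w6eee18b7d87e45exxxxxx"

# GET /api/2.0/clusters/list is a lightweight call that succeeds only if the
# bearer token is accepted, so it doubles as a token sanity check.
resp = requests.get(
    f"{workspace_url}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {access_token}"},
)
resp.raise_for_status()
print("Token accepted; clusters visible:", len(resp.json().get("clusters", [])))
```
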
## How to use this template
@@ -102,13 +102,13 @@ To import a **Transformation** notebook to your Databricks workspace:

- **Destination Blob Connection** – to store the copied data.

-In the linked service, select your sink storage blob.
+In the **New linked service** window, select your sink storage blob.

- **Azure Databricks** – to connect to the Databricks cluster.

-Create a Databricks linked service using the access key you generated previously. You may opt to select an *interactive cluster* if you have one. This example uses the *New job cluster* option.
+Create a Databricks linked service using the access key you generated previously. You may opt to select an *interactive cluster* if you have one. This example uses the **New job cluster** option.
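
If you prefer to script this step instead of using the authoring UI, the sketch below shows roughly what the same linked service could look like when created with the `azure-mgmt-datafactory` Python SDK. It is an assumption-laden illustration, not the template's own definition; every name, version, and node size is a placeholder:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureDatabricksLinkedService,
    LinkedServiceResource,
    SecureString,
)

# All names and values below are placeholders, not taken from the template.
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

databricks_ls = LinkedServiceResource(
    properties=AzureDatabricksLinkedService(
        domain="https://adb-1234567890123456.7.azuredatabricks.net",
        access_token=SecureString(value="dapi32db32cbb4w6eee18b7d87e45exxxxxx"),
        # A new job cluster, mirroring the "New job cluster" option in the UI.
        new_cluster_version="7.3.x-scala2.12",
        new_cluster_node_type="Standard_DS3_v2",
        new_cluster_num_of_worker="1",
    )
)

adf_client.linked_services.create_or_update(
    "<resource-group>", "<data-factory-name>", "AzureDatabricks1", databricks_ls
)
```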
@@ -118,9 +118,9 @@ To import a **Transformation** notebook to your Databricks workspace:
## Pipeline introduction and configuration

-In the new pipeline, most settings are configured automatically with default values. Review the configurations of your pipeline and make any necessary changes:
+In the new pipeline, most settings are configured automatically with default values. Review the configurations of your pipeline and make any necessary changes.

-1. In the **Validation** activity **Availability flag**, verify that the source Dataset value is set to the `SourceAvailabilityDataset` created earlier.
+1. In the **Validation** activity **Availability flag**, verify that the source **Dataset** value is set to `SourceAvailabilityDataset` that you created earlier.
@@ -139,7 +139,7 @@ In the new pipeline, most settings are configured automatically with default va
To check the **Notebook** settings:

-1. Select **Settings** tab. For **Notebook path**, verify that the default path is correct. You may need to browse and choose the correct notebook path.
+1. Select the **Settings** tab. For **Notebook path**, verify that the default path is correct. You may need to browse and choose the correct notebook path.
@@ -175,7 +175,7 @@ In the new pipeline, most settings are configured automatically with default va
You can also verify the data file using storage explorer.
> [!NOTE]
-> For correlating with Data Factory pipeline runs, this example appends the pipeline run ID from data factory to the output folder. This way you can track back the files generated via each run.
+> For correlating with Data Factory pipeline runs, this example appends the pipeline run ID from data factory to the output folder. This helps keep track of files generated by each run.
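
As an illustration of that pattern, a notebook could receive the run ID as a base parameter and fold it into the output path. This is a hedged sketch; the parameter name `pipeline_run_id`, the mount path, and the stand-in DataFrame are assumptions rather than the template's actual code:

```python
# Hedged sketch of the pattern the note describes: pick up the pipeline run ID
# passed in from the Notebook activity and append it to the output folder.
# The parameter name and mount path are illustrative placeholders.
run_id = dbutils.widgets.get("pipeline_run_id")    # base parameter from Data Factory

output_path = f"/mnt/Destination/Output/{run_id}"  # one output folder per run

# Stand-in for the transformed dataset produced earlier in the notebook.
df = spark.createDataFrame([("example", 1)], ["name", "value"])

df.write.mode("overwrite").option("header", "true").csv(output_path)
```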