articles/data-factory/solution-template-databricks-notebook.md (32 additions, 27 deletions)
@@ -19,7 +19,7 @@ In this tutorial, you create an end-to-end pipeline containing **Validation**, *
 - **Validation** ensures that your source dataset is ready for downstream consumption before you trigger the copy and analytics job.
-- **Copy** duplicates the source dataset to the sink storage which is mounted as DBFS in the Databricks notebook. In this way, the dataset can be directly consumed by Spark.
+- **Copy** duplicates the source dataset to the sink storage, which is mounted as DBFS in the Databricks notebook. In this way, the dataset can be directly consumed by Spark (see the sketch after this list).
 - **Notebook** triggers the Databricks notebook that transforms the dataset. It also adds the dataset to a processed folder or SQL Data Warehouse.
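For context, here is a minimal sketch of how the Databricks notebook can consume the copied dataset once the sink storage is mounted as DBFS. The mount point `/mnt/Data` and the file name `Product.csv` are placeholders for illustration, not values defined by the template.

```python
# Minimal sketch (Databricks notebook): read the file that the Copy activity landed
# in the sink storage. `spark` is predefined in a Databricks notebook; the mount
# point and file name below are illustrative placeholders.
df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/mnt/Data/Product.csv"))

df.show(5)  # sanity check: Spark reads the copied file directly from the DBFS mount
```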
@@ -108,7 +108,7 @@ To import a **Transformation** notebook to your Databricks workspace:
 - **Azure Databricks** – to connect to the Databricks cluster.
-   Create a Databricks-linked service using the access key you generated earier. You may opt to select an *interactive cluster* if you have one. This example uses the *New job cluster* option.
+   Create a Databricks-linked service using the access key you generated previously. You may opt to select an *interactive cluster* if you have one. This example uses the *New job cluster* option.
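For reference, an Azure Databricks linked service that uses a new job cluster generally has the JSON shape sketched below (shown here as a Python dict). The property names follow the Azure Databricks linked-service schema as commonly documented, but the domain, token, and cluster values are placeholders; confirm the exact JSON in the authoring UI before reusing it.

```python
# Illustrative shape of the linked service definition; all values are placeholders.
databricks_linked_service = {
    "name": "AzureDatabricksLinkedService",
    "properties": {
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://<region>.azuredatabricks.net",
            "accessToken": {"type": "SecureString", "value": "<access-token>"},
            # Settings used when the *New job cluster* option is selected:
            "newClusterVersion": "<runtime-version>",
            "newClusterNodeType": "<node-type>",
            "newClusterNumOfWorker": "1",
        },
    },
}
```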
@@ -120,58 +120,63 @@ To import a **Transformation** notebook to your Databricks workspace:
 In the new pipeline, most settings are configured automatically with default values. Review the configurations of your pipeline and make any necessary changes:
-- A _Validation_ activity **Availability flag** is created for checking the source. The Dataset value should be set to *SourceAvailabilityDataset* which was created earlier.
+1. In the **Validation** activity **Availability flag**, verify that the source **Dataset** value is set to the `SourceAvailabilityDataset` created earlier.
-- A _Copy data_ activity **file-to-blob** is created for copying the dataset from the source to the sink. Check the source and sink tabs to change these settings.
+1. In the **Copy data** activity **file-to-blob**, check the source and sink tabs. Change settings if necessary.
+1. In the **Notebook** activity **Transformation**, review and update the paths and settings as needed.
-1. Select **Settings** tab. For *Notebook path*, the template defines a path by default. You may need to browse and select the correct notebook path uploaded in **Prerequisite** 2.
+   The **Databricks linked service** should be pre-populated with the value from a previous step, as shown:
+1. Select the **Settings** tab. For **Notebook path**, verify that the default path is correct. You may need to browse and choose the correct notebook path.
+1. Expand the **Base Parameters** selector and verify that the parameters match what is shown in the following screenshot. These parameters are passed to the Databricks notebook from Data Factory (see the sketch after this section).
-You can also verify the data file using storage explorer. (For correlating with Data Factory pipeline runs, this example appends the pipeline run ID from data factory to the output folder. This way you can track back the files generated via each run.)
+You can also verify the data file using storage explorer.
+> For correlating with Data Factory pipeline runs, this example appends the pipeline run ID from Data Factory to the output folder. This way you can track back the files generated via each run.
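To make the parameter passing and run-ID correlation concrete, here is a minimal sketch of how the **Transformation** notebook might read the base parameters and write its output into a per-run folder. The widget names (`input`, `output`, `pipeline_run_id`) are illustrative; they must match the Base Parameters actually configured on the Notebook activity.

```python
# Minimal sketch (Databricks notebook). `dbutils` and `spark` are predefined in a
# Databricks notebook; the widget names are examples and must match the Base
# Parameters configured on the Notebook activity in Data Factory.
input_path = dbutils.widgets.get("input")        # path to the copied source data
output_path = dbutils.widgets.get("output")      # base folder for processed output
run_id = dbutils.widgets.get("pipeline_run_id")  # e.g. @pipeline().RunId from Data Factory

df = spark.read.option("header", "true").csv(input_path)

# ...transformations go here...

# Append the pipeline run ID to the output folder so files can be traced back
# to the Data Factory run that produced them.
df.write.mode("overwrite").option("header", "true").csv(f"{output_path}/{run_id}")
```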