
Commit 23a6d02

wrap up and acrolinx
1 parent 4e0bd65

File tree: 1 file changed (+32, -27 lines)

articles/data-factory/solution-template-databricks-notebook.md

@@ -19,7 +19,7 @@ In this tutorial, you create an end-to-end pipeline containing **Validation**, *
- **Validation** ensures that your source dataset is ready for downstream consumption before you trigger the copy and analytics job.

- **Copy** duplicates the source dataset to the sink storage, which is mounted as DBFS in the Databricks notebook. In this way, the dataset can be directly consumed by Spark, as in the sketch following this list.

- **Notebook** triggers the Databricks notebook that transforms the dataset. It also adds the dataset to a processed folder or SQL Data Warehouse.
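Because the sink storage is mounted as DBFS, the notebook can read the copied files with ordinary Spark calls. Here is a minimal sketch of that mount and read; the storage account `mystorageaccount`, the secret scope `storage`, and the mount point `/mnt/Data` are illustrative assumptions, not values from the template:

```python
# dbutils and spark are predefined in a Databricks notebook.
# Mount the sink blob container as DBFS; the account, container, and
# secret names below are placeholders -- substitute your own.
dbutils.fs.mount(
    source="wasbs://sinkdata@mystorageaccount.blob.core.windows.net",
    mount_point="/mnt/Data",
    extra_configs={
        "fs.azure.account.key.mystorageaccount.blob.core.windows.net":
            dbutils.secrets.get(scope="storage", key="account-key")
    },
)

# Once mounted, Spark consumes the copied dataset like a local path.
df = spark.read.option("header", "true").csv("/mnt/Data/staged_sink")
df.show(5)
```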

@@ -108,7 +108,7 @@ To import a **Transformation** notebook to your Databricks workspace:
- **Azure Databricks** – to connect to the Databricks cluster.

  Create a Databricks-linked service using the access key you generated previously. You may opt to select an *interactive cluster* if you have one. This example uses the *New job cluster* option.

  ![8](media/solution-template-Databricks-notebook/databricks-connection.png)
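The tutorial creates this linked service in the UI. If you prefer scripting, a roughly equivalent sketch with the `azure-mgmt-datafactory` Python SDK follows; the subscription, resource group, factory name, workspace URL, cluster sizing, and token are placeholder assumptions:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureDatabricksLinkedService,
    LinkedServiceResource,
    SecureString,
)

# Placeholder identifiers -- substitute your own subscription and factory.
client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# A linked service that spins up a new job cluster for each pipeline run.
databricks_ls = AzureDatabricksLinkedService(
    domain="https://eastus.azuredatabricks.net",  # your workspace URL
    access_token=SecureString(value="<access-token-generated-earlier>"),
    new_cluster_version="5.5.x-scala2.11",
    new_cluster_num_of_worker="1",
    new_cluster_node_type="Standard_D3_v2",
)

client.linked_services.create_or_update(
    "<resource-group>", "<factory-name>", "AzureDatabricks_LS",
    LinkedServiceResource(properties=databricks_ls),
)
```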

@@ -120,58 +120,63 @@ To import a **Transformation** notebook to your Databricks workspace:
In the new pipeline, most settings are configured automatically with default values. Review the configurations of your pipeline and make any necessary changes:

1. In the **Validation** activity **Availability flag**, verify that the source Dataset value is set to the `SourceAvailabilityDataset` created earlier.

   ![12](media/solution-template-Databricks-notebook/validation-settings.png)

1. In the **Copy data** activity **file-to-blob**, check the source and sink tabs. Change settings if necessary.

   - Source tab

     ![13](media/solution-template-Databricks-notebook/copy-source-settings.png)

   - Sink tab

     ![14](media/solution-template-Databricks-notebook/copy-sink-settings.png)

1. In the **Notebook** activity **Transformation**, review and update the paths and settings as needed.

   The **Databricks linked service** should be pre-populated with the value from a previous step, as shown:

   ![16](media/solution-template-Databricks-notebook/notebook-activity.png)

   To check the **Notebook** settings:

   1. Select the **Settings** tab. For **Notebook path**, verify that the default path is correct. You may need to browse and choose the correct notebook path.

      ![17](media/solution-template-Databricks-notebook/notebook-settings.png)

   1. Expand the **Base Parameters** selector and verify that the parameters match what is shown in the following screenshot. These parameters are passed to the Databricks notebook from Data Factory; the sketch after these steps shows how the notebook reads them.

      ![Base parameters](media/solution-template-Databricks-notebook/base-parameters.png)
1. Verify that the **Pipeline Parameters** match what is shown in the following screenshot:

   ![15](media/solution-template-Databricks-notebook/pipeline-parameters.png)

1. Connect to your datasets.

   - **SourceAvailabilityDataset** - to check that the source data is available.

     ![9](media/solution-template-Databricks-notebook/source-availability-dataset.png)

   - **SourceFilesDataset** - to access the source data.

     ![10](media/solution-template-Databricks-notebook/source-file-dataset.png)

   - **DestinationFilesDataset** - to copy the data into the sink destination location. Use the following values:

     - **Linked service** - `sinkBlob_LS`, created in a previous step.
     - **File path** - `sinkdata/staged_sink`.

     ![11](media/solution-template-Databricks-notebook/destination-dataset.png)

1. Select **Debug** to run the pipeline. You can find the link to Databricks logs for more detailed Spark logs.

   ![18](media/solution-template-Databricks-notebook/pipeline-run-output.png)
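For orientation, this is roughly how the transformation notebook picks up those base parameters at run time, using Databricks widgets. The widget names below are illustrative assumptions; match them to the **Base Parameters** shown in the screenshot above:

```python
# dbutils is predefined in a Databricks notebook.
# Widget names ("input", "output", "filename") are illustrative --
# align them with the Base Parameters configured in Data Factory.
input_path = dbutils.widgets.get("input")
output_path = dbutils.widgets.get("output")
file_name = dbutils.widgets.get("filename")

print(f"Transforming {input_path}/{file_name} into {output_path}")
```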
You can also verify the data file using storage explorer.

> [!NOTE]
> For correlating with Data Factory pipeline runs, this example appends the pipeline run ID from Data Factory to the output folder. This way you can track back the files generated via each run, as in the sketch that follows.
> ![19](media/solution-template-Databricks-notebook/verify-data-files.png)
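One way a notebook could implement the run-ID correlation the note describes. A minimal sketch; the `input`, `output`, and `pipeline_run_id` widget names are the same illustrative assumptions as above:

```python
# dbutils and spark are predefined in a Databricks notebook.
output_path = dbutils.widgets.get("output")      # e.g. /mnt/Data/staged_sink
run_id = dbutils.widgets.get("pipeline_run_id")  # illustrative widget name

# Write each run's output into a subfolder named after the Data Factory
# pipeline run ID, so files can be traced back to the run that produced them.
df = spark.read.option("header", "true").csv(dbutils.widgets.get("input"))
df.write.mode("overwrite").option("header", "true").csv(f"{output_path}/{run_id}")
```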
## Next steps