Commit cb3c137 ("partial"), 1 parent d4cce39
1 file changed: +39 / -33 lines

articles/data-factory/solution-template-databricks-notebook.md

For simplicity, the template in this tutorial doesn't create a scheduled trigger.
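If you later want the pipeline to run on a schedule, you can add a Data Factory schedule trigger. As a rough sketch of what such a trigger definition looks like (the trigger name, start time, and pipeline reference here are illustrative, not part of the template):

```python
import json

# Hypothetical schedule trigger; names and times are placeholders.
trigger = {
    "name": "DailyTransformationTrigger",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            # Run once per day starting from the given UTC time.
            "recurrence": {
                "frequency": "Day",
                "interval": 1,
                "startTime": "2019-01-01T00:00:00Z",
                "timeZone": "UTC",
            }
        },
        # The pipeline this trigger starts.
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "Transformation with Azure Databricks",
                    "type": "PipelineReference",
                }
            }
        ],
    },
}

print(json.dumps(trigger, indent=2))
```
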
To import a **Transformation** notebook to your Databricks workspace:

1. Sign in to your Azure Databricks workspace, and then select **Import**.

   ![2](media/solution-template-Databricks-notebook/import-notebook.png)
   Your workspace path can be different from the one shown, but remember it for later.

1. Select **Import from: URL**. In the textbox, enter `https://adflabstaging1.blob.core.windows.net/share/Transformations.html`.

   ![3](media/solution-template-Databricks-notebook/import-from-url.png)

1. Now let's update the **Transformation** notebook with your storage connection information.

   In the imported notebook, go to **command 5**, as shown in the following code snippet.

   - Replace `<storage name>` and `<access key>` with your own storage connection information.
   - Use the storage account that you created earlier, which contains the `sinkdata` container.
   ```python
   # Supply storageName and accessKey values
   ...
   print e  # Otherwise print the whole stack trace.
   ```

1. Generate a **Databricks access token** for Data Factory to access Databricks.

   1. In your Databricks workspace, select your user profile icon in the upper right.
   1. Select **User Settings**.

      ![4](media/solution-template-Databricks-notebook/user-setting.png)

   1. On the **Access Tokens** tab, select **Generate New Token**.
   1. Select **Generate**.

      ![5](media/solution-template-Databricks-notebook/generate-new-token.png)

   **Save the access token** for later use in creating a Databricks linked service. The access token looks something like `dapi32db32cbb4w6eee18b7d87e45exxxxxx`.
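Command 5 of the notebook typically mounts the `sinkdata` container with `dbutils.fs.mount`. A minimal sketch of the values it needs (the mount point and all names here are placeholders, and the mount call itself only runs inside a Databricks notebook):

```python
# Illustrative values; replace with your storage account name and access key.
storage_name = "<storage name>"
access_key = "<access key>"

# WASB URL for the sinkdata container and the matching account-key config entry.
source = "wasbs://sinkdata@{}.blob.core.windows.net/".format(storage_name)
extra_configs = {
    "fs.azure.account.key.{}.blob.core.windows.net".format(storage_name): access_key,
}

# Inside a Databricks notebook, the container would then be mounted with:
# dbutils.fs.mount(source=source, mount_point="/mnt/sinkdata",
#                  extra_configs=extra_configs)
print(source)
```
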
## How to use this template
1. Go to the **Transformation with Azure Databricks** template and create new linked services for the following connections.

   ![Connections setting](media/solution-template-Databricks-notebook/connections-preview.png)

   - **Source Blob Connection** to access the source data.

     For this exercise, you can use the public blob storage that contains the source files. Reference the following screenshot for the configuration. Use the following **SAS URL** to connect to the source storage (read-only access):

     `https://storagewithdata.blob.core.windows.net/data?sv=2018-03-28&si=read%20and%20list&sr=c&sig=PuyyS6%2FKdB2JxcZN0kPlmHSBlD8uIKyzhBWmWzznkBw%3D`

     ![6](media/solution-template-Databricks-notebook/source-blob-connection.png)

   - **Destination Blob Connection** to store the copied data.

     In the sink linked service, select the blob storage that you created as a prerequisite.

     ![7](media/solution-template-Databricks-notebook/destination-blob-connection.png)

   - **Azure Databricks** to connect to the Databricks cluster.

     Create a Databricks linked service by using the access token that you generated earlier. If you have an *interactive cluster*, you can opt to select it. This example uses the *New job cluster* option.

     ![8](media/solution-template-Databricks-notebook/databricks-connection.png)

1. Select **Use this template**. You'll see a pipeline created.

   ![Create a pipeline](media/solution-template-Databricks-notebook/new-pipeline.png)
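The linked services you just created through the UI are ordinary Data Factory linked service definitions. A minimal sketch of the Azure Databricks one, based on the documented `AzureDatabricks` linked service type (the workspace domain, token value, and cluster settings here are placeholders, not values from this template):

```python
import json

# Hypothetical Databricks linked-service definition; the domain, token,
# and cluster sizing are placeholders.
databricks_linked_service = {
    "name": "AzureDatabricksLinkedService",
    "properties": {
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://<region>.azuredatabricks.net",
            # The access token generated earlier, stored as a secure string.
            "accessToken": {"type": "SecureString", "value": "<access token>"},
            # This template runs on a new job cluster rather than an
            # interactive cluster.
            "newClusterNodeType": "Standard_DS3_v2",
            "newClusterNumOfWorker": "1",
            "newClusterVersion": "<runtime version>",
        },
    },
}

print(json.dumps(databricks_linked_service, indent=2))
```
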

## Pipeline introduction and configuration

In the new pipeline, most settings are configured automatically with default values. Review the configurations of your pipeline and make any necessary changes:

- A _Validation_ activity **Availability flag** is created to check the source availability. The **Dataset** value should be set to *SourceAvailabilityDataset*, which you created earlier.

  ![12](media/solution-template-Databricks-notebook/validation-settings.png)

- A _Copy data_ activity **file-to-blob** is created to copy the dataset from the source to the sink. Check the **Source** and **Sink** tabs to change these settings if necessary.

  - Source tab

    ![13](media/solution-template-Databricks-notebook/copy-source-settings.png)

  - Sink tab

    ![14](media/solution-template-Databricks-notebook/copy-sink-settings.png)

- A _Notebook_ activity **Transformation** is created, and it uses the linked service that you created earlier.

  ![16](media/solution-template-Databricks-notebook/notebook-activity.png)

1. Select the **Settings** tab. For *Notebook path*, the template defines a path by default. You may need to browse and select the correct notebook path that you uploaded in **Prerequisite** 2.

   ![17](media/solution-template-Databricks-notebook/notebook-settings.png)

1. Check out the *Base Parameters* created, as shown in the screenshot. These parameters are passed to the Databricks notebook from Data Factory.
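Putting the last two steps together, the Notebook activity and its base parameters can be pictured as JSON along these lines, a sketch based on the documented `DatabricksNotebook` activity type (the notebook path, linked service name, and parameter names here are illustrative, since the template defines its own):

```python
import json

# Hypothetical DatabricksNotebook activity showing how base parameters
# flow from Data Factory into the notebook. Names are placeholders.
notebook_activity = {
    "name": "Transformation",
    "type": "DatabricksNotebook",
    "linkedServiceName": {
        "referenceName": "AzureDatabricksLinkedService",
        "type": "LinkedServiceReference",
    },
    "typeProperties": {
        "notebookPath": "/Users/<user>/Transformations",
        # Each entry becomes a widget value the notebook can read.
        "baseParameters": {
            "input": "@pipeline().parameters.inputPath",
            "output": "@pipeline().parameters.outputPath",
        },
    },
}

print(json.dumps(notebook_activity, indent=2))
```

Inside the Databricks notebook, a base parameter such as `input` is read with `dbutils.widgets.get("input")`.
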
