Commit 5fc465c

Merge pull request #224782 from v-lanjunli/newdocforadf
new doc for adf
2 parents 4e4aa53 + ffda679 commit 5fc465c

7 files changed

+175
-0
lines changed

articles/data-factory/TOC.yml

Lines changed: 2 additions & 0 deletions
@@ -725,6 +725,8 @@ items:
725725
href: transform-data-using-script.md
726726
- name: Compute linked services
727727
href: compute-linked-services.md
728+
- name: Synapse Notebook activity
729+
href: transform-data-synapse-notebook.md
728730
- name: Synapse Spark job definition activity
729731
href: transform-data-synapse-spark-job-definition.md
730732
- name: Control flow
Lines changed: 173 additions & 0 deletions
@@ -0,0 +1,173 @@
1+
---
2+
title: Transform data with Synapse Notebook
3+
titleSuffix: Azure Data Factory & Azure Synapse
4+
description: Learn how to process or transform data by running a Synapse notebook in Azure Data Factory and Synapse Analytics pipelines.
5+
ms.service: data-factory
6+
ms.subservice: tutorials
7+
ms.custom: synapse
8+
author: nabhishek
9+
ms.author: jejiang
10+
ms.topic: conceptual
11+
ms.date: 07/09/2022
12+
---
13+
14+
# Transform data by running a Synapse Notebook
15+
[!INCLUDE[appliesto-adf-asa-md](includes/appliesto-adf-asa-md.md)]
16+
17+
The Azure Synapse Notebook Activity in a [pipeline](concepts-pipelines-activities.md) runs a Synapse notebook in your Azure Synapse Analytics workspace. This article builds on the [data transformation activities](transform-data.md) article, which presents a general overview of data transformation and the supported transformation activities.
18+
19+
You can create an Azure Synapse Analytics notebook activity directly through the Azure Data Factory Studio user interface. The following sections walk through creating a Synapse notebook activity in the UI step by step.
20+
21+
## Add a Notebook activity for Synapse to a pipeline with UI
22+
23+
To use a Notebook activity for Synapse in a pipeline, complete the following steps:
24+
25+
## General settings
26+
27+
1. Search for _Notebook_ in the pipeline Activities pane, and drag a Notebook activity from under **Synapse** onto the pipeline canvas.
28+
2. Select the new Notebook activity on the canvas if it is not already selected.
29+
3. In the General settings, enter _sample_ for **Name**.
30+
4. (Optional) You can also enter a description.
31+
5. Timeout: Maximum amount of time an activity can run. Default is 12 hours, and the maximum amount of time allowed is 7 days. Format is in D.HH:MM:SS.
32+
6. Retry: Maximum number of retry attempts.
33+
7. Retry interval (sec): The number of seconds between each retry attempt.
34+
8. Secure output: When checked, output from the activity won't be captured in logging.
35+
9. Secure input: When checked, input from the activity won't be captured in logging.
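
As an illustration of the Timeout rules in the list above, the following minimal sketch parses a `D.HH:MM:SS` string and rejects values above the 7-day maximum. The helper name `parse_timeout` and the constant `MAX_TIMEOUT` are hypothetical and not part of any Azure SDK; the service performs its own validation.

```python
import re
from datetime import timedelta

# Hypothetical helper for illustration only; not part of any Azure SDK.
# Assumption: the day component is always present, as in "0.12:00:00".
_TIMEOUT_PATTERN = re.compile(r"^(\d+)\.(\d{2}):(\d{2}):(\d{2})$")
MAX_TIMEOUT = timedelta(days=7)  # maximum stated in the Timeout setting


def parse_timeout(value: str) -> timedelta:
    """Parse a D.HH:MM:SS string and reject values longer than 7 days."""
    match = _TIMEOUT_PATTERN.match(value)
    if match is None:
        raise ValueError(f"not in D.HH:MM:SS format: {value!r}")
    days, hours, minutes, seconds = (int(part) for part in match.groups())
    if hours > 23 or minutes > 59 or seconds > 59:
        raise ValueError(f"time component out of range: {value!r}")
    timeout = timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)
    if timeout > MAX_TIMEOUT:
        raise ValueError(f"timeout exceeds 7 days: {value!r}")
    return timeout


default_timeout = parse_timeout("0.12:00:00")  # the 12-hour default
longest_timeout = parse_timeout("7.00:00:00")  # the 7-day maximum
```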
36+
37+
## Azure Synapse Analytics (Artifacts) settings
38+
39+
Select the **Azure Synapse Analytics (Artifacts)** tab to select or create a new [Azure Synapse Analytics linked service](compute-linked-services.md#azure-synapse-analytics-linked-service) that will execute the Notebook activity.
40+
41+
:::image type="content" source="./media/transform-data-synapse-notebook/notebook-activity.png" alt-text="Screenshot of the linked service tab for a Notebook activity." lightbox="./media/transform-data-synapse-notebook/notebook-activity.png":::
42+
43+
44+
45+
46+
47+
## Settings tab
48+
1. Select the new Synapse Notebook activity on the canvas if it is not already selected.
49+
50+
2. Select the Settings tab.
51+
52+
3. Expand the **Notebook** list and select an existing notebook from the linked Azure Synapse Analytics (Artifacts) service.
53+
54+
4. Select **Open** to open the page of the linked service where the selected notebook is located.
55+
56+
> [!NOTE]
57+
>
58+
> If the Workspace resource ID in the linked service is empty, the Open button will be disabled.
59+
>
60+
> :::image type="content" source="./media/transform-data-synapse-notebook/resource-id-empty.png" alt-text="Screenshot showing the Open button disabled." lightbox="./media/transform-data-synapse-notebook/resource-id-empty.png":::
61+
62+
5. On the **Settings** tab, confirm the notebook and, optionally, add base parameters to pass to the notebook.
63+
64+
:::image type="content" source="./media/transform-data-synapse-notebook/notebook-activity-settings.png" alt-text="Screenshot of the Settings tab for a Notebook activity." lightbox="./media/transform-data-synapse-notebook/notebook-activity-settings.png":::
65+
66+
6. (Optional) Fill in the following settings for the Synapse notebook. Any setting left empty falls back to the value configured in the Synapse notebook itself; any setting you provide overrides the notebook's own value.
67+
68+
| Property | Description |
69+
| ----- | ----- |
70+
|Spark pool | Reference to the Spark pool. You can select Apache Spark pool from the list. |
71+
|Executor size | Number of cores and memory to be used for executors allocated in the specified Apache Spark pool for the session. For dynamic content, valid values are Small/Medium/Large/XLarge/XXLarge. |
72+
|Dynamically allocate executors| This setting maps to the dynamic allocation property in the Spark configuration, which controls executor allocation for the Spark application.|
73+
|Min executors| Minimum number of executors to be allocated in the specified Spark pool for the job.|
74+
|Max executors| Maximum number of executors to be allocated in the specified Spark pool for the job.|
75+
|Driver size| Number of cores and memory to be used for the driver in the specified Apache Spark pool for the job.|
76+
77+
## Azure Synapse Analytics Notebook activity definition
78+
79+
Here is the sample JSON definition of an Azure Synapse Analytics Notebook Activity:
80+
81+
```json
{
    "activities": [
        {
            "name": "demo",
            "description": "description",
            "type": "SynapseNotebook",
            "dependsOn": [],
            "policy": {
                "timeout": "7.00:00:00",
                "retry": 0,
                "retryIntervalInSeconds": 30,
                "secureOutput": false,
                "secureInput": false
            },
            "userProperties": [
                {
                    "name": "testproperties",
                    "value": "test123"
                }
            ],
            "typeProperties": {
                "notebook": {
                    "referenceName": {
                        "value": "Notebookname",
                        "type": "Expression"
                    },
                    "type": "NotebookReference"
                },
                "parameters": {
                    "test": {
                        "value": "testvalue",
                        "type": "string"
                    }
                },
                "snapshot": true,
                "sparkPool": {
                    "referenceName": {
                        "value": "SampleSpark",
                        "type": "Expression"
                    },
                    "type": "BigDataPoolReference"
                }
            },
            "linkedServiceName": {
                "referenceName": "AzureSynapseArtifacts1",
                "type": "LinkedServiceReference"
            }
        }
    ]
}
```
133+
134+
## Azure Synapse Analytics Notebook activity properties
135+
136+
The following table describes the JSON properties used in the JSON
137+
definition:
138+
139+
|Property|Description|Required|
140+
|---|---|---|
141+
|name|Name of the activity in the pipeline.|Yes|
142+
|description|Text describing what the activity does.|No|
143+
|type|For Azure Synapse Analytics Notebook Activity, the activity type is SynapseNotebook.|Yes|
144+
|notebook|The name of the notebook to run in Azure Synapse Analytics.|Yes|
145+
|sparkPool|The Spark pool used to run the Azure Synapse Analytics notebook.|No|
146+
|parameters|Parameters to pass to the Azure Synapse Analytics notebook. For more information, see [Transform data by running a Synapse notebook](../synapse-analytics/synapse-notebook-activity.md#assign-parameters-values-from-a-pipeline).|No|
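
The required properties in the table can be checked before a definition is deployed. The following is a minimal sketch, assuming the activity is available as a Python dictionary shaped like the sample JSON definition, with `notebook` nested under `typeProperties`. The function `find_problems` is a hypothetical helper, not an Azure SDK call, and it covers only the three required properties listed in the table.

```python
def find_problems(activity: dict) -> list:
    """Return a list of messages for missing or invalid required properties."""
    problems = []
    if not activity.get("name"):
        problems.append("name is required")
    if activity.get("type") != "SynapseNotebook":
        problems.append("type must be SynapseNotebook")
    # In the sample definition, notebook sits under typeProperties.
    if "notebook" not in activity.get("typeProperties", {}):
        problems.append("typeProperties.notebook is required")
    return problems


# Trimmed-down version of the sample definition, optional properties omitted.
sample_activity = {
    "name": "demo",
    "type": "SynapseNotebook",
    "typeProperties": {
        "notebook": {
            "referenceName": {"value": "Notebookname", "type": "Expression"},
            "type": "NotebookReference",
        }
    },
}

problems = find_problems(sample_activity)
```

An empty list means the three required properties are present; it doesn't guarantee that the service accepts the definition.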
147+
148+
## Designate a parameters cell
149+
150+
Azure Data Factory looks for the parameters cell and uses its values as defaults for the parameters passed in at execution time. The execution engine adds a new cell beneath the parameters cell with the input parameters, which overwrite the default values. For more information, see [Transform data by running a Synapse notebook](../synapse-analytics/synapse-notebook-activity.md#designate-a-parameters-cell).
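
The override behavior can be pictured as a dictionary merge. The sketch below is only a model of the outcome, not the execution engine's implementation: `effective_parameters` is a hypothetical function, and the `pipeline_parameters` shape is taken from the `parameters` object in the sample JSON definition.

```python
def effective_parameters(defaults: dict, pipeline_parameters: dict) -> dict:
    """Model the result of the injected cell: pipeline values replace defaults."""
    effective = dict(defaults)  # values from the parameters cell
    for name, spec in pipeline_parameters.items():
        effective[name] = spec["value"]  # value passed in by the activity
    return effective


# Defaults as they might be assigned in the notebook's parameters cell.
defaults = {"test": "default value", "limit": 10}

# Shape of "parameters" in the activity's typeProperties.
pipeline_parameters = {"test": {"value": "testvalue", "type": "string"}}

effective = effective_parameters(defaults, pipeline_parameters)
```

Parameters that the pipeline doesn't pass, such as the hypothetical `limit` here, keep the default from the parameters cell.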
151+
152+
## Read Synapse notebook cell output value
153+
154+
You can read the notebook cell output value in an activity. For details, see [Transform data by running a Synapse notebook](../synapse-analytics/synapse-notebook-activity.md#read-synapse-notebook-cell-output-value).
155+
156+
## Run another Synapse notebook
157+
158+
You can reference other notebooks in a Synapse notebook activity by calling [%run magic](../synapse-analytics/spark/apache-spark-development-using-notebooks.md#notebook-reference) or [mssparkutils notebook utilities](../synapse-analytics/spark/microsoft-spark-utilities.md#notebook-utilities). Both support nesting function calls. Consider the following key differences between the two methods when choosing one for your scenario:
159+
160+
- [%run magic](../synapse-analytics/spark/apache-spark-development-using-notebooks.md#notebook-reference) copies all cells from the referenced notebook to the %run cell and shares the variable context. When notebook1 references notebook2 via `%run notebook2` and notebook2 calls a [mssparkutils.notebook.exit](../synapse-analytics/spark/microsoft-spark-utilities.md#exit-a-notebook) function, the cell execution in notebook1 will be stopped. We recommend you use %run magic when you want to "include" a notebook file.
161+
- [mssparkutils notebook utilities](../synapse-analytics/spark/microsoft-spark-utilities.md#notebook-utilities) calls the referenced notebook as a method or a function. The variable context isn't shared. When notebook1 references notebook2 via `mssparkutils.notebook.run("notebook2")` and notebook2 calls a [mssparkutils.notebook.exit](../synapse-analytics/spark/microsoft-spark-utilities.md#exit-a-notebook) function, the cell execution in notebook1 will continue. We recommend you use mssparkutils notebook utilities when you want to "import" a notebook.
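
The two behaviors can be modeled in plain Python to make the difference concrete. This is a simulation under stated assumptions, not Synapse code: `include_notebook`, `call_notebook`, `exit_notebook`, and `NotebookExit` are hypothetical stand-ins for `%run`, `mssparkutils.notebook.run`, and `mssparkutils.notebook.exit`, and a notebook is represented as a string of Python source. Returning the exit value from `call_notebook` is a modeling choice of this sketch.

```python
class NotebookExit(Exception):
    """Stand-in for the effect of mssparkutils.notebook.exit in this model."""

    def __init__(self, value):
        super().__init__(value)
        self.value = value


def exit_notebook(value):
    raise NotebookExit(value)


# "notebook2": sets a variable, then exits.
NOTEBOOK2 = 'shared = "set by notebook2"\nexit_notebook("done")\n'


def include_notebook(source, caller_namespace):
    """Model of %run: cells run in the caller's namespace; an exit propagates."""
    exec(source, caller_namespace)


def call_notebook(source):
    """Model of notebook.run: separate namespace; an exit ends only the callee."""
    namespace = {"exit_notebook": exit_notebook}
    try:
        exec(source, namespace)
    except NotebookExit as stop:
        return stop.value
    return None


# "Import" style: the caller continues and its variables are untouched.
caller_a = {"exit_notebook": exit_notebook}
result = call_notebook(NOTEBOOK2)
a_sees_shared = "shared" in caller_a

# "Include" style: the variable is shared, and the exit stops the caller too.
caller_b = {"exit_notebook": exit_notebook}
try:
    include_notebook(NOTEBOOK2, caller_b)
    b_continued = True
except NotebookExit:
    b_continued = False
b_sees_shared = "shared" in caller_b
```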
162+
163+
## See Azure Synapse Analytics Notebook activity run history
164+
165+
Go to **Pipeline runs** under the **Monitor** tab to see the pipeline you triggered. Open the pipeline that contains the notebook activity to see the run history.
166+
167+
:::image type="content" source="./media/transform-data-synapse-notebook/input-output-history-notebook.png" alt-text="Screenshot of the input and output for a Notebook activity." lightbox="./media/transform-data-synapse-notebook/input-output-history-notebook.png":::
168+
169+
The **Open notebook snapshot** feature isn't currently supported.
170+
171+
You can see the notebook activity input or output by selecting the **Input** or **Output** button. If your pipeline failed with a user error, select the output and check the **result** field for the detailed user error traceback.
172+
173+
:::image type="content" source="./media/transform-data-synapse-notebook/notebook-output-user-error.png" alt-text="Screenshot of the output user error for a Notebook activity." lightbox="./media/transform-data-synapse-notebook/notebook-output-user-error.png":::
