`articles/data-factory/concepts-pipelines-activities.md` (18 additions, 8 deletions)
````diff
@@ -1,12 +1,13 @@
 ---
 title: Pipelines and activities
 titleSuffix: Azure Data Factory & Azure Synapse
-description: Learn about pipelines and activities in Azure Data Factory and Azure Synapse Analytics.
+description: Learn how to use pipelines and activities in Azure Data Factory and Azure Synapse Analytics to create data-driven workflows for data movement and processing scenarios.
+#customer intent: As a data engineer, I want to understand pipelines and activities so that I can create efficient data workflows.
 author: dcstwh
 ms.author: weetok
 ms.subservice: orchestration
-ms.custom: synapse
-ms.topic: conceptual
+ms.custom: FY25Q1-Linter, synapse
+ms.topic: concept-article
 ms.date: 03/11/2024
 ---
 
````
````diff
@@ -19,6 +20,7 @@ ms.date: 03/11/2024
 This article helps you understand pipelines and activities in Azure Data Factory and Azure Synapse Analytics and use them to construct end-to-end data-driven workflows for your data movement and data processing scenarios.
 
 ## Overview
+
 A Data Factory or Synapse Workspace can have one or more pipelines. A pipeline is a logical grouping of activities that together perform a task. For example, a pipeline could contain a set of activities that ingest and clean log data, and then kick off a mapping data flow to analyze the log data. The pipeline allows you to manage the activities as a set instead of each one individually. You deploy and schedule the pipeline instead of the activities independently.
 
 The activities in a pipeline define actions to perform on your data. For example, you can use a copy activity to copy data from SQL Server to an Azure Blob Storage. Then, use a data flow activity or a Databricks Notebook activity to process and transform data from the blob storage to an Azure Synapse Analytics pool on top of which business intelligence reporting solutions are built.
````
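The overview above describes a pipeline as a named, logical grouping of activities that is deployed and scheduled as one unit. A minimal sketch of that grouping in pipeline JSON (illustrative only: the pipeline and activity names are placeholders, the `ExecuteDataFlow` type name is our assumption for a mapping data flow activity, and most required properties are omitted):

```json
{
    "name": "IngestAndAnalyzeLogs",
    "properties": {
        "description": "Ingest and clean log data, then analyze it with a mapping data flow",
        "activities": [
            { "name": "IngestLogs", "type": "Copy" },
            {
                "name": "AnalyzeLogs",
                "type": "ExecuteDataFlow",
                "dependsOn": [
                    { "activity": "IngestLogs", "dependencyConditions": [ "Succeeded" ] }
                ]
            }
        ]
    }
}
```

Deploying this single definition manages both activities as a set, which is the point the overview makes.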
````diff
@@ -32,7 +34,6 @@ An input dataset represents the input for an activity in the pipeline, and an ou
 > [!NOTE]
 > There is a default soft limit of maximum 80 activities per pipeline, which includes inner activities for containers.
 
-
 ## Data movement activities
 
 Copy Activity in Data Factory copies data from a source data store to a sink data store. Data Factory supports the data stores listed in the table in this section. Data from any source can be written to any sink.
````
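The copy activity described here pairs a source data store with a sink. A hedged sketch of what such an activity could look like in JSON, following the execution-activity structure the article outlines later (the dataset names are placeholders, and `SqlSource`/`BlobSink` stand in for whichever connector types your stores actually require):

```json
{
    "name": "CopySqlToBlob",
    "type": "Copy",
    "inputs": [ { "referenceName": "SqlServerTableDataset", "type": "DatasetReference" } ],
    "outputs": [ { "referenceName": "BlobFolderDataset", "type": "DatasetReference" } ],
    "typeProperties": {
        "source": { "type": "SqlSource" },
        "sink": { "type": "BlobSink" }
    }
}
```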
````diff
@@ -44,6 +45,7 @@ Click a data store to learn how to copy data to and from that store.
 Azure Data Factory and Azure Synapse Analytics support the following transformation activities that can be added either individually or chained with another activity.
 
 For more information, see the [data transformation activities](transform-data.md) article.
````
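Transformation activities run on an associated compute environment, referenced through a linked service. As one hedged illustration (the linked service name and notebook path are placeholders), a Databricks Notebook activity chained after other work might look like:

```json
{
    "name": "TransformLogs",
    "type": "DatabricksNotebook",
    "linkedServiceName": {
        "referenceName": "AzureDatabricksLinkedService",
        "type": "LinkedServiceReference"
    },
    "typeProperties": {
        "notebookPath": "/Shared/transform-logs"
    }
}
```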
````diff
@@ -66,6 +68,7 @@ Data transformation activity | Compute environment
 The following control flow activities are supported:
 
 Control activity | Description
````
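Control flow activities orchestrate other activities rather than move or transform data themselves. A hedged sketch of a ForEach container (the parameter name and inner activity are placeholders) that repeats a copy once per input file:

```json
{
    "name": "CopyEachFile",
    "type": "ForEach",
    "typeProperties": {
        "items": {
            "value": "@pipeline().parameters.fileNames",
            "type": "Expression"
        },
        "activities": [
            { "name": "CopyOneFile", "type": "Copy" }
        ]
    }
}
```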
````diff
@@ -117,6 +120,7 @@ Synapse will display the pipeline editor where you can find:
 ---
 
 ## Pipeline JSON
+
 Here is how a pipeline is defined in JSON format:
 
 ```json
````
````diff
@@ -147,9 +151,11 @@ concurrency | The maximum number of concurrent runs the pipeline can have. By de
 annotations | A list of tags associated with the pipeline | Array | No
 
 ## Activity JSON
+
 The **activities** section can have one or more activities defined within it. There are two main types of activities: Execution and Control Activities.
 
 ### Execution activities
+
 Execution activities include [data movement](#data-movement-activities) and [data transformation activities](#data-transformation-activities). They have the following top-level structure:
 
 ```json
````
````diff
@@ -183,6 +189,7 @@ policy | Policies that affect the run-time behavior of the activity. This proper
 dependsOn | This property is used to define activity dependencies, and how subsequent activities depend on previous activities. For more information, see [Activity dependency](#activity-dependency) | No
 
 ### Activity policy
+
 Policies affect the run-time behavior of an activity, giving configuration options. Activity Policies are only available for execution activities.
 
 ### Activity policy JSON definition
````
````diff
@@ -221,6 +228,7 @@ retryIntervalInSeconds | The delay between retry attempts in seconds | Integer |
 secureOutput | When set to true, the output from activity is considered as secure and aren't logged for monitoring. | Boolean | No. Default is false.
 
 ### Control activity
+
 Control activities have the following top-level structure:
 
 ```json
````
````diff
@@ -246,6 +254,7 @@ typeProperties | Properties in the typeProperties section depend on each type of
 dependsOn | This property is used to define Activity Dependency, and how subsequent activities depend on previous activities. For more information, see [activity dependency](#activity-dependency). | No
 
 ### Activity dependency
+
 Activity Dependency defines how subsequent activities depend on previous activities, determining the condition of whether to continue executing the next task. An activity can depend on one or multiple previous activities with different dependency conditions.
 
 The different dependency conditions are: Succeeded, Failed, Skipped, Completed.
````
````diff
@@ -299,6 +308,7 @@ For example, if a pipeline has Activity A -> Activity B, the different scenarios
 ```
 
 ## Sample copy pipeline
+
 In the following sample pipeline, there is one activity of type **Copy** in the **activities** section. In this sample, the [copy activity](copy-activity-overview.md) copies data from an Azure Blob storage to a database in Azure SQL Database.
 
 ```json
````
````diff
@@ -348,6 +358,7 @@ Note the following points:
 For a complete walkthrough of creating this pipeline, see [Quickstart: create a Data Factory](quickstart-create-data-factory-powershell.md).
 
 ## Sample transformation pipeline
+
 In the following sample pipeline, there is one activity of type **HDInsightHive** in the **activities** section. In this sample, the [HDInsight Hive activity](transform-data-using-hadoop-hive.md) transforms data from an Azure Blob storage by running a Hive script file on an Azure HDInsight Hadoop cluster.
 
 ```json
````
````diff
@@ -397,11 +408,13 @@ The **typeProperties** section is different for each transformation activity. To
 For a complete walkthrough of creating this pipeline, see [Tutorial: transform data using Spark](tutorial-transform-data-spark-powershell.md).
 
 ## Multiple activities in a pipeline
+
 The previous two sample pipelines have only one activity in them. You can have more than one activity in a pipeline. If you have multiple activities in a pipeline and subsequent activities are not dependent on previous activities, the activities might run in parallel.
 
 You can chain two activities by using [activity dependency](#activity-dependency), which defines how subsequent activities depend on previous activities, determining the condition whether to continue executing the next task. An activity can depend on one or more previous activities with different dependency conditions.
 
 ## Scheduling pipelines
+
 Pipelines are scheduled by triggers. There are different types of triggers (Scheduler trigger, which allows pipelines to be triggered on a wall-clock schedule, as well as the manual trigger, which triggers pipelines on-demand). For more information about triggers, see [pipeline execution and triggers](concepts-pipeline-execution-triggers.md) article.
 
 To have your trigger kick off a pipeline run, you must include a pipeline reference of the particular pipeline in the trigger definition. Pipelines & triggers have an n-m relationship. Multiple triggers can kick off a single pipeline, and the same trigger can kick off multiple pipelines. Once the trigger is defined, you must start the trigger to have it start triggering the pipeline. For more information about triggers, see [pipeline execution and triggers](concepts-pipeline-execution-triggers.md) article.
````
````diff
@@ -433,10 +446,7 @@ For example, say you have a Scheduler trigger, "Trigger A," that I wish to kick
 ```
 
 ## Related content
-See the following tutorials for step-by-step instructions for creating pipelines with activities:
 
 -[Build a pipeline with a copy activity](quickstart-create-data-factory-powershell.md)
 -[Build a pipeline with a data transformation activity](tutorial-transform-data-spark-powershell.md)
-
-How to achieve CI/CD (continuous integration and delivery) using Azure Data Factory
--[Continuous integration and delivery in Azure Data Factory](continuous-integration-delivery.md)
+-[How to achieve CI/CD (continuous integration and delivery) using Azure Data Factory](continuous-integration-delivery.md)
````