Commit b8bf783

Merge pull request #287436 from jonburchel/2024-09-26-update-conceptual-to-concept-article
Update conceptual to concept article for linter validation
2 parents aae6b04 + 0db557a commit b8bf783

1 file changed: +18 −8 lines changed

articles/data-factory/concepts-pipelines-activities.md

Lines changed: 18 additions & 8 deletions
@@ -1,12 +1,13 @@
 ---
 title: Pipelines and activities
 titleSuffix: Azure Data Factory & Azure Synapse
-description: Learn about pipelines and activities in Azure Data Factory and Azure Synapse Analytics.
+description: Learn how to use pipelines and activities in Azure Data Factory and Azure Synapse Analytics to create data-driven workflows for data movement and processing scenarios.
+#customer intent: As a data engineer, I want to understand pipelines and activities so that I can create efficient data workflows.
 author: dcstwh
 ms.author: weetok
 ms.subservice: orchestration
-ms.custom: synapse
-ms.topic: conceptual
+ms.custom: FY25Q1-Linter, synapse
+ms.topic: concept-article
 ms.date: 03/11/2024
 ---
 

@@ -19,6 +20,7 @@ ms.date: 03/11/2024
 This article helps you understand pipelines and activities in Azure Data Factory and Azure Synapse Analytics and use them to construct end-to-end data-driven workflows for your data movement and data processing scenarios.
 
 ## Overview
+
 A Data Factory or Synapse Workspace can have one or more pipelines. A pipeline is a logical grouping of activities that together perform a task. For example, a pipeline could contain a set of activities that ingest and clean log data, and then kick off a mapping data flow to analyze the log data. The pipeline allows you to manage the activities as a set instead of each one individually. You deploy and schedule the pipeline instead of the activities independently.
 
 The activities in a pipeline define actions to perform on your data. For example, you can use a copy activity to copy data from SQL Server to an Azure Blob Storage. Then, use a data flow activity or a Databricks Notebook activity to process and transform data from the blob storage to an Azure Synapse Analytics pool on top of which business intelligence reporting solutions are built.
@@ -32,7 +34,6 @@ An input dataset represents the input for an activity in the pipeline, and an ou
 > [!NOTE]
 > There is a default soft limit of maximum 80 activities per pipeline, which includes inner activities for containers.
 
-
 ## Data movement activities
 
 Copy Activity in Data Factory copies data from a source data store to a sink data store. Data Factory supports the data stores listed in the table in this section. Data from any source can be written to any sink.
@@ -44,6 +45,7 @@ Click a data store to learn how to copy data to and from that store.
 [!INCLUDE [data-factory-v2-supported-data-stores](includes/data-factory-v2-supported-data-stores.md)]
 
 ## Data transformation activities
+
 Azure Data Factory and Azure Synapse Analytics support the following transformation activities that can be added either individually or chained with another activity.
 
 For more information, see the [data transformation activities](transform-data.md) article.
@@ -66,6 +68,7 @@ Data transformation activity | Compute environment
 [Databricks Python Activity](transform-data-databricks-python.md) | Azure Databricks
 
 ## Control flow activities
+
 The following control flow activities are supported:
 
 Control activity | Description
@@ -117,6 +120,7 @@ Synapse will display the pipeline editor where you can find:
 ---
 
 ## Pipeline JSON
+
 Here is how a pipeline is defined in JSON format:
 
 ```json
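
The pipeline JSON body itself is elided by this diff. For context, a minimal pipeline definition with the top-level properties the article's table documents (description, activities, parameters, concurrency, annotations) might look like the following sketch; the names are illustrative assumptions, not the article's own sample:

```json
{
    "name": "MyPipeline",
    "properties": {
        "description": "Illustrative pipeline; names are placeholders",
        "activities": [],
        "parameters": {},
        "concurrency": 1,
        "annotations": []
    }
}
```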
@@ -147,9 +151,11 @@ concurrency | The maximum number of concurrent runs the pipeline can have. By de
 annotations | A list of tags associated with the pipeline | Array | No
 
 ## Activity JSON
+
 The **activities** section can have one or more activities defined within it. There are two main types of activities: Execution and Control Activities.
 
 ### Execution activities
+
 Execution activities include [data movement](#data-movement-activities) and [data transformation activities](#data-transformation-activities). They have the following top-level structure:
 
 ```json
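
The execution-activity JSON itself is elided by this diff. A minimal sketch of the top-level structure the article describes (name, type, typeProperties, linkedServiceName, policy, dependsOn) might look like this; the activity and linked-service names are illustrative assumptions:

```json
{
    "name": "MyHiveActivity",
    "type": "HDInsightHive",
    "description": "Illustrative execution activity; names are placeholders",
    "typeProperties": {},
    "linkedServiceName": {
        "referenceName": "MyHDInsightLinkedService",
        "type": "LinkedServiceReference"
    },
    "policy": {},
    "dependsOn": []
}
```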
@@ -183,6 +189,7 @@ policy | Policies that affect the run-time behavior of the activity. This proper
 dependsOn | This property is used to define activity dependencies, and how subsequent activities depend on previous activities. For more information, see [Activity dependency](#activity-dependency) | No
 
 ### Activity policy
+
 Policies affect the run-time behavior of an activity, giving configuration options. Activity Policies are only available for execution activities.
 
 ### Activity policy JSON definition
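
The policy JSON definition is elided by this diff. A minimal sketch combining the properties named in the article's table (retryIntervalInSeconds, secureOutput) with standard policy fields might look like this; treat the specific values as illustrative assumptions:

```json
"policy": {
    "timeout": "0.01:00:00",
    "retry": 3,
    "retryIntervalInSeconds": 30,
    "secureInput": false,
    "secureOutput": false
}
```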
@@ -221,6 +228,7 @@ retryIntervalInSeconds | The delay between retry attempts in seconds | Integer |
 secureOutput | When set to true, the output from activity is considered as secure and aren't logged for monitoring. | Boolean | No. Default is false.
 
 ### Control activity
+
 Control activities have the following top-level structure:
 
 ```json
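
The control-activity JSON itself is elided by this diff. Per the article, control activities share the execution-activity structure minus linkedServiceName and policy; a minimal sketch, with an illustrative name and type, might look like:

```json
{
    "name": "MyIfConditionActivity",
    "type": "IfCondition",
    "description": "Illustrative control activity; names are placeholders",
    "typeProperties": {},
    "dependsOn": []
}
```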
@@ -246,6 +254,7 @@ typeProperties | Properties in the typeProperties section depend on each type of
 dependsOn | This property is used to define Activity Dependency, and how subsequent activities depend on previous activities. For more information, see [activity dependency](#activity-dependency). | No
 
 ### Activity dependency
+
 Activity Dependency defines how subsequent activities depend on previous activities, determining the condition of whether to continue executing the next task. An activity can depend on one or multiple previous activities with different dependency conditions.
 
 The different dependency conditions are: Succeeded, Failed, Skipped, Completed.
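
As a sketch of how a dependency with one of these conditions is expressed in an activity's **dependsOn** property (the upstream activity name here is an illustrative assumption):

```json
"dependsOn": [
    {
        "activity": "CopyFromBlob",
        "dependencyConditions": [ "Succeeded" ]
    }
]
```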
@@ -299,6 +308,7 @@ For example, if a pipeline has Activity A -> Activity B, the different scenarios
 ```
 
 ## Sample copy pipeline
+
 In the following sample pipeline, there is one activity of type **Copy** in the **activities** section. In this sample, the [copy activity](copy-activity-overview.md) copies data from an Azure Blob storage to a database in Azure SQL Database.
 
 ```json
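
The sample pipeline's JSON is elided by this diff. A minimal copy pipeline of the shape described (Blob source, SQL sink, dataset references) might look like this sketch; the pipeline, activity, and dataset names are illustrative assumptions, not the article's actual sample:

```json
{
    "name": "CopyPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyFromBlobToSql",
                "type": "Copy",
                "inputs": [
                    { "referenceName": "InputDataset", "type": "DatasetReference" }
                ],
                "outputs": [
                    { "referenceName": "OutputDataset", "type": "DatasetReference" }
                ],
                "typeProperties": {
                    "source": { "type": "BlobSource" },
                    "sink": { "type": "SqlSink" }
                }
            }
        ]
    }
}
```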
@@ -348,6 +358,7 @@ Note the following points:
 For a complete walkthrough of creating this pipeline, see [Quickstart: create a Data Factory](quickstart-create-data-factory-powershell.md).
 
 ## Sample transformation pipeline
+
 In the following sample pipeline, there is one activity of type **HDInsightHive** in the **activities** section. In this sample, the [HDInsight Hive activity](transform-data-using-hadoop-hive.md) transforms data from an Azure Blob storage by running a Hive script file on an Azure HDInsight Hadoop cluster.
 
 ```json
@@ -397,11 +408,13 @@ The **typeProperties** section is different for each transformation activity. To
 For a complete walkthrough of creating this pipeline, see [Tutorial: transform data using Spark](tutorial-transform-data-spark-powershell.md).
 
 ## Multiple activities in a pipeline
+
 The previous two sample pipelines have only one activity in them. You can have more than one activity in a pipeline. If you have multiple activities in a pipeline and subsequent activities are not dependent on previous activities, the activities might run in parallel.
 
 You can chain two activities by using [activity dependency](#activity-dependency), which defines how subsequent activities depend on previous activities, determining the condition whether to continue executing the next task. An activity can depend on one or more previous activities with different dependency conditions.
 
 ## Scheduling pipelines
+
 Pipelines are scheduled by triggers. There are different types of triggers (Scheduler trigger, which allows pipelines to be triggered on a wall-clock schedule, as well as the manual trigger, which triggers pipelines on-demand). For more information about triggers, see [pipeline execution and triggers](concepts-pipeline-execution-triggers.md) article.
 
 To have your trigger kick off a pipeline run, you must include a pipeline reference of the particular pipeline in the trigger definition. Pipelines & triggers have an n-m relationship. Multiple triggers can kick off a single pipeline, and the same trigger can kick off multiple pipelines. Once the trigger is defined, you must start the trigger to have it start triggering the pipeline. For more information about triggers, see [pipeline execution and triggers](concepts-pipeline-execution-triggers.md) article.
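
The trigger JSON the article shows is elided by this diff. A minimal schedule-trigger sketch that includes the pipeline reference the paragraph above requires might look like this; the trigger, pipeline, and recurrence values are illustrative assumptions:

```json
{
    "name": "TriggerA",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Hour",
                "interval": 1,
                "startTime": "2024-03-11T00:00:00Z",
                "timeZone": "UTC"
            }
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "MyCopyPipeline",
                    "type": "PipelineReference"
                },
                "parameters": {}
            }
        ]
    }
}
```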
@@ -433,10 +446,7 @@ For example, say you have a Scheduler trigger, "Trigger A," that I wish to kick
 ```
 
 ## Related content
-See the following tutorials for step-by-step instructions for creating pipelines with activities:
 
 - [Build a pipeline with a copy activity](quickstart-create-data-factory-powershell.md)
 - [Build a pipeline with a data transformation activity](tutorial-transform-data-spark-powershell.md)
-
-How to achieve CI/CD (continuous integration and delivery) using Azure Data Factory
-- [Continuous integration and delivery in Azure Data Factory](continuous-integration-delivery.md)
+- [How to achieve CI/CD (continuous integration and delivery) using Azure Data Factory](continuous-integration-delivery.md)
