articles/data-factory/tutorial-copy-data-portal.md
Lines changed: 62 additions & 35 deletions
Original file line number
Diff line number
Diff line change
@@ -1,14 +1,16 @@
1
1
---
2
-
title: Use the Azure portal to create a data factory pipeline
3
-
description: This tutorial provides step-by-step instructions for using the Azure portal to create a data factory with a pipeline. The pipeline uses the copy activity to copy data from Azure Blob storage to Azure SQL Database.
2
+
title: 'Use the Azure portal to create a data factory pipeline'
3
+
description: This tutorial shows how to create a data factory with a pipeline that uses a copy activity to copy data from Azure Blob storage to Azure SQL Database.
4
4
author: jianleishen
5
5
ms.topic: tutorial
6
-
ms.date: 10/03/2024
6
+
ms.date: 04/25/2025
7
7
ms.subservice: data-movement
8
8
ms.author: jianleishen
9
+
10
+
#customer intent: As a new Azure Data Factory user, I want to create a data factory and quickly create my first pipeline to move data between resources, so I can apply it to my own needs.
9
11
---
10
12
11
-
# Copy data from Azure Blob storage to a database in Azure SQL Database by using Azure Data Factory
13
+
# Tutorial: Copy data from Azure Blob storage to a database in Azure SQL Database by using Azure Data Factory
@@ -20,14 +22,16 @@ In this tutorial, you create a data factory by using the Azure Data Factory user
20
22
In this tutorial, you perform the following steps:
21
23
22
24
> [!div class="checklist"]
23
-
> * Create a data factory.
24
-
> * Create a pipeline with a copy activity.
25
+
> * [Create a data factory.](#create-a-data-factory)
26
+
> * [Create a pipeline with a copy activity.](#create-a-pipeline)
25
27
> * Test run the pipeline.
26
-
> * Trigger the pipeline manually.
27
-
> * Trigger the pipeline on a schedule.
28
+
> * [Trigger the pipeline manually.](#trigger-the-pipeline-manually)
29
+
> * [Trigger the pipeline on a schedule.](#trigger-the-pipeline-on-a-schedule)
28
30
> * Monitor the pipeline and activity runs.
31
+
> * [Disable or delete your scheduled trigger.](#disable-trigger)
29
32
30
33
## Prerequisites
34
+
31
35
* **Azure subscription**. If you don't have an Azure subscription, create a [free Azure account](https://azure.microsoft.com/free/) before you begin.
32
36
* **Azure storage account**. You use Blob storage as a *source* data store. If you don't have a storage account, see [Create an Azure storage account](../storage/common/storage-account-create.md) for steps to create one.
33
37
* **Azure SQL Database**. You use the database as a *sink* data store. If you don't have a database in Azure SQL Database, see [Create a database in Azure SQL Database](/azure/azure-sql/database/single-database-create-quickstart) for steps to create one.
@@ -38,15 +42,16 @@ Now, prepare your Blob storage and SQL database for the tutorial by performing t
38
42
39
43
#### Create a source blob
40
44
41
-
1. Launch Notepad. Copy the following text, and save it as an **emp.txt** file on your disk:
45
+
1. Launch Notepad. Copy the following text, and save it as an **emp.txt** file:
42
46
43
47
```
44
48
FirstName,LastName
45
49
John,Doe
46
50
Jane,Doe
47
51
```
48
52
49
-
1. Create a container named **adftutorial** in your Blob storage. Create a folder named **input** in this container. Then, upload the **emp.txt** file to the **input** folder. Use the Azure portal or tools such as [Azure Storage Explorer](https://storageexplorer.com/) to do these tasks.
53
+
1. Move the file into a folder named **input**.
54
+
1. Create a container named **adftutorial** in your Blob storage. Upload your **input** folder with the **emp.txt** file to this container. You can use the Azure portal or tools such as [Azure Storage Explorer](https://storageexplorer.com/) to do these tasks.
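As a rough local alternative to the Notepad steps above, and assuming Python is available on your machine, the file can be written directly into an `input` folder. The helper name `write_source_file` and the base directory are illustrative assumptions; uploading to the **adftutorial** container is still done through the Azure portal or Azure Storage Explorer.

```python
from pathlib import Path

# Sample content from this tutorial: one header line plus two records.
EMP_ROWS = "FirstName,LastName\nJohn,Doe\nJane,Doe\n"


def write_source_file(base_dir: Path) -> Path:
    """Create <base_dir>/input/emp.txt and return its path.

    `base_dir` is any local working folder (an assumption for this sketch).
    """
    input_dir = base_dir / "input"
    input_dir.mkdir(parents=True, exist_ok=True)
    emp_file = input_dir / "emp.txt"
    emp_file.write_text(EMP_ROWS, encoding="utf-8")
    return emp_file
```

Calling `write_source_file(Path.cwd())` produces an `input` folder in the current directory that is ready to upload as-is.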
50
55
51
56
#### Create a sink SQL table
52
57
@@ -64,13 +69,14 @@ Now, prepare your Blob storage and SQL database for the tutorial by performing t
64
69
CREATE CLUSTERED INDEX IX_emp_ID ON dbo.emp (ID);
65
70
```
66
71
67
-
1. Allow Azure services to access SQL Server. Ensure that **Allow access to Azure services** is turned **ON** for your SQL Server so that Data Factory can write data to your SQL Server. To verify and turn on this setting, go to logical SQL server > Overview > Set server firewall> set the **Allow access to Azure services** option to **ON**.
72
+
1. Allow Azure services to access SQL Server. Ensure that access for Azure services is enabled on your SQL Server so that Data Factory can write data to it. To verify and turn on this setting, go to your SQL Server in the Azure portal, select **Security** > **Networking**, enable **Selected networks**, and then select **Allow Azure services and resources to access this server** under **Exceptions**.
68
73
69
74
## Create a data factory
75
+
70
76
In this step, you create a data factory and start the Data Factory UI to create a pipeline in the data factory.
71
77
72
78
1. Open **Microsoft Edge** or **Google Chrome**. Currently, Data Factory UI is supported only in Microsoft Edge and Google Chrome web browsers.
73
-
2. On the left menu, select **Create a resource** > **Integration** > **Data Factory**.
79
+
2. On the left menu, select **Create a resource** > **Analytics** > **Data Factory**.
74
80
3. On the **Create Data Factory** page, on the **Basics** tab, select the Azure **Subscription** in which you want to create the data factory.
75
81
4. For **Resource Group**, take one of the following steps:
76
82
@@ -79,28 +85,20 @@ In this step, you create a data factory and start the Data Factory UI to create
79
85
b. Select **Create new**, and enter the name of a new resource group.
80
86
81
87
To learn about resource groups, see [Use resource groups to manage your Azure resources](../azure-resource-manager/management/overview.md).
82
-
5. Under **Region**, select a location for the data factory. Only locations that are supported are displayed in the drop-down list. The data stores (for example, Azure Storage and SQL Database) and computes (for example, Azure HDInsight) used by the data factory can be in other regions.
83
-
6. Under **Name**, enter **ADFTutorialDataFactory**.
84
-
85
-
The name of the Azure data factory must be *globally unique*. If you receive an error message about the name value, enter a different name for the data factory. (for example, yournameADFTutorialDataFactory). For naming rules for Data Factory artifacts, see [Data Factory naming rules](naming-rules.md).
88
+
5. Under **Region**, select a location for the data factory. Your data stores can be in a different region than your data factory, if they need to be.
89
+
6. Under **Name**, enter a name for the data factory. The name must be *globally unique*. If you receive an error message about the name value, enter a different name (for example, yournameADFDemo). For naming rules for Data Factory artifacts, see [Data Factory naming rules](naming-rules.md).
86
90
87
91
:::image type="content" source="./media/doc-common-process/name-not-available-error.png" alt-text="New data factory error message for duplicate name.":::
88
92
89
93
7. Under **Version**, select **V2**.
90
94
8. Select the **Git configuration** tab at the top, and select the **Configure Git later** check box.
91
95
9. Select **Review + create**, and select **Create** after the validation is passed.
92
96
10. After the creation is finished, you see the notice in Notifications center. Select **Go to resource** to navigate to the Data factory page.
93
-
11. Select **Open** on the **Open Azure Data Factory Studio** tile to launch the Azure Data Factory UI in a separate tab.
94
-
97
+
11. Select **Launch Studio** on the **Azure Data Factory Studio** tile.
95
98
96
99
## Create a pipeline
97
-
In this step, you create a pipeline with a copy activity in the data factory. The copy activity copies data from Blob storage to SQL Database. In the [Quickstart tutorial](quickstart-create-data-factory-portal.md), you created a pipeline by following these steps:
98
-
99
-
1. Create the linked service.
100
-
1. Create input and output datasets.
101
-
1. Create a pipeline.
102
100
103
-
In this tutorial, you start with creating the pipeline. Then you create linked services and datasets when you need them to configure the pipeline.
101
+
In this step, you create a pipeline with a copy activity in the data factory. The copy activity copies data from Blob storage to SQL Database.
104
102
105
103
1. On the home page, select **Orchestrate**.
106
104
@@ -115,14 +113,14 @@ In this tutorial, you start with creating the pipeline. Then you create linked s
115
113
### Configure source
116
114
117
115
>[!TIP]
118
-
>In this tutorial, you use *Account key* as the authentication type for your source data store, but you can choose other supported authentication methods: *SAS URI*,*Service Principal* and *Managed Identity* if needed. Refer to corresponding sections in [this article](./connector-azure-blob-storage.md#linked-service-properties) for details.
116
+
>In this tutorial, you use *Account key* as the authentication type for your source data store, but you can choose other supported authentication methods: *SAS URI*, *Service Principal*, and *Managed Identity* if needed. Refer to corresponding sections in [this article](./connector-azure-blob-storage.md#linked-service-properties) for details.
119
117
>To store secrets for data stores securely, it's also recommended to use an Azure Key Vault. Refer to [this article](./store-credentials-in-key-vault.md) for detailed illustrations.
120
118
121
119
1. Go to the **Source** tab. Select **+ New** to create a source dataset.
122
120
123
121
1. In the **New Dataset** dialog box, select **Azure Blob Storage**, and then select **Continue**. The source data is in Blob storage, so you select **Azure Blob Storage** for the source dataset.
124
122
125
-
1. In the **Select Format** dialog box, choose the format type of your data, and then select **Continue**.
123
+
1. In the **Select Format** dialog box, choose **Delimited Text**, and then select **Continue**.
126
124
127
125
1. In the **Set Properties** dialog box, enter **SourceBlobDataset** for Name. Select the checkbox for **First row as header**. Under the **Linked service** text box, select **+ New**.
128
126
@@ -137,15 +135,16 @@ In this tutorial, you start with creating the pipeline. Then you create linked s
>In this tutorial, you use *SQL authentication* as the authentication type for your sink data store, but you can choose other supported authentication methods: *Service Principal* and *Managed Identity* if needed. Refer to corresponding sections in [this article](./connector-azure-sql-database.md#linked-service-properties) for details.
142
141
>To store secrets for data stores securely, it's also recommended to use an Azure Key Vault. Refer to [this article](./store-credentials-in-key-vault.md) for detailed illustrations.
143
142
144
143
1. Go to the **Sink** tab, and select **+ New** to create a sink dataset.
145
144
146
-
1. In the **New Dataset** dialog box, input "SQL" in the search box to filter the connectors, select **Azure SQL Database**, and then select **Continue**. In this tutorial, you copy data to a SQL database.
145
+
1. In the **New Dataset** dialog box, input "SQL" in the search box to filter the connectors, select **Azure SQL Database**, and then select **Continue**.
147
146
148
-
1. In the **Set Properties** dialog box, enter **OutputSqlDataset** for Name. From the **Linked service** dropdown list, select **+ New**. A dataset must be associated with a linked service. The linked service has the connection string that Data Factory uses to connect to SQL Database at runtime. The dataset specifies the container, folder, and the file (optional) to which the data is copied.
147
+
1. In the **Set Properties** dialog box, enter **OutputSqlDataset** for Name. From the **Linked service** dropdown list, select **+ New**. A dataset must be associated with a linked service. The linked service has the connection string that Data Factory uses to connect to SQL Database at runtime, and specifies where the data will be copied to.
149
148
150
149
1. In the **New Linked Service (Azure SQL Database)** dialog box, take the following steps:
151
150
@@ -165,7 +164,7 @@ In this tutorial, you start with creating the pipeline. Then you create linked s
165
164
166
165
:::image type="content" source="./media/tutorial-copy-data-portal/new-azure-sql-linked-service-window.png" alt-text="Save new linked service":::
167
166
168
-
1. It automatically navigates to the **Set Properties** dialog box. In **Table**, select **[dbo].[emp]**. Then select **OK**.
167
+
1. It automatically navigates to the **Set Properties** dialog box. In **Table**, select **Enter manually**, and enter **[dbo].[emp]**. Then select **OK**.
169
168
170
169
1. Go to the tab with the pipeline, and in **Sink Dataset**, confirm that **OutputSqlDataset** is selected.
171
170
@@ -174,42 +173,49 @@ In this tutorial, you start with creating the pipeline. Then you create linked s
174
173
You can optionally map the schema of the source to corresponding schema of destination by following [Schema mapping in copy activity](copy-activity-schema-and-type-mapping.md).
175
174
176
175
## Validate the pipeline
176
+
177
177
To validate the pipeline, select **Validate** from the toolbar.
178
178
179
179
You can see the JSON code associated with the pipeline by clicking **Code** on the upper right.
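For orientation, the JSON behind a pipeline like this one has roughly the following shape. This is a hand-written sketch, not output copied from the Studio: the activity name `CopyFromBlobToSql` is hypothetical, the pipeline and dataset names are the ones used in this tutorial, and the properties the Studio actually emits may include more fields than shown here.

```python
import json

# Approximate shape of a copy pipeline definition (illustrative only).
pipeline = {
    "name": "CopyPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyFromBlobToSql",  # hypothetical activity name
                "type": "Copy",
                # Datasets are referenced by name, not embedded.
                "inputs": [
                    {"referenceName": "SourceBlobDataset", "type": "DatasetReference"}
                ],
                "outputs": [
                    {"referenceName": "OutputSqlDataset", "type": "DatasetReference"}
                ],
                "typeProperties": {
                    "source": {"type": "DelimitedTextSource"},
                    "sink": {"type": "AzureSqlSink"},
                },
            }
        ]
    },
}

print(json.dumps(pipeline, indent=2))
```

The point of the sketch is the layering: the pipeline holds activities, each activity references datasets by name, and the datasets in turn point at linked services that hold the connection details.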
180
180
181
181
## Debug and publish the pipeline
182
+
182
183
You can debug a pipeline before you publish artifacts (linked services, datasets, and pipeline) to Data Factory or your own Azure Repos Git repository.
183
184
184
185
1. To debug the pipeline, select **Debug** on the toolbar. You see the status of the pipeline run in the **Output** tab at the bottom of the window.
185
186
186
187
1. Once the pipeline runs successfully, in the top toolbar, select **Publish all**. This action publishes the entities (datasets and pipelines) you created to Data Factory.
187
188
188
-
1. Wait until you see the **Successfully published** message. To see notification messages, click the **Show Notifications** on the top-right (bell button).
189
+
1. Wait until you see the **Successfully published** notification message. To see notification messages, select **Show Notifications** (bell button) at the top right.
189
190
190
191
## Trigger the pipeline manually
192
+
191
193
In this step, you manually trigger the pipeline you published in the previous step.
192
194
193
-
1. Select **Trigger** on the toolbar, and then select **Trigger Now**. On the **Pipeline Run** page, select **OK**.
195
+
1. Select **Add trigger** on the toolbar, and then select **Trigger Now**.
196
+
197
+
1. On the **Pipeline Run** page, select **OK**.
194
198
195
199
1. Go to the **Monitor** tab on the left. You see a pipeline run that is triggered by a manual trigger. You can use links under the **PIPELINE NAME** column to view activity details and to rerun the pipeline.
1. To see activity runs associated with the pipeline run, select the **CopyPipeline** link under the **PIPELINE NAME** column. In this example, there's only one activity, so you see only one entry in the list. For details about the copy operation, select the **Details** link (eyeglasses icon) under the **ACTIVITY NAME** column. Select **All pipeline runs** at the top to go back to the Pipeline Runs view. To refresh the view, select **Refresh**.
203
+
1. To see activity runs associated with the pipeline run, select the **CopyPipeline** link under the **PIPELINE NAME** column. In this example, there's only one activity, so you see only one entry in the list. For details about the copy operation, hover over the activity and select the **Details** link (eyeglasses icon) under the **ACTIVITY NAME** column. Select **All pipeline runs** at the top to go back to the Pipeline Runs view. To refresh the view, select **Refresh**.
1. Verify that two more rows are added to the **emp** table in the database.
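The reason to expect two rows rather than three is the **First row as header** setting on the source dataset: the first line of **emp.txt** supplies column names, not data. The following Python sketch illustrates that interpretation using the standard `csv` module as a stand-in for the delimited-text reader. It is an analogy under that assumption, not the service's actual implementation.

```python
import csv
import io

# Same content as the emp.txt file used in this tutorial.
EMP_TEXT = "FirstName,LastName\nJohn,Doe\nJane,Doe\n"


def data_rows(text: str) -> list[dict[str, str]]:
    """Parse delimited text, treating the first row as the header."""
    reader = csv.DictReader(io.StringIO(text))
    return list(reader)


rows = data_rows(EMP_TEXT)
print(len(rows))  # the header line is not counted as a data row
for row in rows:
    print(row["FirstName"], row["LastName"])
```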
204
209
205
210
## Trigger the pipeline on a schedule
211
+
206
212
In this step, you create a schedule trigger for the pipeline. The trigger runs the pipeline on the specified schedule, such as hourly or daily. Here you set the trigger to run every minute until the specified end datetime.
207
213
208
214
1. Go to the **Author** tab on the left above the monitor tab.
209
215
210
-
1. Go to your pipeline, click **Trigger** on the tool bar, and select **New/Edit**.
216
+
1. Go to your pipeline, select **Trigger** on the tool bar, and select **New/Edit**.
211
217
212
-
1. In the **Add triggers** dialog box, select **+ New** for **Choose trigger** area.
218
+
1. In the **Add triggers** dialog box, select **Choose trigger** and select **+ New**.
213
219
214
220
1. In the **New Trigger** window, take the following steps:
215
221
@@ -232,7 +238,7 @@ In this schedule, you create a schedule trigger for the pipeline. The trigger ru
232
238
233
239
1. On the **Edit trigger** page, review the warning, and then select **Save**. The pipeline in this example doesn't take any parameters.
234
240
235
-
1. Click **Publish all** to publish the change.
241
+
1. Select **Publish all** to publish the change.
236
242
237
243
1. Go to the **Monitor** tab on the left to see the triggered pipeline runs.
238
244
@@ -244,7 +250,22 @@ In this schedule, you create a schedule trigger for the pipeline. The trigger ru
244
250
245
251
1. Verify that two rows per minute (for each pipeline run) are inserted into the **emp** table until the specified end time.
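To sanity-check the row count you see in the **emp** table, a quick estimate can help. The sketch below assumes the trigger fires once at the start time and then once per minute up to and including the end time; whether a run fires exactly at the end time is an assumption here, so treat the result as approximate. The function names and the example timestamps are illustrative.

```python
from datetime import datetime, timedelta

ROWS_PER_RUN = 2  # each run copies the two data rows from emp.txt


def expected_runs(start: datetime, end: datetime,
                  interval: timedelta = timedelta(minutes=1)) -> int:
    """Estimate the number of scheduled runs between start and end, inclusive."""
    if end < start:
        return 0
    # Whole intervals that fit in the window, plus the run at `start`.
    return (end - start) // interval + 1


def expected_rows(start: datetime, end: datetime) -> int:
    """Estimate rows inserted by the scheduled runs alone."""
    return expected_runs(start, end) * ROWS_PER_RUN


if __name__ == "__main__":
    # Hypothetical five-minute window.
    window_start = datetime(2025, 1, 1, 12, 0)
    window_end = datetime(2025, 1, 1, 12, 5)
    print(expected_runs(window_start, window_end), expected_rows(window_start, window_end))
```

Rows added by the earlier debug run and manual trigger are not included in this estimate.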
246
252
253
+
## Disable trigger
254
+
255
+
To disable the every-minute trigger that you created, follow these steps:
256
+
257
+
1. Select the **Manage** pane on the left side.
258
+
259
+
1. Under **Author**, select **Triggers**.
260
+
261
+
1. Hover over the **RunEveryMinute** trigger you created.
262
+
1. Select the **Stop** button to stop the trigger from running.
263
+
1. Select the **Delete** button to disable and delete the trigger.
264
+
265
+
1. Select **Publish all** to save your changes.
266
+
247
267
## Related content
268
+
248
269
The pipeline in this sample copies data from Azure Blob storage to a database in Azure SQL Database. You learned how to:
249
270
250
271
> [!div class="checklist"]
@@ -254,9 +275,15 @@ The pipeline in this sample copies data from one location to another location in
254
275
> * Trigger the pipeline manually.
255
276
> * Trigger the pipeline on a schedule.
256
277
> * Monitor the pipeline and activity runs.
278
+
> * Disable or delete your scheduled trigger.
257
279
258
280
259
281
Advance to the following tutorial to learn how to copy data from on-premises to the cloud:
260
282
261
283
> [!div class="nextstepaction"]
262
284
>[Copy data from on-premises to the cloud](tutorial-hybrid-copy-portal.md)
285
+
286
+
For more information on copying data to or from Azure Blob Storage and Azure SQL Database, see these connector guides:
287
+
288
+
- [Copy and transform data in Azure Blob Storage](connector-azure-blob-storage.md)
289
+
- [Copy and transform data in Azure SQL Database](connector-azure-sql-database.md)