Commit 2ea21e5

Merge pull request #110353 from v-albemi/tutorial-incremental-copy-lastmodified-copy-data-tool
edit pass: tutorial-incremental-copy-lastmodified-copy-data-tool
2 parents 0005ced + c4477d5 commit 2ea21e5

1 file changed: 58 additions, 58 deletions

@@ -1,5 +1,5 @@
11
---
2-
title: Data tool to copy new and updated files incrementally
2+
title: Data tool to copy new and updated files incrementally
33
description: Create an Azure data factory and then use the Copy Data tool to incrementally load new files based on LastModifiedDate.
44
services: data-factory
55
author: dearandyxu
@@ -18,14 +18,14 @@ ms.date: 3/18/2020
1818

1919
[!INCLUDE[appliesto-adf-xxx-md](includes/appliesto-adf-xxx-md.md)]
2020

21-
In this tutorial, you'll use the Azure portal to create a data factory. Then, you'll use the Copy Data tool to create a pipeline that incrementally copies new and changed files only, based on their **LastModifiedDate** from Azure Blob storage to Azure Blob storage.
21+
In this tutorial, you'll use the Azure portal to create a data factory. You'll then use the Copy Data tool to create a pipeline that incrementally copies new and changed files only, from Azure Blob storage to Azure Blob storage. It uses `LastModifiedDate` to determine which files to copy.
2222

23-
By doing so, ADF will scan all the files from the source store, apply the file filter by their LastModifiedDate, and copy the new and updated file only since last time to the destination store. Please note that if you let ADF scan huge amounts of files but only copy a few files to destination, you would still expect the long duration due to file scanning is time consuming as well.
23+
After you complete the steps here, Azure Data Factory will scan all the files in the source store, apply a file filter based on `LastModifiedDate`, and copy to the destination store only the files that are new or were updated after the previous run. Note that if Data Factory scans large numbers of files, you should still expect long durations. File scanning is time consuming, even when the amount of data copied is reduced.
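The behavior described above can be sketched outside Data Factory. The following minimal Python simulation is illustrative only (local folders stand in for Blob containers, and the file names are made up): it copies only files modified after the previous run, and its full directory scan mirrors why runs stay slow even when little data actually moves.

```python
# Sketch (not ADF itself): simulate the LastModifiedDate filter that the
# Copy Data tool applies on each run. All names here are illustrative.
import os
import tempfile


def incremental_copy(source_dir, dest_dir, last_run_time):
    """Copy only files modified after last_run_time; return copied names."""
    copied = []
    os.makedirs(dest_dir, exist_ok=True)
    # The scan touches every file in the source, which is why runs can be
    # slow even when few files qualify for copying.
    for name in os.listdir(source_dir):
        src = os.path.join(source_dir, name)
        if os.path.getmtime(src) > last_run_time:
            with open(src, "rb") as f_in, \
                 open(os.path.join(dest_dir, name), "wb") as f_out:
                f_out.write(f_in.read())
            copied.append(name)
    return sorted(copied)


source = tempfile.mkdtemp()
dest = tempfile.mkdtemp()

old_path = os.path.join(source, "old.txt")
new_path = os.path.join(source, "new.txt")
open(old_path, "w").close()
open(new_path, "w").close()

checkpoint = 1_000_000  # pretend the previous run happened at this time
os.utime(old_path, (checkpoint - 60, checkpoint - 60))  # modified before last run
os.utime(new_path, (checkpoint + 60, checkpoint + 60))  # modified after last run

print(incremental_copy(source, dest, checkpoint))  # ['new.txt']
```

Only the file whose modification time falls after the checkpoint is copied, even though every file was scanned.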
2424

2525
> [!NOTE]
26-
> If you're new to Azure Data Factory, see [Introduction to Azure Data Factory](introduction.md).
26+
> If you're new to Data Factory, see [Introduction to Azure Data Factory](introduction.md).
2727
28-
In this tutorial, you will perform the following tasks:
28+
In this tutorial, you'll complete these tasks:
2929

3030
> [!div class="checklist"]
3131
> * Create a data factory.
@@ -35,101 +35,101 @@ In this tutorial, you will perform the following tasks:
3535
## Prerequisites
3636

3737
* **Azure subscription**: If you don't have an Azure subscription, create a [free account](https://azure.microsoft.com/free/) before you begin.
38-
* **Azure storage account**: Use Blob storage as the _source_ and _sink_ data store. If you don't have an Azure storage account, see the instructions in [Create a storage account](../storage/common/storage-account-create.md).
38+
* **Azure Storage account**: Use Blob storage for the source and sink data stores. If you don't have an Azure Storage account, follow the instructions in [Create a storage account](../storage/common/storage-account-create.md).
3939

40-
### Create two containers in Blob storage
40+
## Create two containers in Blob storage
4141

42-
Prepare your Blob storage for the tutorial by performing these steps.
42+
Prepare your Blob storage for the tutorial by completing these steps:
4343

44-
1. Create a container named **source**. You can use various tools to perform this task, such as [Azure Storage Explorer](https://storageexplorer.com/).
44+
1. Create a container named **source**. You can use various tools to perform this task, like [Azure Storage Explorer](https://storageexplorer.com/).
4545

4646
2. Create a container named **destination**.
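Both containers can also be created programmatically. The sketch below is hedged: the validation function encodes Blob container naming rules as commonly documented (3-63 characters; lowercase letters, digits, and hyphens; must start and end with a letter or digit; no consecutive hyphens), which you should verify against current docs, and the commented SDK calls assume the `azure-storage-blob` package plus a connection string you supply.

```python
# Hedged pre-check of the two container names against Azure Blob storage
# naming rules (as assumed above). Creating the containers themselves
# requires the azure-storage-blob SDK or a tool like Azure Storage Explorer.
import re

CONTAINER_NAME = re.compile(r"^(?!.*--)[a-z0-9](?:[a-z0-9-]{1,61}[a-z0-9])?$")


def is_valid_container_name(name: str) -> bool:
    return 3 <= len(name) <= 63 and bool(CONTAINER_NAME.match(name))


for name in ("source", "destination"):
    print(name, is_valid_container_name(name))  # source True / destination True

# With the SDK (assumed environment; not runnable without credentials):
# from azure.storage.blob import BlobServiceClient
# client = BlobServiceClient.from_connection_string("<connection-string>")
# client.create_container("source")
# client.create_container("destination")
```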
4747

4848
## Create a data factory
4949

50-
1. On the left menu, select **Create a resource** > **Data + Analytics** > **Data Factory**:
50+
1. In the left pane, select **Create a resource**. Select **Analytics** > **Data Factory**:
5151

52-
![Data Factory selection in the "New" pane](./media/doc-common-process/new-azure-data-factory-menu.png)
52+
![Select Data Factory](./media/doc-common-process/new-azure-data-factory-menu.png)
5353

5454
2. On the **New data factory** page, under **Name**, enter **ADFTutorialDataFactory**.
5555

56-
The name for your data factory must be _globally unique_. You might receive the following error message:
56+
The name for your data factory must be globally unique. You might receive this error message:
5757

58-
![New data factory error message](./media/doc-common-process/name-not-available-error.png)
58+
![Name not available error message](./media/doc-common-process/name-not-available-error.png)
5959

6060
If you receive an error message about the name value, enter a different name for the data factory. For example, use the name _**yourname**_**ADFTutorialDataFactory**. For the naming rules for Data Factory artifacts, see [Data Factory naming rules](naming-rules.md).
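As a convenience, you can pre-check a candidate name locally before submitting it. The rules encoded below (3-63 characters; letters, digits, and hyphens; starting and ending with a letter or digit) are an assumption based on the naming-rules page linked above, and global uniqueness can only be confirmed by the service itself.

```python
# Hedged local pre-check for a data factory name. Passing this check does
# not guarantee the name is available; only the service can confirm that.
import re

ADF_NAME = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9-]{1,61}[A-Za-z0-9])?$")


def is_plausible_factory_name(name: str) -> bool:
    return 3 <= len(name) <= 63 and bool(ADF_NAME.match(name))


print(is_plausible_factory_name("ADFTutorialDataFactory"))  # True
print(is_plausible_factory_name("my factory"))              # False: contains a space
```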
61-
3. Select the Azure **subscription** in which you'll create the new data factory.
62-
4. For **Resource Group**, take one of the following steps:
61+
3. Under **Subscription**, select the Azure subscription in which you'll create the new data factory.
62+
4. Under **Resource Group**, take one of these steps:
6363

64-
* Select **Use existing** and select an existing resource group from the drop-down list.
64+
* Select **Use existing** and then select an existing resource group in the list.
6565

66-
* Select **Create new** and enter the name of a resource group.
66+
* Select **Create new** and then enter a name for the resource group.
6767

6868
To learn about resource groups, see [Use resource groups to manage your Azure resources](../azure-resource-manager/management/overview.md).
6969

70-
5. Under **version**, select **V2**.
71-
6. Under **location**, select the location for the data factory. Only supported locations are displayed in the drop-down list. The data stores (for example, Azure Storage and SQL Database) and computes (for example, Azure HDInsight) that your data factory uses can be in other locations and regions.
70+
5. Under **Version**, select **V2**.
71+
6. Under **Location**, select the location for the data factory. Only supported locations appear in the list. The data stores (for example, Azure Storage and Azure SQL Database) and computes (for example, Azure HDInsight) that your data factory uses can be in other locations and regions.
7272
8. Select **Create**.
73-
9. After creation is finished, the **Data Factory** home page is displayed.
74-
10. To open the Azure Data Factory user interface (UI) on a separate tab, select the **Author & Monitor** tile.
73+
9. After the data factory is created, the data factory home page appears.
74+
10. To open the Azure Data Factory user interface (UI) on a separate tab, select the **Author & Monitor** tile:
7575

7676
![Data factory home page](./media/doc-common-process/data-factory-home-page.png)
7777

7878
## Use the Copy Data tool to create a pipeline
7979

80-
1. On the **Let's get started** page, select the **Copy Data** title to open the Copy Data tool.
80+
1. On the **Let's get started** page, select the **Copy Data** tile to open the Copy Data tool:
8181

82-
![Copy Data tool tile](./media/doc-common-process/get-started-page.png)
82+
![Copy Data tile](./media/doc-common-process/get-started-page.png)
8383

8484
2. On the **Properties** page, take the following steps:
8585

8686
a. Under **Task name**, enter **DeltaCopyFromBlobPipeline**.
8787

88-
b. Under **Task cadence** or **Task schedule**, select **Run regularly on schedule**.
88+
b. Under **Task cadence or Task schedule**, select **Run regularly on schedule**.
8989

90-
c. Under **Trigger Type**, select **Tumbling Window**.
90+
c. Under **Trigger type**, select **Tumbling window**.
9191

9292
d. Under **Recurrence**, enter **15 Minute(s)**.
9393

9494
e. Select **Next**.
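The tumbling window trigger configured above divides time into fixed, contiguous, non-overlapping 15-minute windows, and each window produces one pipeline run. A minimal sketch of that slicing (the start time is illustrative):

```python
# Sketch: how a 15-minute tumbling-window trigger slices time. Each pipeline
# run receives one window; the copy activity then picks up files whose
# LastModifiedDate falls inside that window.
from datetime import datetime, timedelta


def tumbling_windows(start, interval, count):
    """Yield (window_start, window_end) pairs for the first `count` windows."""
    for i in range(count):
        yield start + i * interval, start + (i + 1) * interval


start = datetime(2020, 3, 18, 0, 0)
for ws, we in tumbling_windows(start, timedelta(minutes=15), 3):
    print(ws.strftime("%H:%M"), "->", we.strftime("%H:%M"))
# 00:00 -> 00:15
# 00:15 -> 00:30
# 00:30 -> 00:45
```

Because the windows are contiguous, every file modification time lands in exactly one window, which is what makes the incremental copy exact.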
9595

96-
The Data Factory UI creates a pipeline with the specified task name.
96+
Data Factory creates a pipeline with the specified task name.
9797

98-
![Properties page](./media/tutorial-incremental-copy-lastmodified-copy-data-tool/copy-data-tool-properties-page.png)
98+
![Copy data properties page](./media/tutorial-incremental-copy-lastmodified-copy-data-tool/copy-data-tool-properties-page.png)
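The Copy Data tool generates the pipeline for you, but it can help to see, in abridged form, how the `LastModifiedDate` filter is expressed. In the source Blob dataset, the filter maps to the `modifiedDatetimeStart` and `modifiedDatetimeEnd` properties, and the tumbling window trigger supplies their values through `@trigger().outputs.windowStartTime` and `@trigger().outputs.windowEndTime`. The fragment below is illustrative only, with literal timestamps in place of the trigger expressions:

```json
{
  "name": "SourceDataset",
  "properties": {
    "type": "AzureBlob",
    "typeProperties": {
      "folderPath": "source",
      "modifiedDatetimeStart": "2020-03-18T00:00:00Z",
      "modifiedDatetimeEnd": "2020-03-18T00:15:00Z"
    }
  }
}
```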
9999

100-
3. On the **Source data store** page, complete the following steps:
100+
3. On the **Source data store** page, complete these steps:
101101

102-
a. Select **+ Create new connection**, to add a connection.
102+
a. Select **Create new connection** to add a connection.
103103

104-
b. Select **Azure Blob Storage** from the gallery, and then select **Continue**.
104+
b. Select **Azure Blob Storage** from the gallery, and then select **Continue**:
105105

106-
![Source data store page](./media/tutorial-incremental-copy-lastmodified-copy-data-tool/source-data-store-page-select-blob.png)
106+
![Select Azure Blob Storage](./media/tutorial-incremental-copy-lastmodified-copy-data-tool/source-data-store-page-select-blob.png)
107107

108-
c. On the **New Linked Service(Azure Blob Storage)** page, select your storage account from the **Storage account name** list. Test connection and then select **Create**.
108+
c. On the **New Linked Service (Azure Blob Storage)** page, select your storage account from the **Storage account name** list. Test the connection and then select **Create**.
109109

110-
d. Select the newly created linked service and then select **Next**.
110+
d. Select the new linked service and then select **Next**:
111111

112-
![Source data store page](./media/tutorial-incremental-copy-lastmodified-copy-data-tool/source-data-store-page-select-linkedservice.png)
112+
![Select the new linked service](./media/tutorial-incremental-copy-lastmodified-copy-data-tool/source-data-store-page-select-linkedservice.png)
113113

114114
4. On the **Choose the input file or folder** page, complete the following steps:
115115

116-
a. Browse and select the **source** folder, and then select **Choose**.
116+
a. Browse for and select the **source** folder, and then select **Choose**.
117117

118118
![Choose the input file or folder](./media/tutorial-incremental-copy-lastmodified-copy-data-tool/choose-input-file-folder.png)
119119

120120
b. Under **File loading behavior**, select **Incremental load: LastModifiedDate**.
121121

122-
c. Check **Binary copy** and select **Next**.
122+
c. Select **Binary copy** and then select **Next**:
123123

124-
![Choose the input file or folder](./media/tutorial-incremental-copy-lastmodified-copy-data-tool/check-binary-copy.png)
124+
![Choose the input file or folder page](./media/tutorial-incremental-copy-lastmodified-copy-data-tool/check-binary-copy.png)
125125

126-
5. On the **Destination data store** page, select the **AzureBlobStorage** that you created. This is the same storage account as the source data store. Then select **Next**.
126+
5. On the **Destination data store** page, select the **AzureBlobStorage** linked service that you created. This is the same storage account as the source data store. Then select **Next**.
127127

128128
6. On the **Choose the output file or folder** page, complete the following steps:
129129

130-
a. Browse and select the **destination** folder, and then select **Choose**.
130+
a. Browse for and select the **destination** folder, and then select **Choose**:
131131

132-
![Choose the output file or folder](./media/tutorial-incremental-copy-lastmodified-copy-data-tool/choose-output-file-folder.png)
132+
![Choose the output file or folder page](./media/tutorial-incremental-copy-lastmodified-copy-data-tool/choose-output-file-folder.png)
133133

134134
b. Select **Next**.
135135

@@ -143,43 +143,43 @@ Prepare your Blob storage for the tutorial by performing these steps.
143143

144144
![Deployment page](./media/tutorial-incremental-copy-lastmodified-copy-data-tool/deployment-page.png)
145145

146-
10. Notice that the **Monitor** tab on the left is automatically selected. The application switches to the **Monitor** tab. You see the status of the pipeline. Select **Refresh** to refresh the list. Click the link under **PIPELINE NAME** to view activity run details or rerun the pipeline.
146+
10. The application switches to the **Monitor** tab, which is automatically selected on the left. You see the status of the pipeline. Select **Refresh** to refresh the list. Select the link under **PIPELINE NAME** to view activity run details or to run the pipeline again.
147147

148-
![Refresh list and select View Activity Runs](./media/tutorial-incremental-copy-lastmodified-copy-data-tool/monitor-pipeline-runs1.png)
148+
![Refresh the list and view activity run details](./media/tutorial-incremental-copy-lastmodified-copy-data-tool/monitor-pipeline-runs1.png)
149149

150-
11. There's only one activity (the copy activity) in the pipeline, so you see only one entry. For details about the copy operation, select the **Details** link (eyeglasses icon) under the **ACTIVITY NAME** column. For details about the properties, see [Copy Activity overview](copy-activity-overview.md).
150+
11. There's only one activity (the copy activity) in the pipeline, so you see only one entry. For details about the copy operation, select the **Details** link (the eyeglasses icon) in the **ACTIVITY NAME** column. For details about the properties, see [Copy activity overview](copy-activity-overview.md).
151151

152-
![Copy activity is in pipeline](./media/tutorial-incremental-copy-lastmodified-copy-data-tool/monitor-pipeline-runs2.png)
152+
![Copy activity in the pipeline](./media/tutorial-incremental-copy-lastmodified-copy-data-tool/monitor-pipeline-runs2.png)
153153

154-
Because there is no file in the **source** container in your Blob storage account, you will not see any file copied to the **destination** container in your Blob storage account.
154+
Because there are no files in the source container in your Blob storage account, you won't see any files copied to the destination container in the account:
155155

156-
![No file in source container or destination container](./media/tutorial-incremental-copy-lastmodified-copy-data-tool/monitor-pipeline-runs3.png)
156+
![No files in source container or destination container](./media/tutorial-incremental-copy-lastmodified-copy-data-tool/monitor-pipeline-runs3.png)
157157

158-
12. Create an empty text file and name it **file1.txt**. Upload this text file to the **source** container in your storage account. You can use various tools to perform these tasks, such as [Azure Storage Explorer](https://storageexplorer.com/).
158+
12. Create an empty text file and name it **file1.txt**. Upload this text file to the source container in your storage account. You can use various tools to perform these tasks, like [Azure Storage Explorer](https://storageexplorer.com/).
159159

160-
![Create file1.txt and upload to source container](./media/tutorial-incremental-copy-lastmodified-copy-data-tool/monitor-pipeline-runs3-1.png)
160+
![Create file1.txt and upload it to the source container](./media/tutorial-incremental-copy-lastmodified-copy-data-tool/monitor-pipeline-runs3-1.png)
161161

162-
13. To go back to the **Pipeline Runs** view, select **All pipeline runs**, and wait for the same pipeline to be triggered again automatically.
162+
13. To go back to the **Pipeline runs** view, select **All pipeline runs**, and wait for the same pipeline to be automatically triggered again.
163163

164-
![Select All Pipeline Runs](./media/tutorial-incremental-copy-lastmodified-copy-data-tool/monitor-pipeline-runs4.png)
164+
![Select All pipeline runs](./media/tutorial-incremental-copy-lastmodified-copy-data-tool/monitor-pipeline-runs4.png)
165165

166-
14. When the second pipeline run completes, follow the same steps mentioned above to review the activity run details.
166+
14. When the second pipeline run completes, follow the steps described earlier to review the activity run details.
167167

168-
You will see that one file (file1.txt) has been copied from the **source** container to the **destination** container of your Blob storage account.
168+
You'll see that one file (file1.txt) has been copied from the source container to the destination container of your Blob storage account:
169169

170-
![File1.txt has been copied from source container to destination container](./media/tutorial-incremental-copy-lastmodified-copy-data-tool/monitor-pipeline-runs6.png)
170+
![file1.txt has been copied from the source container to the destination container](./media/tutorial-incremental-copy-lastmodified-copy-data-tool/monitor-pipeline-runs6.png)
171171

172-
15. Create another empty text file and name it **file2.txt**. Upload this text file to the **source** container in your Blob storage account.
172+
15. Create another empty text file and name it **file2.txt**. Upload this text file to the source container in your Blob storage account.
173173

174-
16. Repeat steps 13 and 14 for this second text file. You will see that only the new file (file2.txt) has been copied from the **source** container to the **destination** container of your storage account in the next pipeline run.
174+
16. Repeat steps 13 and 14 for the second text file. You'll see that only the new file (file2.txt) was copied from the source container to the destination container of your storage account during this pipeline run.
175175

176-
You can also verify this by using [Azure Storage Explorer](https://storageexplorer.com/) to scan the files.
176+
You can also use [Azure Storage Explorer](https://storageexplorer.com/) to scan the files and verify that only the new file was copied:
177177

178-
![Scan files using Azure Storage Explorer](./media/tutorial-incremental-copy-lastmodified-copy-data-tool/monitor-pipeline-runs8.png)
178+
![Scan files by using Azure Storage Explorer](./media/tutorial-incremental-copy-lastmodified-copy-data-tool/monitor-pipeline-runs8.png)
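Steps 12 through 16 can be mirrored locally as a sketch (local folders stand in for the Blob containers, and numeric timestamps stand in for real tumbling windows): the first run copies file1.txt, and the second run copies only file2.txt.

```python
# Sketch: reproduce the two-run incremental behavior locally. Each "pipeline
# run" copies files whose modification time falls inside its window.
import os
import shutil
import tempfile


def run_window(source_dir, dest_dir, window_start, window_end):
    """One simulated pipeline run over [window_start, window_end)."""
    copied = []
    for name in sorted(os.listdir(source_dir)):
        mtime = os.path.getmtime(os.path.join(source_dir, name))
        if window_start <= mtime < window_end:
            shutil.copy2(os.path.join(source_dir, name), dest_dir)
            copied.append(name)
    return copied


src = tempfile.mkdtemp()
dst = tempfile.mkdtemp()

# file1.txt lands in the first window, file2.txt in the second.
for name, stamp in (("file1.txt", 100), ("file2.txt", 1000)):
    path = os.path.join(src, name)
    open(path, "w").close()
    os.utime(path, (stamp, stamp))

print(run_window(src, dst, 0, 900))      # first run:  ['file1.txt']
print(run_window(src, dst, 900, 1800))   # second run: ['file2.txt']
```

The second run ignores file1.txt entirely, which is exactly what you observe in the **destination** container after step 16.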
179179

180180

181181
## Next steps
182-
Advance to the following tutorial to learn about transforming data by using an Apache Spark cluster on Azure:
182+
Go to the following tutorial to learn how to transform data by using an Apache Spark cluster on Azure:
183183

184184
> [!div class="nextstepaction"]
185185
>[Transform data in the cloud by using an Apache Spark cluster](tutorial-transform-data-spark-portal.md)
