In this tutorial, you'll use the Azure portal to create a data factory. You'll then use the Copy Data tool to create a pipeline that incrementally copies new and changed files only, from Azure Blob storage to Azure Blob storage. It uses `LastModifiedDate` to determine which files to copy.

After you complete the steps here, Azure Data Factory will scan all the files in the source store, apply the file filter by `LastModifiedDate`, and copy to the destination store only files that are new or have been updated since the last run. Note that if Data Factory scans large numbers of files, you should still expect long durations. File scanning is time consuming, even when the amount of data copied is reduced.
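
The filter can be sketched outside Data Factory to make this behavior concrete. The following Python sketch (a local stand-in, not Data Factory code; `files_modified_since` is a hypothetical helper name) stats every file and keeps only those modified after a cutoff, which is why scan time grows with the total file count even when few files qualify:

```python
from datetime import datetime, timezone
from pathlib import Path

def files_modified_since(folder: Path, cutoff: datetime) -> list:
    """Return files under `folder` whose last-modified time is after `cutoff`.

    Every file is still stat'ed, mirroring why a Data Factory run over a
    large folder stays slow even when only a few files are copied.
    """
    selected = []
    for path in folder.rglob("*"):
        if not path.is_file():
            continue
        modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        if modified > cutoff:
            selected.append(path)
    return selected
```

On each run, the cutoff is the start of the previous window, so a file is copied at most once unless it's modified again.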
> [!NOTE]
> If you're new to Data Factory, see [Introduction to Azure Data Factory](introduction.md).

In this tutorial, you'll complete these tasks:

> [!div class="checklist"]
> * Create a data factory.
## Prerequisites
* **Azure subscription**: If you don't have an Azure subscription, create a [free account](https://azure.microsoft.com/free/) before you begin.
* **Azure Storage account**: Use Blob storage for the source and sink data stores. If you don't have an Azure Storage account, follow the instructions in [Create a storage account](../storage/common/storage-account-create.md).

## Create two containers in Blob storage

Prepare your Blob storage for the tutorial by completing these steps:

1. Create a container named **source**. You can use various tools to perform this task, like [Azure Storage Explorer](https://storageexplorer.com/).
2. Create a container named **destination**.
## Create a data factory

1. In the left pane, select **Create a resource**. Then select **Analytics** > **Data Factory**:

   
2. On the **New data factory** page, under **Name**, enter **ADFTutorialDataFactory**.

   The name for your data factory must be globally unique. You might receive this error message:

   
If you receive an error message about the name value, enter a different name for the data factory. For example, use the name _**yourname**_**ADFTutorialDataFactory**. For the naming rules for Data Factory artifacts, see [Data Factory naming rules](naming-rules.md).

3. Under **Subscription**, select the Azure subscription in which you'll create the new data factory.

4. Under **Resource Group**, take one of these steps:

   * Select **Use existing** and then select an existing resource group in the list.

   * Select **Create new** and then enter a name for the resource group.
To learn about resource groups, see [Use resource groups to manage your Azure resources](../azure-resource-manager/management/overview.md).

5. Under **Version**, select **V2**.

6. Under **Location**, select the location for the data factory. Only supported locations appear in the list. The data stores (for example, Azure Storage and Azure SQL Database) and computes (for example, Azure HDInsight) that your data factory uses can be in other locations and regions.

7. Select **Create**.

8. After the data factory is created, the data factory home page appears.

9. To open the Azure Data Factory user interface (UI) on a separate tab, select the **Author & Monitor** tile:

## Use the Copy Data tool to create a pipeline

1. On the **Let's get started** page, select the **Copy Data** tile to open the Copy Data tool:

   
2. On the **Properties** page, take the following steps:
a. Under **Task name**, enter **DeltaCopyFromBlobPipeline**.

   b. Under **Task cadence or Task schedule**, select **Run regularly on schedule**.

   c. Under **Trigger type**, select **Tumbling window**.
d. Under **Recurrence**, enter **15 Minute(s)**.
e. Select **Next**.

   Data Factory creates a pipeline with the specified task name.


3. On the **Source data store** page, complete these steps:

   a. Select **Create new connection** to add a connection.

   b. Select **Azure Blob Storage** from the gallery, and then select **Continue**:

   

   c. On the **New Linked Service (Azure Blob Storage)** page, select your storage account from the **Storage account name** list. Test the connection and then select **Create**.

   d. Select the new linked service and then select **Next**:

   
4. On the **Choose the input file or folder** page, complete the following steps:

   a. Browse for and select the **source** folder, and then select **Choose**.

b. Under **File loading behavior**, select **Incremental load: LastModifiedDate**.

   c. Select **Binary copy** and then select **Next**:

   

5. On the **Destination data store** page, select the **AzureBlobStorage** service that you created. This is the same storage account as the source data store. Then select **Next**.
6. On the **Choose the output file or folder** page, complete the following steps:

   a. Browse for and select the **destination** folder, and then select **Choose**:

   
b. Select **Next**.
10. Notice that the **Monitor** tab on the left is automatically selected. The application switches to the **Monitor** tab. You see the status of the pipeline. Select **Refresh** to refresh the list. Select the link under **PIPELINE NAME** to view activity run details or to run the pipeline again.

    

11. There's only one activity (the copy activity) in the pipeline, so you see only one entry. For details about the copy operation, select the **Details** link (the eyeglasses icon) in the **ACTIVITY NAME** column. For details about the properties, see [Copy activity overview](copy-activity-overview.md).

    

    Because there are no files in the source container in your Blob storage account, you won't see any files copied to the destination container in the account:

    

12. Create an empty text file and name it **file1.txt**. Upload this text file to the source container in your storage account. You can use various tools to perform these tasks, like [Azure Storage Explorer](https://storageexplorer.com/).

    

13. To go back to the **Pipeline runs** view, select **All pipeline runs**, and wait for the same pipeline to be automatically triggered again.

    

14. When the second pipeline run completes, follow the same steps mentioned previously to review the activity run details.

    You'll see that one file (file1.txt) has been copied from the source container to the destination container of your Blob storage account:

    

15. Create another empty text file and name it **file2.txt**. Upload this text file to the source container in your Blob storage account.

16. Repeat steps 13 and 14 for the second text file. You'll see that only the new file (file2.txt) was copied from the source container to the destination container of your storage account during this pipeline run.

    You can also verify that only one file has been copied by using [Azure Storage Explorer](https://storageexplorer.com/) to scan the files:

    
## Next steps

Go to the following tutorial to learn how to transform data by using an Apache Spark cluster on Azure:
> [!div class="nextstepaction"]
> [Transform data in the cloud by using an Apache Spark cluster](tutorial-transform-data-spark-portal.md)