articles/batch/tutorial-run-python-batch-azure-data-factory.md (8 additions & 8 deletions)
@@ -9,7 +9,7 @@ ms.custom: mvc, devx-track-python
 
 # Tutorial: Use Batch Explorer, Storage Explorer, and Python to run a Batch job through Data Factory
 
-This tutorial walks you through creating and running an Azure Data Factory pipeline that runs an Azure Batch data manipulation workload. A Python script runs on the Batch nodes to get comma-separated value (CSV) input from an Azure Blob Storage container, manipulate the data, and write the output to a different storage container. You use Batch Explorer to create a Batch pool and nodes, and Azure Storage Explorer to work with storage containers.
+This tutorial walks you through creating and running an Azure Data Factory pipeline that runs an Azure Batch data manipulation workload. A Python script runs on the Batch nodes to get comma-separated value (CSV) input from an Azure Blob Storage container, manipulate the data, and write the output to a different storage container. You use Batch Explorer to create a Batch pool and nodes, and Azure Storage Explorer to work with storage containers and files.
 
 In this tutorial, you learn how to:
@@ -38,7 +38,7 @@ Use Batch Explorer to create a pool of compute nodes to run your workload.
 1. Select your Batch account.
 1. Select **Pools** on the left sidebar, and then select the **+** icon to add a pool.
 
-[](media/run-python-batch-azure-data-factory/batch-explorer-add-pool.png#lightbox)
+[](media/run-python-batch-azure-data-factory/batch-explorer-add-pool.png#lightbox)
 
 1. Complete the **Add a pool to the account** form as follows:
@@ -53,7 +53,7 @@ Use Batch Explorer to create a pool of compute nodes to run your workload.
 
 ## Use Storage Explorer to create blob containers
 
-Use Storage Explorer to create blob containers to store input and output files, and upload your input files.
+Use Storage Explorer to create blob containers to store input and output files, and then upload your input files.
 
 1. Sign in to Storage Explorer with your Azure credentials.
 1. In the left sidebar, locate and expand the storage account that's linked to your Batch account.
@@ -68,7 +68,7 @@ Use Storage Explorer to create blob containers to store input and output files,
 [](media/run-python-batch-azure-data-factory/storage-explorer.png#lightbox)
 
 ## Develop a Python script
 
-The following Python script loads the *iris.csv* dataset from your Storage Explorer **input** container, manipulates the data, and saves the results to the **output** container.
+The following Python script loads the *iris.csv* dataset file from your Storage Explorer **input** container, manipulates the data, and saves the results to the **output** container.
 
 The script needs to use the connection string for the Azure Storage account that's linked to your Batch account. To get the connection string:
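The tutorial's full script isn't reproduced in this diff. As an illustration only, a minimal sketch of what such a *main.py* could look like, assuming the `azure-storage-blob` package, containers named `input` and `output`, and a connection string supplied via an environment variable (all names here are hypothetical, not the article's actual code):

```python
# Hypothetical sketch, not the tutorial's exact script.
import csv
import io
import os


def filter_species_csv(csv_text: str, species: str) -> str:
    """Return only the CSV rows whose Species column matches `species`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        if row["Species"] == species:
            writer.writerow(row)
    return out.getvalue()


def main():
    # Assumes the connection string is set in the environment rather than
    # hardcoded; requires `pip install azure-storage-blob`.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(
        os.environ["AZURE_STORAGE_CONNECTION_STRING"]
    )
    data = service.get_blob_client("input", "iris.csv").download_blob().readall()
    result = filter_species_csv(data.decode("utf-8"), "setosa")
    service.get_blob_client("output", "iris_setosa.csv").upload_blob(
        result, overwrite=True
    )
```

Keeping the filtering logic in a pure function like `filter_species_csv` lets you test it locally without any Azure credentials; only `main()` touches storage.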
@@ -113,7 +113,7 @@ Run the script locally to test and validate functionality.
 python main.py
 ```
 
-The script should produce an output file named *iris_setosa.csv* that contains only the data records that have Species = setosa. When the script works correctly, upload the *main.py* script file to your Storage Explorer **input** container.
+The script should produce an output file named *iris_setosa.csv* that contains only the data records that have Species = setosa. After you verify that it works correctly, upload the *main.py* script file to your Storage Explorer **input** container.
 
 ## Set up a Data Factory pipeline
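Before uploading, you could confirm the local output really contains only setosa records. A small hypothetical helper (not part of the tutorial) using only the standard library:

```python
# Hypothetical local sanity check for the generated CSV file.
import csv


def only_species(path: str, species: str) -> bool:
    """True if every record in the CSV file has the given Species value."""
    with open(path, newline="") as f:
        return all(row["Species"] == species for row in csv.DictReader(f))
```

For example, `only_species("iris_setosa.csv", "setosa")` should return `True` for a correct run.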
@@ -148,7 +148,7 @@ The Data Factory pipeline uses your Batch and Storage account names, account key
 
 
 1. Select the **Azure Batch** tab, and then select **New**.
-1. Complete the **New linked service** screen as follows:
+1. Complete the **New linked service** form as follows:
 
    - **Name**: Enter a name for the linked service, such as **AzureBatch1**.
    - **Access key**: Enter the primary access key you copied from your Batch account.
@@ -178,7 +178,7 @@ The Data Factory pipeline uses your Batch and Storage account names, account key
 
 ## Use Batch Explorer to view log files
 
-If your pipeline produces warnings or errors, you can use Batch Explorer to look at the *stdout.txt* and *stderr.txt* output files for more information.
+If running your pipeline produces warnings or errors, you can use Batch Explorer to look at the *stdout.txt* and *stderr.txt* output files for more information.
 
 1. In Batch Explorer, select **Jobs** from the left sidebar.
 1. Select the **adfv2-custom-activity-pool** job.
@@ -187,7 +187,7 @@ If your pipeline produces warnings or errors, you can use Batch Explorer to look
 
 ## Clean up resources
 
-Batch accounts, jobs, and tasks are free, but compute nodes incur charges even if they're not running Batch jobs. It's best to allocate node pools only as needed, and delete the pools when you're done with them. Deleting pools deletes all task output on the nodes, and the nodes themselves.
+Batch accounts, jobs, and tasks are free, but compute nodes incur charges even when they're not running jobs. It's best to allocate node pools only as needed, and delete the pools when you're done with them. Deleting pools deletes all task output on the nodes, and the nodes themselves.
 
 Input and output files remain in the storage account and can incur charges. When you no longer need the files, you can delete the files or containers. When you no longer need your Batch account or linked storage account, you can delete them.