articles/machine-learning/batch-inference/how-to-access-data-batch-endpoints-jobs.md (16 additions, 7 deletions)
@@ -15,15 +15,15 @@ ms.custom: devplatv2
# Accessing data from batch endpoint jobs
-Batch endpoints can be used to perform batch scoring on large amounts of data. Such data can be placed in different places. In this tutorial we'll cover the different places where batch endpoints can read data from to.
+Batch endpoints can be used to perform batch scoring on large amounts of data. Such data can be placed in different locations. In this tutorial, we'll cover the different locations batch endpoints can read data from and how to reference them.
## Prerequisites
* This example assumes that you have a model correctly deployed as a batch endpoint. In particular, we're using the *heart condition classifier* created in the tutorial [Using MLflow models in batch deployments](how-to-mlflow-batch.md).
## Supported data inputs
-Batch endpoints support reading files or folders that are located in different locations:
+Batch endpoints support reading files located in the following storage options:
* Azure Machine Learning Data Stores. The following stores are supported:
  * Azure Blob Storage
@@ -47,7 +47,7 @@ Batch endpoints support reading files or folders that are located in different l
## Reading data from data stores
-We're going to first upload some data to the default data store in the Azure Machine Learning workspace and then run a batch deployment on it. Follow these steps to run a batch endpoint job using data stored in a data store:
+Data from Azure Machine Learning registered data stores can be directly referenced by batch deployment jobs. In this example, we first upload some data to the default data store in the Azure Machine Learning workspace and then run a batch deployment on it. Follow these steps to run a batch endpoint job using data stored in a data store:
1. Let's get access to the default data store in the Azure Machine Learning workspace. If your data is in a different store, you can use that store instead. There's no requirement to use the default data store. A sketch of this step is shown below.
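
   A minimal sketch of this step, assuming the Azure Machine Learning Python SDK v2 (`azure-ai-ml`); the subscription, resource group, and workspace values are placeholder assumptions:

   ```python
   from azure.ai.ml import MLClient
   from azure.identity import DefaultAzureCredential

   # Connect to the workspace (subscription, resource group, and workspace
   # names are placeholder assumptions).
   ml_client = MLClient(
       DefaultAzureCredential(),
       subscription_id="<SUBSCRIPTION_ID>",
       resource_group_name="<RESOURCE_GROUP>",
       workspace_name="<WORKSPACE_NAME>",
   )

   # Retrieve the workspace's default data store; any registered data
   # store can be used instead.
   default_datastore = ml_client.datastores.get_default()
   print(default_datastore.name)
   ```

   Files in a data store are later referenced with URIs of the form `azureml://datastores/<data_store_name>/paths/<data_path>`.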
@@ -136,10 +136,10 @@ We're going to first upload some data to the default data store in the Azure Mac
## Reading data from a data asset
-Follow these steps to run a batch endpoint job using data stored in a registered data asset in Azure Machine Learning:
+Azure Machine Learning data assets (formerly known as datasets) are supported as inputs for jobs. Follow these steps to run a batch endpoint job using data stored in a registered data asset in Azure Machine Learning:
> [!WARNING]
-> Data assets of type Table (`MLTable`) isn't currently supported.
+> Data assets of type Table (`MLTable`) aren't currently supported.
1. Let's create the data asset first. This data asset consists of a folder with multiple CSV files that we want to process in parallel using batch endpoints. You can skip this step if your data is already registered as a data asset. A sketch of this step follows.
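
   A minimal sketch of registering such a folder as a `uri_folder` data asset, assuming the Python SDK v2; the asset name, local path, and description are illustrative assumptions, and `ml_client` is the workspace client from the earlier sketch:

   ```python
   from azure.ai.ml.entities import Data
   from azure.ai.ml.constants import AssetTypes

   # Register a folder of CSV files as a data asset (name, path, and
   # description are illustrative assumptions).
   heart_dataset = Data(
       name="heart-dataset-unlabeled",
       path="data/",
       type=AssetTypes.URI_FOLDER,
       description="An unlabeled folder of CSV files for batch scoring.",
   )
   ml_client.data.create_or_update(heart_dataset)
   ```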
@@ -243,7 +243,10 @@ Follow these steps to run a batch endpoint job using data stored in a registered
## Reading data from Azure Storage Accounts
-Azure Machine Learning batch endpoints can read data from cloud locations in Azure Storage Accounts. Both public and private cloud locations are supported. Use the following steps to run a batch endpoint job using data stored in a storage account:
+Azure Machine Learning batch endpoints can read data from cloud locations in Azure Storage Accounts, both public and private. Use the following steps to run a batch endpoint job using data stored in a storage account:
+
+> [!NOTE]
+> Check the section [Security considerations when reading data](#security-considerations-when-reading-data) to learn more about the additional configuration required to successfully read data from storage accounts.
1. Create a data input:
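
   A minimal sketch of this step, assuming the Python SDK v2; the storage account URL and endpoint name are placeholders, and passing a single unnamed `input` to `invoke` reflects the batch-endpoint calling pattern of this SDK generation (verify against your installed version):

   ```python
   from azure.ai.ml import Input
   from azure.ai.ml.constants import AssetTypes

   # Point the job at a folder in a storage account (URL is a placeholder).
   heart_dataset_input = Input(
       type=AssetTypes.URI_FOLDER,
       path="https://<ACCOUNT_NAME>.blob.core.windows.net/<CONTAINER>/<PATH>",
   )

   # Invoke the batch endpoint with this input (endpoint name is a placeholder).
   job = ml_client.batch_endpoints.invoke(
       endpoint_name="<ENDPOINT_NAME>",
       input=heart_dataset_input,
   )
   ```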
@@ -335,7 +338,7 @@ Batch endpoints ensure that only authorized users are able to invoke batch deplo
| Data store | Yes | Data store's credentials in the workspace | Credentials |
| Data store | No | Identity of the job | Depends on type |
| Data asset | Yes | Data store's credentials in the workspace | Credentials |
-| Data asset | No | Identity of the job + Managed identity of the compute cluster| Depends on store |
+| Data asset | No | Identity of the job | Depends on store |
| Azure Blob Storage | Not applicable | Identity of the job + Managed identity of the compute cluster | RBAC |
| Azure Data Lake Storage Gen1 | Not applicable | Identity of the job + Managed identity of the compute cluster | POSIX |
| Azure Data Lake Storage Gen2 | Not applicable | Identity of the job + Managed identity of the compute cluster | POSIX and RBAC |
@@ -344,3 +347,9 @@ The managed identity of the compute cluster is used for mounting and configuring
> [!NOTE]
> To assign an identity to the compute used by a batch deployment, follow the instructions at [Set up authentication between Azure ML and other services](../how-to-identity-based-service-authentication.md#compute-cluster). Configure the identity on the compute cluster associated with the deployment. Notice that all the jobs running on such compute are affected by this change. However, different deployments (even under the same endpoint) can be configured to run on different clusters, so you can administer the permissions accordingly depending on your requirements.
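
A minimal sketch of configuring a system-assigned managed identity on a compute cluster, assuming the Python SDK v2 and placeholder names; the linked article remains the authoritative reference, and the identity type string can vary across SDK versions:

```python
from azure.ai.ml.entities import AmlCompute, IdentityConfiguration

# Cluster with a system-assigned managed identity (name, VM size, and
# identity type string are assumptions; check your SDK version).
cluster = AmlCompute(
    name="batch-cluster",
    size="STANDARD_DS3_v2",
    min_instances=0,
    max_instances=2,
    identity=IdentityConfiguration(type="system_assigned"),
)
ml_client.compute.begin_create_or_update(cluster)
```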