`articles/machine-learning/how-to-access-data-batch-endpoints-jobs.md` (+99 −99)
Batch endpoints support reading files located in the following storage options:

* [Azure Machine Learning Data Assets](#input-data-from-a-data-asset). The following types are supported:
  * Data assets of type Folder (`uri_folder`).
  * Data assets of type File (`uri_file`).
  * Datasets of type `FileDataset` (Deprecated).
* [Azure Machine Learning Data Stores](#input-data-from-data-stores). The following stores are supported:
  * Azure Blob Storage
  * Azure Data Lake Storage Gen1
  * Azure Data Lake Storage Gen2
* [Azure Storage Accounts](#input-data-from-azure-storage-accounts). The following storage containers are supported:
  * Azure Data Lake Storage Gen1
  * Azure Data Lake Storage Gen2
  * Azure Blob Storage
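
Each of these options is expressed the same way when you submit a job from the Python SDK: as an `Input` whose `path` uses a different URI form. The following is a minimal sketch, assuming the `azure-ai-ml` v2 package; the asset, data store, and account/container names are placeholders:

```python
from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes

# Data asset, referenced by name and version (or @latest) with the azureml: scheme
asset_input = Input(
    type=AssetTypes.URI_FOLDER,
    path="azureml:heart-dataset-unlabeled@latest",
)

# Data store, referenced by the long-form datastore URI
datastore_input = Input(
    type=AssetTypes.URI_FOLDER,
    path="azureml://datastores/workspaceblobstore/paths/heart-classifier/data",
)

# Storage account, referenced directly by its cloud URL
storage_input = Input(
    type=AssetTypes.URI_FOLDER,
    path="https://<account>.blob.core.windows.net/<container>/heart-classifier/data",
)
```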
> __Deprecation notice__: Datasets of type `FileDataset` (V1) are deprecated and will be retired in the future. Existing batch endpoints relying on this functionality will continue to work, but batch endpoints created with the GA CLI v2 (2.4.0 and newer) or the GA REST API (2022-05-01 and newer) won't support V1 datasets.
## Input data from a data asset

Azure Machine Learning data assets (formerly known as datasets) are supported as inputs for jobs. Follow these steps to run a batch endpoint job using data stored in a registered data asset in Azure Machine Learning:

> [!WARNING]
> Data assets of type Table (`MLTable`) aren't currently supported.
1. Let's create the data asset first. This data asset consists of a folder with multiple CSV files that we want to process in parallel using batch endpoints. You can skip this step if your data is already registered as a data asset.

    # [Azure CLI](#tab/cli)

    Create a data asset definition in `YAML`:

    ```yaml
    name: heart-dataset-unlabeled
    description: An unlabeled dataset for heart classification.
    type: uri_folder
    path: heart-classifier-mlflow/data
    ```

    Then, create the data asset:

    ```bash
    az ml data create -f heart-dataset-unlabeled.yml
    ```
    # [Python](#tab/sdk)

    ```python
    data_path = "heart-classifier-mlflow/data"
    dataset_name = "heart-dataset-unlabeled"

    heart_dataset_unlabeled = Data(
        path=data_path,
        type=AssetTypes.URI_FOLDER,
        description="An unlabeled dataset for heart classification",
        name=dataset_name,
    )
    ml_client.data.create_or_update(heart_dataset_unlabeled)
    ```
    ---

    Use the Azure ML CLI, Azure ML SDK for Python, or Studio to get the location (region), workspace, and data asset name and version. You'll need them later.
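
    From the Python SDK, a minimal sketch of retrieving that information, assuming the asset registered above:

    ```python
    # Resolve the latest version of the registered data asset; its
    # name, version, and id identify the input when invoking the endpoint.
    heart_dataset = ml_client.data.get(name="heart-dataset-unlabeled", label="latest")
    print(heart_dataset.name, heart_dataset.version, heart_dataset.id)
    ```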
1. Create a data input:

    # [Azure CLI](#tab/cli)

    ```azurecli
    DATASET_ID=$(az ml data show -n heart-dataset-unlabeled --label latest --query id)
    ```

    > [!NOTE]
    > Data asset IDs look like `/subscriptions/<subscription>/resourcegroups/<resource-group>/providers/Microsoft.MachineLearningServices/workspaces/<workspace>/data/<data-asset>/versions/<version>`.
1. Run the deployment:

    # [Azure CLI](#tab/cli)

    ```bash
    INVOKE_RESPONSE=$(az ml batch-endpoint invoke --name $ENDPOINT_NAME --input $DATASET_ID)
    ```

    > [!TIP]
    > You can also use `--input azureml:/<dataasset_name>@latest` as a way to indicate the input.

    # [Python](#tab/sdk)

    ```python
    # ...
    )
    ```
    # [REST](#tab/rest)

    __Request__

    ```http
    POST jobs HTTP/1.1
    ...
    Content-Type: application/json
    ```
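
Since the Python tab's body is elided above, here's a hedged sketch of the SDK invocation; `ml_client` and `endpoint_name` are assumed to be defined earlier, and the asset name matches the one registered in step 1:

```python
from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes

# Reference the registered data asset by its azureml: URI (latest version)
job_input = Input(
    type=AssetTypes.URI_FOLDER,
    path="azureml:heart-dataset-unlabeled@latest",
)

# Older builds of azure-ai-ml accept a single input= argument for batch invoke
job = ml_client.batch_endpoints.invoke(
    endpoint_name=endpoint_name,
    input=job_input,
)

# Check on the scoring job the invocation created
print(ml_client.jobs.get(job.name).status)
```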
## Input data from data stores

Data from Azure Machine Learning registered data stores can be directly referenced by batch deployment jobs. In this example, we're going to first upload some data to the default data store in the Azure Machine Learning workspace and then run a batch deployment on it. Follow these steps to run a batch endpoint job using data stored in a data store:

1. Let's get access to the default data store in the Azure Machine Learning workspace. If your data is in a different store, you can use that store instead. There's no requirement to use the default data store.

    # [Azure CLI](#tab/cli)
    ```azurecli
    DATASTORE_ID=$(az ml datastore show -n workspaceblobstore | jq -r '.id')
    ```

    > [!NOTE]
    > Data store IDs look like `/subscriptions/<subscription>/resourceGroups/<resource-group>/providers/Microsoft.MachineLearningServices/workspaces/<workspace>/datastores/<data-store>`.
    # [Python](#tab/sdk)

    ```python
    default_ds = ml_client.datastores.get_default()
    ```

    Use the Azure ML CLI, Azure ML SDK for Python, or Studio to get the data store information.

    ---

    > [!TIP]
    > The default blob data store in a workspace is called __workspaceblobstore__. You can skip this step if you already know the resource ID of the default data store in your workspace.

1. We'll need to upload some sample data to the store. This example assumes you've uploaded the sample data included in the repo at `sdk/python/endpoints/batch/heart-classifier/data` to the folder `heart-classifier/data` in the blob storage account. Ensure you've done that before moving forward.
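
    One possible way to perform that upload from Python is sketched below with the `azure-storage-blob` package; the account URL and the container backing __workspaceblobstore__ are assumptions you'd replace with your own values:

    ```python
    import os
    from azure.identity import DefaultAzureCredential
    from azure.storage.blob import ContainerClient

    # Placeholders: the storage account URL and the container backing the data store
    container = ContainerClient(
        account_url="https://<account>.blob.core.windows.net",
        container_name="<container>",
        credential=DefaultAzureCredential(),
    )

    local_dir = "sdk/python/endpoints/batch/heart-classifier/data"
    for file_name in os.listdir(local_dir):
        with open(os.path.join(local_dir, file_name), "rb") as stream:
            # Place each sample CSV under the heart-classifier/data prefix
            container.upload_blob(
                name=f"heart-classifier/data/{file_name}",
                data=stream,
                overwrite=True,
            )
    ```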
1. Create a data input:

    # [Azure CLI](#tab/cli)

    Let's place the file path in the following variable:

    ```azurecli
    DATA_PATH="heart-disease-uci-unlabeled"
    INPUT_PATH="$DATASTORE_ID/paths/$DATA_PATH"
    ```

    > [!NOTE]
    > See how the path `paths` is appended to the resource ID of the data store to indicate that what follows is a path inside of it.

    > [!TIP]
    > You can also use `azureml://datastores/<data-store>/paths/<data-path>` as a way to indicate the input.
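
    A Python counterpart of this input, sketched under the assumption that the `azureml://` datastore URI from the tip resolves the same location:

    ```python
    from azure.ai.ml import Input
    from azure.ai.ml.constants import AssetTypes

    # Equivalent of $DATASTORE_ID/paths/$DATA_PATH, in the short datastore URI form
    job_input = Input(
        type=AssetTypes.URI_FOLDER,
        path="azureml://datastores/workspaceblobstore/paths/heart-disease-uci-unlabeled",
    )
    ```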
1. Run the deployment:

    # [Azure CLI](#tab/cli)

    ```bash
    INVOKE_RESPONSE=$(az ml batch-endpoint invoke --name $ENDPOINT_NAME --input $INPUT_PATH)
    ```

    # [Python](#tab/sdk)

    ```python
    # ...
    )
    ```
    # [REST](#tab/rest)

    __Request__

    ```http
    POST jobs HTTP/1.1
    ...
    Content-Type: application/json
    ```

## Input data from Azure Storage Accounts

Azure Machine Learning batch endpoints can read data from cloud locations in Azure Storage Accounts, both public and private. Use the following steps to run a batch endpoint job using data stored in a storage account:
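
Before walking through those steps, here's a hedged sketch of how such a location is passed from the Python SDK; the account and container names are placeholders, and private locations additionally require the endpoint's identity to have access to the data:

```python
from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes

# A cloud location in an Azure Storage Account, referenced directly by URL
storage_input = Input(
    type=AssetTypes.URI_FOLDER,
    path="https://<account>.blob.core.windows.net/<container>/heart-classifier/data",
)

# `ml_client` and `endpoint_name` are assumed to be defined as in earlier sections
job = ml_client.batch_endpoints.invoke(
    endpoint_name=endpoint_name,
    input=storage_input,
)
```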