
Commit ced23e7

Merge pull request #199057 from Blackmist/batch-day-of
Batch day of //Build update
2 parents ba96f79 + 067905d commit ced23e7

File tree

3 files changed: +222 additions, -45 deletions


articles/machine-learning/concept-endpoints.md

Lines changed: 11 additions & 8 deletions
@@ -10,7 +10,7 @@ ms.author: seramasu
 author: rsethur
 ms.reviewer: larryfr
 ms.custom: devplatv2, ignite-fall-2021, event-tier1-build-2022
-ms.date: 04/29/2022
+ms.date: 05/24/2022
 #Customer intent: As an MLOps administrator, I want to understand what a managed endpoint is and why I need it.
 ---

@@ -89,7 +89,7 @@ Traffic allocation can be used to do safe rollout blue/green deployments by bala

 :::image type="content" source="media/concept-endpoints/endpoint-concept.png" alt-text="Diagram showing an endpoint splitting traffic to two deployments.":::

-Traffic to one deployment can also be mirrored (copied) to another deployment. Mirroring is useful when you want to test for things like response latency or error conditions without impacting live clients. For example, a blue/green deployment where 100% of the traffic is routed to blue and a 10% is mirrored to green. With mirroring, the results of the traffic to the green deployment aren't returned to the clients but metrics and logs are collected. Mirror traffic functionality is a __preview__ feature.
+Traffic to one deployment can also be mirrored (copied) to another deployment. Mirroring is useful when you want to test for things like response latency or error conditions without impacting live clients. For example, in a blue/green deployment where 100% of the traffic is routed to blue, 10% can be mirrored to the green deployment. With mirroring, the results of the traffic to the green deployment aren't returned to the clients, but metrics and logs are collected. Mirror traffic functionality is a __preview__ feature.

 :::image type="content" source="media/concept-endpoints/endpoint-concept-mirror.png" alt-text="Diagram showing an endpoint mirroring traffic to a deployment.":::

@@ -204,13 +204,15 @@ You can [override compute resource settings](how-to-use-batch-endpoint.md#config

 You can use the following options for input data when invoking a batch endpoint:

-- Azure Machine Learning registered datasets - for more information, see [Create Azure Machine Learning datasets](how-to-train-with-datasets.md), which uses SDK v1.
+- Cloud data - either a path on an Azure Machine Learning registered datastore, a reference to an Azure Machine Learning registered V2 data asset, or a public URI. For more information, see [Connect to data with the Azure Machine Learning studio](how-to-connect-data-ui.md).
+- Data stored locally - the data is automatically uploaded to the Azure ML registered datastore and passed to the batch endpoint.

-> [!NOTE]
-> Currently V1 FileDataset is supported for batch endpoint, and we will enable V2 data assets in the future. For more information on V2 data assets, see [Work with data using SDK v2 (preview)](how-to-use-data.md). For more information on the new V2 experience, see [What is v2](concept-v2.md).
+> [!NOTE]
+> - If you are using an existing V1 FileDataset for a batch endpoint, we recommend migrating it to a V2 data asset and referring to it directly when invoking batch endpoints. Currently only data assets of type `uri_folder` or `uri_file` are supported. Batch endpoints created with the GA CLI v2 (2.4.0 and newer) or the GA REST API (2022-05-01 and newer) don't support V1 Datasets.
+> - You can also extract the URI or path on the datastore from a V1 FileDataset by using the `az ml dataset show` command with the `--query` parameter, and use that information for invocation.
+> - While batch endpoints created with earlier APIs will continue to support V1 FileDatasets, we will add further V2 data asset support in the latest API versions for even more usability and flexibility. For more information on V2 data assets, see [Work with data using SDK v2 (preview)](how-to-use-data.md). For more information on the new V2 experience, see [What is v2](concept-v2.md).

-- Cloud data - Either a public data URI or data path in datastore. For more information, see [Connect to data with the Azure Machine Learning studio](how-to-connect-data-ui.md)
-- Data stored locally

 For more information on supported input options, see [Batch scoring with batch endpoint](how-to-use-batch-endpoint.md#invoke-the-batch-endpoint-with-different-input-options).

@@ -219,7 +221,8 @@ Specify the storage output location to any datastore and path. By default, batch
 ### Security

 - Authentication: Azure Active Directory Tokens
-- SSL by default for endpoint invocation
+- SSL: enabled by default for endpoint invocation
+- VNET support: Batch endpoints support ingress protection. A batch endpoint with ingress protection accepts scoring requests only from hosts inside a virtual network, not from the public internet. A batch endpoint created in a private-link enabled workspace has ingress protection. To create a private-link enabled workspace, see [Create a secure workspace](tutorial-create-secure-workspace.md).

 ## Next steps

articles/machine-learning/how-to-deploy-batch-with-rest.md

Lines changed: 187 additions & 19 deletions
@@ -1,5 +1,5 @@
 ---
-title: "Deploy models using batch endpoints with REST APIs (preview)"
+title: "Deploy models using batch endpoints with REST APIs"
 titleSuffix: Azure Machine Learning
 description: Learn how to deploy models using batch endpoints with REST APIs.
 services: machine-learning
@@ -8,16 +8,16 @@ ms.subservice: core
 ms.topic: how-to
 author: dem108
 ms.author: sehan
-ms.date: 04/29/2022
-ms.reviewer: nibaccam
+ms.date: 05/24/2022
+ms.reviewer: larryfr
 ms.custom: devplatv2, event-tier1-build-2022
 ---

-# Deploy models with REST (preview) for batch scoring
+# Deploy models with REST for batch scoring

 [!INCLUDE [cli v2](../../includes/machine-learning-cli-v2.md)]

-Learn how to use the Azure Machine Learning REST API to deploy models for batch scoring (preview).
+Learn how to use the Azure Machine Learning REST API to deploy models for batch scoring.

@@ -55,7 +55,7 @@ In this article, you learn how to use the new REST APIs to:

 [Batch endpoints](concept-endpoints.md#what-are-batch-endpoints) simplify the process of hosting your models for batch scoring, so you can focus on machine learning, not infrastructure. In this article, you'll create a batch endpoint and deployment, and invoke it to start a batch scoring job. But first you'll have to register the assets needed for deployment, including model, code, and environment.

-There are many ways to create an Azure Machine Learning batch endpoint, [including the Azure CLI](how-to-use-batch-endpoint.md), and visually with [the studio](how-to-use-batch-endpoints-studio.md). The following example creates a batch endpoint and deployment with the REST API.
+There are many ways to create an Azure Machine Learning batch endpoint, including [the Azure CLI](how-to-use-batch-endpoint.md), and visually with [the studio](how-to-use-batch-endpoints-studio.md). The following example creates a batch endpoint and a batch deployment with the REST API.

 ## Create machine learning assets

@@ -104,7 +104,7 @@ Once you upload your code, you can specify your code with a PUT request:

 ### Upload and register model

-Similar to the code, Upload the model files:
+Similar to the code, upload the model files:

 :::code language="rest-api" source="~/azureml-examples-main/cli/batch-score-rest.sh" id="upload_model":::

@@ -125,7 +125,7 @@ Now, run the following snippet to create an environment:

 ## Deploy with batch endpoints

-Next, create the batch endpoint, a deployment, and set the default deployment.
+Next, create a batch endpoint, a batch deployment, and set the default deployment for the endpoint.

 ### Create batch endpoint

@@ -151,6 +151,8 @@ Invoking a batch endpoint triggers a batch scoring job. A job `id` is returned i

 ### Invoke the batch endpoint to start a batch scoring job

+#### Get the scoring URI and access token
+
 Get the scoring URI and access token to invoke the batch endpoint. First, get the scoring URI:

 :::code language="rest-api" source="~/azureml-examples-main/cli/batch-score-rest.sh" id="get_endpoint":::
@@ -159,26 +161,192 @@ Get the batch endpoint access token:

 :::code language="rest-api" source="~/azureml-examples-main/cli/batch-score-rest.sh" id="get_access_token":::

-Now, invoke the batch endpoint to start a batch scoring job. The following example scores data publicly available in the cloud:
+#### Invoke the batch endpoint with different input options

-:::code language="rest-api" source="~/azureml-examples-main/cli/batch-score-rest.sh" id="score_endpoint_with_data_in_cloud":::
+It's time to invoke the batch endpoint to start a batch scoring job. If your data is a folder (potentially with multiple files) publicly available from the web, you can use the following snippet:

-If your data is stored in an Azure Machine Learning registered datastore, you can invoke the batch endpoint with a dataset. The following code creates a new dataset:
-
-:::code language="rest-api" source="~/azureml-examples-main/cli/batch-score-rest.sh" id="create_dataset":::
+```rest-api
+response=$(curl --location --request POST $SCORING_URI \
+--header "Authorization: Bearer $SCORING_TOKEN" \
+--header "Content-Type: application/json" \
+--data-raw "{
+  \"properties\": {
+    \"InputData\": {
+      \"mnistinput\": {
+        \"JobInputType\" : \"UriFolder\",
+        \"Uri\": \"https://pipelinedata.blob.core.windows.net/sampledata/mnist\"
+      }
+    }
+  }
+}")
+
+JOB_ID=$(echo $response | jq -r '.id')
+JOB_ID_SUFFIX=$(echo ${JOB_ID##/*/})
+```
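The `jq` extraction and parameter expansion at the end of the snippet above can be sanity-checked locally without calling the endpoint. The response body below is a hypothetical stand-in for what the service returns, and `sed` is used in place of `jq` so the check runs anywhere:

```shell
# Hypothetical response body; the real one comes back from the POST to $SCORING_URI
response='{"id":"/subscriptions/000/resourceGroups/rg/providers/Microsoft.MachineLearningServices/workspaces/ws/jobs/batchjob-1234"}'

# Pull the "id" field out of the JSON (jq -r '.id' does the same in the snippet above)
JOB_ID=$(echo "$response" | sed -n 's/.*"id" *: *"\([^"]*\)".*/\1/p')

# ${JOB_ID##/*/} strips the longest prefix matching "/*/", leaving only the job name
JOB_ID_SUFFIX=${JOB_ID##/*/}
echo "$JOB_ID_SUFFIX"
```

The suffix is what later status queries on the job use, so it is worth confirming that the expansion really leaves just the final path segment.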

+Now, let's look at other options for invoking the batch endpoint. When it comes to input data, there are multiple scenarios you can choose from, depending on the input type (whether you are specifying a folder or a single file) and the URI type (whether you are using a path on an Azure Machine Learning registered datastore, a reference to an Azure Machine Learning registered V2 data asset, or a public URI).

-Next, reference the dataset when invoking the batch endpoint:
+- An `InputData` property has `JobInputType` and `Uri` keys. When you are specifying a single file, use `"JobInputType": "UriFile"`; when you are specifying a folder, use `"JobInputType": "UriFolder"`.

-:::code language="rest-api" source="~/azureml-examples-main/cli/batch-score-rest.sh" id="score_endpoint_with_dataset":::
+- When the file or folder is on an Azure ML registered datastore, the syntax for the `Uri` is `azureml://datastores/<datastore-name>/paths/<path-on-datastore>` for a folder, and `azureml://datastores/<datastore-name>/paths/<path-on-datastore>/<file-name>` for a specific file. You can also use the longer form to represent the same path, such as `azureml://subscriptions/<subscription_id>/resourceGroups/<resource-group-name>/workspaces/<workspace-name>/datastores/<datastore-name>/paths/<path-on-datastore>/`.

-In the previous code snippet, a custom output location is provided by using `datastoreId`, `path`, and `outputFileName`. These settings allow you to configure where to store the batch scoring results.
+- When the file or folder is registered as a V2 data asset of type `uri_folder` or `uri_file`, the syntax for the `Uri` is `azureml://data/<data-name>/versions/<data-version>/` (short form) or `azureml://subscriptions/<subscription_id>/resourceGroups/<resource-group-name>/workspaces/<workspace-name>/data/<data-name>/versions/<data-version>/` (long form).

-> [!IMPORTANT]
-> You must provide a unique output location. If the output file already exists, the batch scoring job will fail.
+- When the file or folder is a publicly accessible path, the syntax for the URI is `https://<public-path>` for a folder and `https://<public-path>/<file-name>` for a specific file.

-For this example, the output is stored in the default blob storage for the workspace. The folder name is the same as the endpoint name, and the file name is randomly generated by the following code:
+> [!NOTE]
+> For more information about data URIs, see [Azure Machine Learning data reference URI](reference-yaml-core-syntax.md#azure-ml-data-reference-uri).
+
+Below are some examples using different types of input data.
+
+- If your data is a folder on the Azure ML registered datastore, you can either:
+
+  - Use the short form to represent the URI:
+
+    ```rest-api
+    response=$(curl --location --request POST $SCORING_URI \
+    --header "Authorization: Bearer $SCORING_TOKEN" \
+    --header "Content-Type: application/json" \
+    --data-raw "{
+      \"properties\": {
+        \"InputData\": {
+          \"mnistInput\": {
+            \"JobInputType\" : \"UriFolder\",
+            \"Uri\": \"azureml://datastores/workspaceblobstore/paths/$ENDPOINT_NAME/mnist\"
+          }
+        }
+      }
+    }")
+
+    JOB_ID=$(echo $response | jq -r '.id')
+    JOB_ID_SUFFIX=$(echo ${JOB_ID##/*/})
+    ```
+
+  - Or use the long form for the same URI:
+
+    ```rest-api
+    response=$(curl --location --request POST $SCORING_URI \
+    --header "Authorization: Bearer $SCORING_TOKEN" \
+    --header "Content-Type: application/json" \
+    --data-raw "{
+      \"properties\": {
+        \"InputData\": {
+          \"mnistInput\": {
+            \"JobInputType\" : \"UriFolder\",
+            \"Uri\": \"azureml://subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/workspaces/$WORKSPACE/datastores/workspaceblobstore/paths/$ENDPOINT_NAME/mnist\"
+          }
+        }
+      }
+    }")
+
+    JOB_ID=$(echo $response | jq -r '.id')
+    JOB_ID_SUFFIX=$(echo ${JOB_ID##/*/})
+    ```
+
+- If you want to manage your data as an Azure ML registered V2 data asset of type `uri_folder`, you can follow the two steps below:
+
+  1. Create the V2 data asset:
+
+     ```rest-api
+     DATA_NAME="mnist"
+     DATA_VERSION=$RANDOM
+
+     response=$(curl --location --request PUT https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.MachineLearningServices/workspaces/$WORKSPACE/data/$DATA_NAME/versions/$DATA_VERSION?api-version=$API_VERSION \
+     --header "Content-Type: application/json" \
+     --header "Authorization: Bearer $TOKEN" \
+     --data-raw "{
+       \"properties\": {
+         \"dataType\": \"uri_folder\",
+         \"dataUri\": \"https://pipelinedata.blob.core.windows.net/sampledata/mnist\",
+         \"description\": \"Mnist data asset\"
+       }
+     }")
+     ```
+
+  2. Reference the data asset in the batch scoring job:
+
+     ```rest-api
+     response=$(curl --location --request POST $SCORING_URI \
+     --header "Authorization: Bearer $SCORING_TOKEN" \
+     --header "Content-Type: application/json" \
+     --data-raw "{
+       \"properties\": {
+         \"InputData\": {
+           \"mnistInput\": {
+             \"JobInputType\" : \"UriFolder\",
+             \"Uri\": \"azureml://data/$DATA_NAME/versions/$DATA_VERSION/\"
+           }
+         }
+       }
+     }")
+
+     JOB_ID=$(echo $response | jq -r '.id')
+     JOB_ID_SUFFIX=$(echo ${JOB_ID##/*/})
+     ```
+
+- If your data is a single file publicly available from the web, you can use the following snippet:
+
+  ```rest-api
+  response=$(curl --location --request POST $SCORING_URI \
+  --header "Authorization: Bearer $SCORING_TOKEN" \
+  --header "Content-Type: application/json" \
+  --data-raw "{
+    \"properties\": {
+      \"InputData\": {
+        \"mnistInput\": {
+          \"JobInputType\" : \"UriFile\",
+          \"Uri\": \"https://pipelinedata.blob.core.windows.net/sampledata/mnist/0.png\"
+        }
+      }
+    }
+  }")
+
+  JOB_ID=$(echo $response | jq -r '.id')
+  JOB_ID_SUFFIX=$(echo ${JOB_ID##/*/})
+  ```
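The backslash-escaped string passed to `--data-raw` in the snippets above is just a plain JSON body once the quoting is resolved. As a local sanity check (no endpoint call involved), the heredoc below builds the same single-file payload with hypothetical values, which is easier to read and to verify before wiring it into `curl`:

```shell
# Hypothetical input URI; any publicly readable file would do
INPUT_URI="https://pipelinedata.blob.core.windows.net/sampledata/mnist/0.png"

# The escaped --data-raw string above expands to this plain JSON body
BODY=$(cat <<EOF
{
  "properties": {
    "InputData": {
      "mnistInput": {
        "JobInputType": "UriFile",
        "Uri": "$INPUT_URI"
      }
    }
  }
}
EOF
)
echo "$BODY"
```

Building the body first also lets you swap `UriFile` for `UriFolder` (and the URI accordingly) without re-counting backslashes.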

-:::code language="azurecli" source="~/azureml-examples-main/cli/batch-score-rest.sh" ID="unique_output" :::
+> [!NOTE]
+> We strongly recommend using the latest REST API version for batch scoring.
+> - If you want to use local data, you can upload it to an Azure Machine Learning registered datastore and then use the REST API for cloud data.
+> - If you are using an existing V1 FileDataset for a batch endpoint, we recommend migrating it to a V2 data asset and referring to it directly when invoking batch endpoints. Currently only data assets of type `uri_folder` or `uri_file` are supported. Batch endpoints created with the GA CLI v2 (2.4.0 and newer) or the GA REST API (2022-05-01 and newer) don't support V1 Datasets.
+> - You can also extract the URI or path on the datastore from a V1 FileDataset by using the `az ml dataset show` command with the `--query` parameter, and use that information for invocation.
+> - While batch endpoints created with earlier APIs will continue to support V1 FileDatasets, we will add further V2 data asset support in the latest API versions for even more usability and flexibility. For more information on V2 data assets, see [Work with data using SDK v2 (preview)](how-to-use-data.md). For more information on the new V2 experience, see [What is v2](concept-v2.md).
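The short and long datastore URI forms described above refer to the same path; the long form only adds the workspace scope. A quick sketch with hypothetical workspace values shows the relationship:

```shell
# Hypothetical workspace values, for illustration only
SUBSCRIPTION_ID="00000000-0000-0000-0000-000000000000"
RESOURCE_GROUP="my-resource-group"
WORKSPACE="my-workspace"
DATASTORE="workspaceblobstore"
DATA_PATH="my-endpoint/mnist"

# Short form: resolved relative to the current workspace
SHORT_URI="azureml://datastores/$DATASTORE/paths/$DATA_PATH"

# Long form: the same datastore path, fully qualified with the workspace scope
LONG_URI="azureml://subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/workspaces/$WORKSPACE/datastores/$DATASTORE/paths/$DATA_PATH"

echo "$SHORT_URI"
echo "$LONG_URI"
```

Everything after `datastores/` is identical in both forms, which is why either can be used as the `Uri` value when invoking the endpoint from within the workspace.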
+
+#### Configure the output location and overwrite settings
+
+The batch scoring results are by default stored in the workspace's default blob store, in a folder named after the job name (a system-generated GUID). You can configure where to store the scoring outputs when you invoke the batch endpoint. Use `OutputData` to configure the output file path on an Azure Machine Learning registered datastore. `OutputData` has `JobOutputType` and `Uri` keys. `UriFile` is the only supported value for `JobOutputType`. The syntax for `Uri` is the same as that of `InputData`, i.e., `azureml://datastores/<datastore-name>/paths/<path-on-datastore>/<file-name>`.
+
+The following example snippet configures the output location for the batch scoring results:
+
+```rest-api
+response=$(curl --location --request POST $SCORING_URI \
+--header "Authorization: Bearer $SCORING_TOKEN" \
+--header "Content-Type: application/json" \
+--data-raw "{
+  \"properties\": {
+    \"InputData\": {
+      \"mnistInput\": {
+        \"JobInputType\" : \"UriFolder\",
+        \"Uri\": \"azureml://datastores/workspaceblobstore/paths/$ENDPOINT_NAME/mnist\"
+      }
+    },
+    \"OutputData\": {
+      \"mnistOutput\": {
+        \"JobOutputType\": \"UriFile\",
+        \"Uri\": \"azureml://datastores/workspaceblobstore/paths/$ENDPOINT_NAME/mnistOutput/$OUTPUT_FILE_NAME\"
+      }
+    }
+  }
+}")
+
+JOB_ID=$(echo $response | jq -r '.id')
+JOB_ID_SUFFIX=$(echo ${JOB_ID##/*/})
+```
+
+> [!IMPORTANT]
+> You must use a unique output location. If the output file exists, the batch scoring job will fail.
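One way to satisfy the uniqueness requirement above is to generate `$OUTPUT_FILE_NAME` from a UTC timestamp plus a random suffix before invoking the endpoint. This is a sketch: the variable name matches the snippet above, but the naming scheme itself is only an example (requires bash for `$RANDOM`):

```shell
# Generate a unique output file name: UTC timestamp plus a random suffix
OUTPUT_FILE_NAME="predictions_$(date -u +%Y%m%dT%H%M%S)_$RANDOM.csv"
echo "$OUTPUT_FILE_NAME"
```

Two invocations in the same second still get distinct names with high probability, and the timestamp keeps results sortable in the datastore.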

 ### Check the batch scoring job
