
Commit 6c670a1

Update how-to-use-batch-endpoint.md
1 parent 60aa5d1 commit 6c670a1

1 file changed: +14 additions, -21 deletions

articles/machine-learning/how-to-use-batch-endpoint.md

Lines changed: 14 additions & 21 deletions
@@ -468,11 +468,6 @@ A deployment is a set of resources required for hosting the model that does the
:::image type="content" source="./media/how-to-use-batch-endpoints-studio/review-batch-wizard.png" alt-text="Screenshot of batch endpoints/deployment review screen.":::

---
-
-> [!NOTE]
-> __How is work distributed?__:
->
-> Batch deployments distribute work at the file level, which means that a folder containing 100 files with mini-batches of 10 files will generate 10 batches of 10 files each. Notice that this will happen regardless of the size of the files involved. If your files are too big to be processed in large mini-batches we suggest to either split the files in smaller files to achieve a higher level of parallelism or to decrease the number of files per mini-batch. At this moment, batch deployment can't account for skews in the file's size distribution.


1. Check batch endpoint and deployment details.

@@ -481,7 +476,6 @@ A deployment is a set of resources required for hosting the model that does the
Use `show` to check endpoint and deployment details. To check a batch deployment, run the following code:

:::code language="azurecli" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/mnist-classifier/deploy-and-run.sh" ID="check_batch_deployment_detail" :::
-

# [Python](#tab/python)

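A minimal sketch of what this check looks like with the Python SDK v2 (`azure-ai-ml`); the endpoint and deployment names below are placeholders, not the article's exact sample:

```python
# Minimal sketch, assuming the azure-ai-ml (SDK v2) package and a workspace config.json.
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

# Placeholder names: replace with your own endpoint and deployment.
endpoint = ml_client.batch_endpoints.get(name="mnist-batch")
deployment = ml_client.batch_deployments.get(
    name="mnist-torch-deployment", endpoint_name="mnist-batch"
)
print(endpoint.name, endpoint.scoring_uri)
print(deployment.name)
```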
@@ -505,7 +499,14 @@ A deployment is a set of resources required for hosting the model that does the

## Run batch endpoints and access results

-Invoking a batch endpoint triggers a batch scoring job. A job `name` will be returned from the invoke response and can be used to track the batch scoring progress. The batch scoring job runs for some time. It splits the entire inputs into multiple `mini_batch` and processes in parallel on the compute cluster. The batch scoring job outputs will be stored in cloud storage, either in the workspace's default blob storage, or the storage you specified.
+Invoking a batch endpoint triggers a batch scoring job. The invoke response returns a job `name` that you can use to track the batch scoring progress. When you run models for scoring in batch endpoints, you need to indicate the input data path where the endpoint should look for the data you want to score. Batch endpoints support reading files or folders located in different locations. To learn more about the supported types and how to specify them, read [Accessing data from batch endpoints jobs](how-to-access-data-batch-endpoints-jobs.md). The job outputs are stored in cloud storage, either in the workspace's default blob storage or in the storage you specified.
+
+> [!NOTE]
+> __How is work distributed?__
+>
+> Batch deployments distribute work at the file level, which means that a folder containing 100 files with mini-batches of 10 files generates 10 batches of 10 files each. This happens regardless of the size of the files involved. If your files are too big to be processed in large mini-batches, we suggest either splitting the files into smaller ones to achieve a higher level of parallelism or decreasing the number of files per mini-batch. At this moment, batch deployments can't account for skews in the file size distribution.
+
+In this scenario, we start a new job over sample data from the MNIST dataset stored in an Azure Storage Account:

# [Azure CLI](#tab/azure-cli)

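As an illustrative sketch of the invocation described above with the Python SDK v2: the endpoint name and storage URI are placeholders, and the `input=` parameter name can differ across `azure-ai-ml` versions.

```python
# Hedged sketch: start a batch scoring job over a folder of files in cloud storage.
from azure.ai.ml import Input, MLClient
from azure.ai.ml.constants import AssetTypes
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

# Placeholder path pointing at the folder to score (for example, the MNIST sample data).
mnist_input = Input(
    type=AssetTypes.URI_FOLDER,
    path="https://<storage-account>.blob.core.windows.net/<container>/mnist/sample",
)

job = ml_client.batch_endpoints.invoke(
    endpoint_name="mnist-batch",  # placeholder endpoint name
    input=mnist_input,            # newer SDK versions may expect an `inputs={...}` mapping instead
)
print(job.name)  # the returned job name is what you use to track progress
```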
@@ -546,6 +547,12 @@ job = ml_client.batch_endpoints.invoke(

---

+> [!TIP]
+> Local data folders/files can be used when executing batch endpoints from the Azure Machine Learning CLI or the Azure Machine Learning SDK for Python. However, that operation results in the local data being uploaded to the default Azure Machine Learning data store of the workspace you're working in.
+
+> [!IMPORTANT]
+> __Deprecation notice__: Datasets of type `FileDataset` (V1) are deprecated and will be retired in the future. Existing batch endpoints relying on this functionality will continue to work, but batch endpoints created with the GA CLI v2 (2.4.0 and newer) or the GA REST API (2022-05-01 and newer) won't support V1 datasets.
+
### Monitor batch job execution progress

Batch scoring jobs usually take some time to process the entire set of inputs.
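A short sketch of this monitoring step with the Python SDK v2; `<job-name>` stands in for the name returned by the invoke call:

```python
# Hedged sketch: check status and stream logs for a batch scoring job (azure-ai-ml SDK v2).
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())
job_name = "<job-name>"  # placeholder: the name returned when you invoked the endpoint

scoring_job = ml_client.jobs.get(name=job_name)
print(scoring_job.status)

# Block and stream logs until the job reaches a terminal state.
ml_client.jobs.stream(name=job_name)
```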
@@ -604,20 +611,6 @@ Follow the following steps to view the scoring results in Azure Storage Explorer

:::image type="content" source="media/how-to-use-batch-endpoint/scoring-view.png" alt-text="Screenshot of the scoring output." lightbox="media/how-to-use-batch-endpoint/scoring-view.png":::

-## Configure job's inputs, outputs, and execution
-
-Batch Endpoints require only one data input, which is the data you want to score. However, you can indicate also the outputs, and some other parameters about the execution.
-
-### Configure job's inputs
-
-Batch endpoints support reading files or folders that are located in different locations. To learn more about how the supported types and how to specify them read [Accessing data from batch endpoints jobs](how-to-access-data-batch-endpoints-jobs.md).
-
-> [!TIP]
-> Local data folders/files can be used when executing batch endpoints from the Azure Machine Learning CLI or Azure Machine Learning SDK for Python. However, that operation will result in the local data to be uploaded to the default Azure Machine Learning Data Store of the workspace you are working on.
-
-> [!IMPORTANT]
-> __Deprecation notice__: Datasets of type `FileDataset` (V1) are deprecated and will be retired in the future. Existing batch endpoints relying on this functionality will continue to work but batch endpoints created with GA CLIv2 (2.4.0 and newer) or GA REST API (2022-05-01 and newer) will not support V1 dataset.
-
### Configure the output location

The batch scoring results are by default stored in the workspace's default blob store within a folder named by job name (a system-generated GUID). You can configure where to store the scoring outputs when you invoke the batch endpoint.
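Related to accessing those results, a hedged sketch of downloading a finished batch scoring job's output with the Python SDK v2, assuming the default output name `score`:

```python
# Hedged sketch: download the scoring results of a finished batch job (azure-ai-ml SDK v2).
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())
job_name = "<job-name>"  # placeholder: the batch scoring job to pull results from

# "score" is assumed to be the job's named output; adjust if your deployment differs.
ml_client.jobs.download(name=job_name, download_path=".", output_name="score")
```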
