:::image type="content" source="./media/how-to-use-batch-endpoints-studio/review-batch-wizard.png" alt-text="Screenshot of batch endpoints/deployment review screen.":::
---
1. Check batch endpoint and deployment details.
Use `show` to check endpoint and deployment details. To check a batch deployment, run the following code:
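As a hedged sketch of that command with the Azure Machine Learning CLI v2 (the endpoint, deployment, resource group, and workspace names below are illustrative placeholders, not values from this article):

```azurecli
# Show the details of one batch deployment under an endpoint.
# All four names are placeholders -- substitute your own.
az ml batch-deployment show \
    --name mnist-torch-dpl \
    --endpoint-name mnist-batch \
    --resource-group <my-resource-group> \
    --workspace-name <my-workspace>
```

The same pattern works at the endpoint level with `az ml batch-endpoint show --name <endpoint-name>`.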
## Run batch endpoints and access results
Invoking a batch endpoint triggers a batch scoring job. The invoke response returns a job `name` that you can use to track the batch scoring progress. When running models for scoring in batch endpoints, you need to indicate the input data path where the endpoint should look for the data you want to score. Batch endpoints support reading files or folders located in different locations. To learn more about the supported types and how to specify them, read [Accessing data from batch endpoints jobs](how-to-access-data-batch-endpoints-jobs.md). The job outputs are stored in cloud storage: either the workspace's default blob storage or the storage you specified.
> [!NOTE]
> __How is work distributed?__:
>
> Batch deployments distribute work at the file level, which means that a folder containing 100 files with mini-batches of 10 files generates 10 batches of 10 files each. Notice that this happens regardless of the size of the files involved. If your files are too big to be processed in large mini-batches, we suggest either splitting the files into smaller ones to achieve a higher level of parallelism, or decreasing the number of files per mini-batch. At this moment, batch deployments can't account for skews in the file size distribution.
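For reference, the number of files per mini-batch is set in the batch deployment definition. A minimal sketch of a CLI v2 deployment YAML, assuming the standard schema (the deployment, endpoint, path, and compute names here are illustrative assumptions, not values from this article):

```yaml
# batch-deployment.yml -- illustrative sketch, not this article's exact file
$schema: https://azuremlschemas.azureedge.net/latest/batchDeployment.schema.json
name: mnist-torch-dpl            # hypothetical deployment name
endpoint_name: mnist-batch       # hypothetical endpoint name
model:
  path: ./model                  # hypothetical local model path
compute: azureml:batch-cluster   # hypothetical compute cluster
mini_batch_size: 10              # number of FILES per mini-batch, not bytes
```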
In this scenario, we start a new job over sample data from the MNIST dataset stored in an Azure Storage Account:
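The exact invocation depends on your setup; as a hedged sketch with the Azure Machine Learning CLI v2 (the endpoint name and storage URL are placeholders, not the article's actual sample data path):

```azurecli
# Start a batch scoring job; --input can point at a cloud folder or file.
# ENDPOINT_NAME and the storage URL are illustrative placeholders.
ENDPOINT_NAME="mnist-batch"
JOB_NAME=$(az ml batch-endpoint invoke \
    --name $ENDPOINT_NAME \
    --input https://<account>.blob.core.windows.net/<container>/mnist \
    --query name --output tsv)
echo $JOB_NAME
```

Capturing the returned job `name` this way lets you track the job afterwards.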
> [!TIP]
> Local data folders/files can be used when executing batch endpoints from the Azure Machine Learning CLI or Azure Machine Learning SDK for Python. However, that operation results in the local data being uploaded to the default Azure Machine Learning Data Store of the workspace you're working on.
> [!IMPORTANT]
> __Deprecation notice__: Datasets of type `FileDataset` (V1) are deprecated and will be retired in the future. Existing batch endpoints relying on this functionality will continue to work, but batch endpoints created with GA CLIv2 (2.4.0 and newer) or GA REST API (2022-05-01 and newer) won't support V1 datasets.
### Monitor batch job execution progress
Batch scoring jobs usually take some time to process the entire set of inputs.
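One way to follow a job, sketched with the Azure Machine Learning CLI v2 (assuming `JOB_NAME` holds the name returned by the invoke call):

```azurecli
# Show the job's current status; repeat until it reports Completed,
# or use `az ml job stream --name $JOB_NAME` to tail the logs live.
az ml job show --name $JOB_NAME --query status --output tsv
```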
Follow these steps to view the scoring results in Azure Storage Explorer:
:::image type="content" source="media/how-to-use-batch-endpoint/scoring-view.png" alt-text="Screenshot of the scoring output." lightbox="media/how-to-use-batch-endpoint/scoring-view.png":::
### Configure the output location
The batch scoring results are stored by default in the workspace's default blob store, within a folder named after the job name (a system-generated GUID). You can configure where to store the scoring outputs when you invoke the batch endpoint.
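As a hedged sketch of that configuration with the CLI v2, using `--output-path` on invoke (the datastore and folder names are illustrative placeholders):

```azurecli
# Write scoring results to a specific folder in a registered datastore
# instead of the default GUID-named folder. Names are placeholders.
az ml batch-endpoint invoke \
    --name $ENDPOINT_NAME \
    --input https://<account>.blob.core.windows.net/<container>/mnist \
    --output-path azureml://datastores/workspaceblobstore/paths/<my-output-folder>
```

Note that `--output-path` uses the `azureml://datastores/<name>/paths/<path>` form, so the destination must be a datastore registered in the workspace.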