articles/machine-learning/how-to-use-batch-model-deployments.md (73 additions, 4 deletions)
@@ -33,6 +33,30 @@ Use batch endpoints for model deployment when:
In this article, you use a batch endpoint to deploy a machine learning model that solves the classic MNIST (Modified National Institute of Standards and Technology) digit recognition problem. Your deployed model then performs batch inferencing over large amounts of data, in this case image files. You begin by creating a batch deployment of a model that was created using Torch. This deployment becomes the default one in the endpoint. Later, you [create a second deployment](#add-deployments-to-an-endpoint) of a model that was created with TensorFlow (Keras), test the second deployment, and then set it as the endpoint's default deployment.
## Quick reference: Inputs, outputs, and configuration options
Before diving into the details, here's a quick reference to help you understand the key concepts:
### Data flow overview
- __Input data__: Files or folders in Azure Storage (blob storage, data lake, or registered datasets)
- __Processing__: Your model processes data in configurable mini-batches
- __Output data__: Prediction results written as files to Azure Storage

### Key configuration settings

| Setting | What it controls | Typical values |
|---------|------------------|----------------|
| `mini_batch_size` | Files per batch (affects memory usage) | 10-50 for large files, 100-500 for small files |
| `instance_count` | Number of compute instances | 1-10 depending on data volume |
| `max_concurrency_per_instance` | Parallel processes per instance | 1 for memory-intensive models, 2-4 for lightweight models |
| `timeout` | Max time per mini-batch (seconds) | 30-300 depending on model complexity |
| `output_action` | How to organize results | `append_row` (combine all results) or `summary_only` |
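
To see how these settings fit together, here's a minimal sketch of a batch deployment defined with the Python SDK (`azure-ai-ml`). The names (`mnist-batch`, `batch-cluster`, `batch_driver.py`) and the `ml_client`, `model`, and `env` objects are placeholders you would create or look up first; they aren't taken from this article's sample code.

```python
from azure.ai.ml.entities import (
    BatchDeployment,
    BatchRetrySettings,
    CodeConfiguration,
)
from azure.ai.ml.constants import BatchDeploymentOutputAction

# Sketch only: `ml_client`, `model`, and `env` are assumed to exist already.
deployment = BatchDeployment(
    name="mnist-torch-dpl",
    endpoint_name="mnist-batch",
    model=model,
    code_configuration=CodeConfiguration(
        code="deployment-torch/code",
        scoring_script="batch_driver.py",
    ),
    environment=env,
    compute="batch-cluster",
    instance_count=2,                 # compute instances used by each job
    max_concurrency_per_instance=2,   # parallel scoring processes per instance
    mini_batch_size=10,               # files handed to each scoring call
    output_action=BatchDeploymentOutputAction.APPEND_ROW,
    output_file_name="predictions.csv",
    retry_settings=BatchRetrySettings(max_retries=3, timeout=30),
)

ml_client.begin_create_or_update(deployment)
```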
### Common parameters for job invocation
- __Azure CLI__: Use `--input`, `--output-path`, `--set` for overrides
- __Python SDK__: Use `Input()` for data, `params_override` for settings
- __Studio__: Use the web interface to configure inputs, outputs, and deployment settings
To follow along with the code samples and files needed to run the commands in this article locally, see the __[Clone the examples repository](#clone-the-examples-repository)__ section. The code samples and files are contained in the [azureml-examples](https://github.com/azure/azureml-examples) repository.
## Prerequisites
@@ -448,6 +472,21 @@ A model deployment is a set of resources required for hosting the model that doe
## Run batch endpoints and access results
### Understanding the data flow
Before running your batch endpoint, it's important to understand how data flows through the system:
__Inputs__: Data you want to process (score). This can be:
- Files stored in Azure Storage (blob storage, data lake)
- Folders containing multiple files
- Registered datasets in Azure Machine Learning
__Processing__: Your deployed model processes the input data in batches (mini-batches) and generates predictions.
__Outputs__: Results from your model, stored as files in Azure Storage. By default, outputs are saved to your workspace's default blob storage, but you can specify a different location.
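
If you use the Python SDK, each of these input types maps to an `Input` object. The following is a minimal sketch assuming the `azure-ai-ml` package; the URIs and the asset name are placeholders for illustration, not real assets from this article.

```python
from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes

# A single file in blob storage (placeholder URI).
file_input = Input(
    type=AssetTypes.URI_FILE,
    path="https://<account>.blob.core.windows.net/<container>/digit.png",
)

# A folder of files in blob storage or a data lake (placeholder URI).
folder_input = Input(
    type=AssetTypes.URI_FOLDER,
    path="https://<account>.blob.core.windows.net/<container>/mnist/sample/",
)

# A data asset registered in the Azure Machine Learning workspace (placeholder name).
dataset_input = Input(
    type=AssetTypes.URI_FOLDER,
    path="azureml:mnist-sample-data:1",
)
```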
### Invoke a batch endpoint
Invoking a batch endpoint triggers a batch scoring job. The job `name` is returned in the invoke response and can be used to track the batch scoring progress. When you run models for scoring in batch endpoints, you need to specify the path to the input data so that the endpoint can find the data you want to score. The following example shows how to start a new job over sample data from the MNIST dataset stored in an Azure Storage account.
You can run and invoke a batch endpoint by using the Azure CLI, the Azure Machine Learning SDK, or REST endpoints. For more details about these options, see [Create jobs and input data for batch endpoints](how-to-access-data-batch-endpoints-jobs.md).
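
As a rough sketch of the Python SDK call, the following assumes an authenticated `MLClient` named `ml_client`, an endpoint named `mnist-batch`, and the `folder_input` object from the earlier sketch; none of these names come from this article's sample code.

```python
# Trigger a batch scoring job against the endpoint's default deployment.
job = ml_client.batch_endpoints.invoke(
    endpoint_name="mnist-batch",
    input=folder_input,  # folder of MNIST images to score (placeholder)
)

# The returned job name can be used to track progress and stream logs.
print(f"Started batch scoring job: {job.name}")
ml_client.jobs.stream(job.name)
```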
@@ -571,6 +610,8 @@ Use `output-path` to configure any folder in an Azure Machine Learning registere
# [Python](#tab/python)
**Understanding `params_override`**: The `params_override` parameter allows you to modify deployment settings for a specific job without changing the deployment configuration permanently. This is useful for adjusting settings like output location, mini-batch size, or instance count for individual jobs.
Use `params_override` to configure any folder in an Azure Machine Learning registered data store. Only registered data stores are supported as output paths. In this example you use the default data store:
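
A minimal sketch of that call follows. The override keys shown (`output_dataset.datastore_id`, `output_dataset.path`, and `output_file_name`) are an assumption based on common usage, and the endpoint and input names are placeholders; check the full sample in the examples repository for the exact keys your SDK version expects.

```python
# Look up the workspace's default datastore to use as the output location.
default_ds = ml_client.datastores.get_default()

job = ml_client.batch_endpoints.invoke(
    endpoint_name="mnist-batch",   # placeholder endpoint name
    input=folder_input,            # placeholder input from the earlier sketch
    params_override=[
        # Assumed override keys: write results to a folder in the default datastore.
        {"output_dataset.datastore_id": f"azureml:{default_ds.id}"},
        {"output_dataset.path": "/batch-scoring-results/mnist"},
        {"output_file_name": "predictions.csv"},
    ],
)
```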
> Use `params_override` when you need different settings for different jobs without modifying your deployment. This is especially useful for handling varying data sizes or experimenting with performance settings.
# [Studio](#tab/azure-studio)
1. Navigate to the __Endpoints__ tab on the side menu.
@@ -621,11 +680,21 @@ Once you've identified the data store you want to use, configure the output as f
## Overwrite deployment configuration for each job
When you invoke a batch endpoint, some settings can be overwritten to make best use of the compute resources and to improve performance. This is useful when you need different settings for different jobs without modifying the deployment permanently.
### Which settings can be overridden?
The following settings can be configured on a per-job basis:
| Setting | When to use | Example scenario |
|---------|-------------|-------------------|
| __Instance count__ | When you have varying data volumes | Use more instances for larger datasets (e.g., 10 instances for 1M files vs. 2 instances for 100K files) |
| __Mini-batch size__ | When you need to balance throughput and memory | Smaller batches (10-50 files) for large images, larger batches (100-500 files) for small text files |
| __Max retries__ | When data quality varies | Higher retries (5-10) for noisy data, lower retries (1-3) for clean data |
| __Timeout__ | When processing time varies by data type| Longer timeout (300s) forcomplex models, shorter timeout (30s) for simple models |
| __Error threshold__ | When you need different failure tolerance | Strict threshold (-1) for critical jobs, lenient threshold (10%) for experimental jobs |
* __Instance count__: use this setting to overwrite the number of instances to request from the compute cluster. For example, for a larger volume of data inputs, you might want to use more instances to speed up end-to-end batch scoring.
* __Mini-batch size__: use this setting to overwrite the number of files to include in each mini-batch. The number of mini-batches is determined by the total input file count and the mini-batch size. A smaller mini-batch size generates more mini-batches; for example, 1,000 input files with a mini-batch size of 10 produce 100 mini-batches. Mini-batches can run in parallel, but there might be extra scheduling and invocation overhead.
* Other settings, such as __max retries__, __timeout__, and __error threshold__, can also be overwritten. These settings might impact the end-to-end batch scoring time for different workloads.
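
Here's a minimal sketch of overriding these settings for a single job with the Python SDK. The override keys (`mini_batch_size`, `compute.instance_count`) and the endpoint and input names are assumptions for illustration, not values from this article.

```python
# Override performance settings for this job only; the deployment itself
# keeps its original configuration.
job = ml_client.batch_endpoints.invoke(
    endpoint_name="mnist-batch",   # placeholder endpoint name
    input=folder_input,            # placeholder input
    params_override=[
        {"mini_batch_size": "20"},        # more files per scoring call
        {"compute.instance_count": "5"},  # scale out for a larger dataset
    ],
)
```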