Skip to content

Commit 9375f3c

Browse files
authored
Update how-to-batch-scoring-script.md
1 parent b87994b commit 9375f3c

File tree

1 file changed

+15
-3
lines changed

1 file changed

+15
-3
lines changed

articles/machine-learning/how-to-batch-scoring-script.md

Lines changed: 15 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -66,14 +66,16 @@ def run(mini_batch):
6666
The method receives a list of file paths as a parameter (`mini_batch`). You can use this list to either iterate over each file and process it one by one, or to read the entire batch and process it at once. The best option will depend on your compute memory and the throughput you need to achieve. For an example of how to read entire batches of data at once see [High throughput deployments](how-to-image-processing-batch.md#high-throughput-deployments).
6767

6868
> [!NOTE]
69-
> __How is work distributed?__:
69+
> __How is work distributed?__
7070
>
7171
> Batch deployments distribute work at the file level, which means that a folder containing 100 files with mini-batches of 10 files will generate 10 batches of 10 files each. Notice that this will happen regardless of the size of the files involved. If your files are too big to be processed in large mini-batches we suggest to either split the files in smaller files to achieve a higher level of parallelism or to decrease the number of files per mini-batch. At this moment, batch deployment can't account for skews in the file's size distribution.
7272
73-
The `run()` method should return a pandas DataFrame or an array/list. Each returned output element indicates one successful run of an input element in the input `mini_batch`. For file datasets, each row/element will represent a single file processed. For a tabular dataset, each row/element will represent a row in a processed file.
73+
#### Generating predictions
74+
75+
The `run()` method should return a Pandas `DataFrame` or an array/list. Each returned output element indicates one successful run of an input element in the input `mini_batch`. For file datasets, each row/element will represent a single file processed. For a tabular dataset, each row/element will represent a row in a processed file.
7476

7577
> [!IMPORTANT]
76-
> __How to write predictions?__:
78+
> __How to write predictions?__
7779
>
7880
> Use __arrays__ when you need to output a single prediction. Use __pandas DataFrames__ when you need to return multiple pieces of information. For instance, for tabular data, you may want to append your predictions to the original record. Use a pandas DataFrame for this case. For file datasets, __we still recommend to output a pandas DataFrame__ as they provide a more robust approach to read the results.
7981
>
@@ -84,6 +86,16 @@ The `run()` method should return a pandas DataFrame or an array/list. Each retur
8486
8587
The resulting DataFrame or array is appended to the output file indicated. There's no requirement on the cardinality of the results (1 file can generate 1 or many rows/elements in the output). All elements in the result DataFrame or array will be written to the output file as-is (considering the `output_action` isn't `summary_only`).
8688

89+
#### Python packages for scoring
90+
91+
Any library that your scoring script requires to run needs to be indicated in the environment where your batch deployment runs. As for scoring scripts, environments are indicated per deployment. Usually, you will indicate your requirements using a `conda.yml` dependencies file which may look as follows:
92+
93+
__mnist/environment/conda.yml__
94+
95+
:::code language="yaml" source="~/azureml-examples-main/cli/endpoints/batch/mnist/environment/conda.yml":::
96+
97+
Refer to [Create a batch deployment](how-to-use-batch-endpoint.md#create-a-batch-deployment) for more details about how to indicate the environment for your model.
98+
8799
## Writing predictions in a different way
88100

89101
By default, the batch deployment will write the model's predictions in a single file as indicated in the deployment. However, there are some cases where you need to write the predictions in multiple files. For instance, if the input data is partitioned, you typically would want to generate your output partitioned too. On those cases you can [Customize outputs in batch deployments](how-to-deploy-model-custom-output.md) to indicate:

0 commit comments

Comments
 (0)