Skip to content

Commit bff06e8

Browse files
Merge pull request #223775 from santiagxf/santiagxf/batch-scoring-dependencies
Update how-to-batch-scoring-script.md
2 parents af2c3d7 + 8b25906 commit bff06e8

File tree

1 file changed

+13
-3
lines changed

1 file changed

+13
-3
lines changed

articles/machine-learning/how-to-batch-scoring-script.md

Lines changed: 13 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -66,14 +66,14 @@ def run(mini_batch):
6666
The method receives a list of file paths as a parameter (`mini_batch`). You can use this list to either iterate over each file and process it one by one, or to read the entire batch and process it at once. The best option will depend on your compute memory and the throughput you need to achieve. For an example of how to read entire batches of data at once see [High throughput deployments](how-to-image-processing-batch.md#high-throughput-deployments).
6767

6868
> [!NOTE]
69-
> __How is work distributed?__:
69+
> __How is work distributed?__
7070
>
7171
> Batch deployments distribute work at the file level, which means that a folder containing 100 files with mini-batches of 10 files will generate 10 batches of 10 files each. Notice that this will happen regardless of the size of the files involved. If your files are too big to be processed in large mini-batches we suggest to either split the files in smaller files to achieve a higher level of parallelism or to decrease the number of files per mini-batch. At this moment, batch deployment can't account for skews in the file's size distribution.
7272
73-
The `run()` method should return a pandas DataFrame or an array/list. Each returned output element indicates one successful run of an input element in the input `mini_batch`. For file datasets, each row/element will represent a single file processed. For a tabular dataset, each row/element will represent a row in a processed file.
73+
The `run()` method should return a Pandas `DataFrame` or an array/list. Each returned output element indicates one successful run of an input element in the input `mini_batch`. For file datasets, each row/element will represent a single file processed. For a tabular dataset, each row/element will represent a row in a processed file.
7474

7575
> [!IMPORTANT]
76-
> __How to write predictions?__:
76+
> __How to write predictions?__
7777
>
7878
> Use __arrays__ when you need to output a single prediction. Use __pandas DataFrames__ when you need to return multiple pieces of information. For instance, for tabular data, you may want to append your predictions to the original record. Use a pandas DataFrame for this case. For file datasets, __we still recommend to output a pandas DataFrame__ as they provide a more robust approach to read the results.
7979
>
@@ -84,6 +84,16 @@ The `run()` method should return a pandas DataFrame or an array/list. Each retur
8484
8585
The resulting DataFrame or array is appended to the output file indicated. There's no requirement on the cardinality of the results (1 file can generate 1 or many rows/elements in the output). All elements in the result DataFrame or array will be written to the output file as-is (considering the `output_action` isn't `summary_only`).
8686

87+
#### Python packages for scoring
88+
89+
Any library that your scoring script requires to run needs to be indicated in the environment where your batch deployment runs. As for scoring scripts, environments are indicated per deployment. Usually, you will indicate your requirements using a `conda.yml` dependencies file which may look as follows:
90+
91+
__mnist/environment/conda.yml__
92+
93+
:::code language="yaml" source="~/azureml-examples-main/cli/endpoints/batch/mnist/environment/conda.yml":::
94+
95+
Refer to [Create a batch deployment](how-to-use-batch-endpoint.md#create-a-batch-deployment) for more details about how to indicate the environment for your model.
96+
8797
## Writing predictions in a different way
8898

8999
By default, the batch deployment will write the model's predictions in a single file as indicated in the deployment. However, there are some cases where you need to write the predictions in multiple files. For instance, if the input data is partitioned, you typically would want to generate your output partitioned too. On those cases you can [Customize outputs in batch deployments](how-to-deploy-model-custom-output.md) to indicate:

0 commit comments

Comments
 (0)