
Commit 3c05c75

Merge pull request #239127 from santiagxf/santiagxf/azureml-batch-patch
Update how-to-batch-scoring-script.md
2 parents: 4a20746 + e33a061

File tree

1 file changed: +10 -8 lines changed

articles/machine-learning/how-to-batch-scoring-script.md

Lines changed: 10 additions & 8 deletions
@@ -33,15 +33,17 @@ The scoring script is a Python file (`.py`) that contains the logic about how to
 
 __deployment.yml__
 
-:::code language="yaml" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/mnist-classifier/deployment-torch/deployment.yml" range="8-10":::
+:::code language="yaml" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/mnist-classifier/deployment-torch/deployment.yml" range="9-11":::
 
 # [Python](#tab/python)
 
 ```python
-deployment = BatchDeployment(
+deployment = ModelBatchDeployment(
     ...
-    code_path="code",
-    scoring_script="batch_driver.py",
+    code_configuration=CodeConfiguration(
+        code="src",
+        scoring_script="batch_driver.py"
+    ),
     ...
 )
 ```
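The deployment above points at a `batch_driver.py` scoring script. A minimal sketch of what such a script can look like, following the `init()`/`run()` contract the article describes (the model-loading placeholder and the "prediction" logic here are purely illustrative, not part of the commit):

```python
import os
from typing import List

model = None  # populated once per worker by init()


def init():
    """Called once per worker when the deployment starts.

    AZUREML_MODEL_DIR points at the registered model's files; a real
    script would deserialize the model here instead of storing the path.
    """
    global model
    model_dir = os.environ.get("AZUREML_MODEL_DIR", ".")
    model = model_dir  # placeholder for an actual loaded model object


def run(mini_batch: List[str]) -> List[str]:
    """Called once per mini-batch with a list of input file paths.

    Returns one element per successfully processed file, as the article
    requires (a list/array or a pandas DataFrame).
    """
    results = []
    for file_path in mini_batch:
        # Illustrative "prediction": report the file name and its size.
        size = os.path.getsize(file_path) if os.path.exists(file_path) else 0
        results.append(f"{os.path.basename(file_path)},{size}")
    return results
```

Because `init()` runs once per worker and `run()` once per mini-batch, any expensive setup (model deserialization, warm-up) belongs in `init()`.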
@@ -102,7 +104,7 @@ The method receives a list of file paths as a parameter (`mini_batch`). You can
 >
 > Batch deployments distribute work at the file level, which means that a folder containing 100 files with mini-batches of 10 files will generate 10 batches of 10 files each. Notice that this will happen regardless of the size of the files involved. If your files are too big to be processed in large mini-batches we suggest to either split the files in smaller files to achieve a higher level of parallelism or to decrease the number of files per mini-batch. At this moment, batch deployment can't account for skews in the file's size distribution.
 
-The `run()` method should return a Pandas `DataFrame` or an array/list. Each returned output element indicates one successful run of an input element in the input `mini_batch`. For file datasets, each row/element represents a single file processed. For a tabular dataset, each row/element represents a row in a processed file.
+The `run()` method should return a Pandas `DataFrame` or an array/list. Each returned output element indicates one successful run of an input element in the input `mini_batch`. For file or folder data assets, each row/element returned represents a single file processed. For a tabular data asset, each row/element returned represents a row in a processed file.
 
 > [!IMPORTANT]
 > __How to write predictions?__
@@ -112,7 +114,7 @@ The `run()` method should return a Pandas `DataFrame` or an array/list. Each ret
 > If you need to write predictions in a different way, you can [customize outputs in batch deployments](how-to-deploy-model-custom-output.md).
 
 > [!WARNING]
-> Do not not output complex data types (or lists of complex data types) in the `run` function. Those outputs will be transformed to string and they will be hard to read.
+> Do not output complex data types (or lists of complex data types) rather than `pandas.DataFrame` in the `run` function. Those outputs will be transformed to string and they will be hard to read.
 
 The resulting DataFrame or array is appended to the output file indicated. There's no requirement on the cardinality of the results (1 file can generate 1 or many rows/elements in the output). All elements in the result DataFrame or array are written to the output file as-is (considering the `output_action` isn't `summary_only`).

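The file-level distribution described in the note above (100 files with a mini-batch size of 10 yield 10 mini-batches, regardless of file sizes) amounts to a plain chunking of the input file list. A sketch of that arithmetic (illustrative only; the actual partitioning is performed by the batch deployment runtime, not by your script):

```python
from typing import List


def make_mini_batches(files: List[str], mini_batch_size: int) -> List[List[str]]:
    """Split a flat list of input files into fixed-size mini-batches.

    The split ignores file sizes entirely, which is why skewed file
    sizes produce skewed per-batch workloads, as the note warns.
    """
    return [files[i:i + mini_batch_size]
            for i in range(0, len(files), mini_batch_size)]


files = [f"image_{n}.png" for n in range(100)]
batches = make_mini_batches(files, 10)
print(len(batches))  # 10 mini-batches of 10 files each
```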
@@ -131,7 +133,7 @@ Refer to [Create a batch deployment](how-to-use-batch-endpoint.md#create-a-batch
 By default, the batch deployment writes the model's predictions in a single file as indicated in the deployment. However, there are some cases where you need to write the predictions in multiple files. For instance, if the input data is partitioned, you typically would want to generate your output partitioned too. On those cases you can [Customize outputs in batch deployments](how-to-deploy-model-custom-output.md) to indicate:
 
 > [!div class="checklist"]
-> * The file format used (CSV, parquet, json, etc).
+> * The file format used (CSV, parquet, json, etc) to write predictions.
 > * The way data is partitioned in the output.
 
 Read the article [Customize outputs in batch deployments](how-to-deploy-model-custom-output.md) for an example about how to achieve it.
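The linked article covers the Azure ML mechanism; as a rough illustration of the idea of partitioned output, the sketch below writes one CSV file per value of a partition column. Every name here (`write_partitioned_csv`, the `region` column) is hypothetical, not an Azure ML API:

```python
import csv
import os
from collections import defaultdict


def write_partitioned_csv(rows, partition_key, output_dir):
    """Write prediction rows to one CSV file per partition value.

    `rows` is a list of dicts; `partition_key` names the column used to
    split the output, mirroring how the input data was partitioned.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_key]].append(row)

    os.makedirs(output_dir, exist_ok=True)
    written = []
    for value, group in groups.items():
        path = os.path.join(output_dir, f"predictions_{value}.csv")
        with open(path, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(group[0].keys()))
            writer.writeheader()
            writer.writerows(group)
        written.append(path)
    return written
```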
@@ -190,7 +192,7 @@ For an example about how to achieve it see [Text processing with batch deploymen
 
 ### Using models that are folders
 
-The environment variable `AZUREML_MODEL_DIR` contains the path to where the selected model is located and it is typically used in the `init()` function to load the model into memory. However, some models may contain its files inside of a folder. When reading the files in this variable, you may need to account for that. You can identify the folder where your MLflow model is placed as follows:
+The environment variable `AZUREML_MODEL_DIR` contains the path to where the selected model is located and it is typically used in the `init()` function to load the model into memory. However, some models may contain their files inside of a folder and you may need to account for that when loading them. You can identify the folder structure of your model as follows:
 
 1. Go to [Azure Machine Learning portal](https://ml.azure.com).

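Besides inspecting the model in the portal, you can list the folder structure from inside the scoring script itself. A small stdlib sketch (the helper name is illustrative):

```python
import os
from pathlib import Path


def describe_model_dir(model_dir: str) -> list:
    """Return the relative paths of every file under the model directory.

    Useful inside init() to see whether the model's files sit at the top
    level of AZUREML_MODEL_DIR or inside a nested folder.
    """
    root = Path(model_dir)
    return sorted(str(p.relative_to(root))
                  for p in root.rglob("*") if p.is_file())


# Typical use inside a scoring script:
# files = describe_model_dir(os.environ["AZUREML_MODEL_DIR"])
```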