
Commit 3c05c75

Merge pull request #239127 from santiagxf/santiagxf/azureml-batch-patch
Update how-to-batch-scoring-script.md
2 parents: 4a20746 + e33a061

File tree

1 file changed: +10 -8 lines changed

articles/machine-learning/how-to-batch-scoring-script.md

Lines changed: 10 additions & 8 deletions
@@ -33,15 +33,17 @@ The scoring script is a Python file (`.py`) that contains the logic about how to
 
 __deployment.yml__
 
-:::code language="yaml" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/mnist-classifier/deployment-torch/deployment.yml" range="8-10":::
+:::code language="yaml" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/mnist-classifier/deployment-torch/deployment.yml" range="9-11":::
 
 # [Python](#tab/python)
 
 ```python
-deployment = BatchDeployment(
+deployment = ModelBatchDeployment(
     ...
-    code_path="code",
-    scoring_script="batch_driver.py",
+    code_configuration=CodeConfiguration(
+        code="src",
+        scoring_script="batch_driver.py"
+    ),
     ...
 )
 ```
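The deployment above points at a `batch_driver.py` scoring script. A minimal sketch of what such a script can look like, following the `init()`/`run()` contract the article describes (the model-loading placeholder and the "prediction" logic here are purely illustrative, not part of the commit):

```python
import os
from typing import List

model = None  # populated once per worker by init()


def init():
    """Called once per worker when the deployment starts.

    AZUREML_MODEL_DIR points at the registered model's files; a real
    script would deserialize the model here instead of storing the path.
    """
    global model
    model_dir = os.environ.get("AZUREML_MODEL_DIR", ".")
    model = model_dir  # placeholder for an actual loaded model object


def run(mini_batch: List[str]) -> List[str]:
    """Called once per mini-batch with a list of input file paths.

    Returns one element per successfully processed file, as the article
    requires (a list/array or a pandas DataFrame).
    """
    results = []
    for file_path in mini_batch:
        # Illustrative "prediction": report the file name and its size.
        size = os.path.getsize(file_path) if os.path.exists(file_path) else 0
        results.append(f"{os.path.basename(file_path)},{size}")
    return results
```

Because `init()` runs once per worker and `run()` once per mini-batch, any expensive setup (model deserialization, warm-up) belongs in `init()`.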
@@ -102,7 +104,7 @@ The method receives a list of file paths as a parameter (`mini_batch`). You can
 >
 > Batch deployments distribute work at the file level, which means that a folder containing 100 files with mini-batches of 10 files will generate 10 batches of 10 files each. Notice that this will happen regardless of the size of the files involved. If your files are too big to be processed in large mini-batches we suggest to either split the files in smaller files to achieve a higher level of parallelism or to decrease the number of files per mini-batch. At this moment, batch deployment can't account for skews in the file's size distribution.
 
-The `run()` method should return a Pandas `DataFrame` or an array/list. Each returned output element indicates one successful run of an input element in the input `mini_batch`. For file datasets, each row/element represents a single file processed. For a tabular dataset, each row/element represents a row in a processed file.
+The `run()` method should return a Pandas `DataFrame` or an array/list. Each returned output element indicates one successful run of an input element in the input `mini_batch`. For file or folder data assets, each row/element returned represents a single file processed. For a tabular data asset, each row/element returned represents a row in a processed file.
 
 > [!IMPORTANT]
 > __How to write predictions?__
@@ -112,7 +114,7 @@ The `run()` method should return a Pandas `DataFrame` or an array/list. Each ret
 > If you need to write predictions in a different way, you can [customize outputs in batch deployments](how-to-deploy-model-custom-output.md).
 
 > [!WARNING]
-> Do not not output complex data types (or lists of complex data types) in the `run` function. Those outputs will be transformed to string and they will be hard to read.
+> Do not output complex data types (or lists of complex data types) rather than `pandas.DataFrame` in the `run` function. Those outputs will be transformed to string and they will be hard to read.
 
 The resulting DataFrame or array is appended to the output file indicated. There's no requirement on the cardinality of the results (1 file can generate 1 or many rows/elements in the output). All elements in the result DataFrame or array are written to the output file as-is (considering the `output_action` isn't `summary_only`).

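The file-level distribution described in the note above (100 files with a mini-batch size of 10 yield 10 mini-batches, regardless of file sizes) amounts to a plain chunking of the input file list. A sketch of that arithmetic (illustrative only; the actual partitioning is performed by the batch deployment runtime, not by your script):

```python
from typing import List


def make_mini_batches(files: List[str], mini_batch_size: int) -> List[List[str]]:
    """Split a flat list of input files into fixed-size mini-batches.

    The split ignores file sizes entirely, which is why skewed file
    sizes produce skewed per-batch workloads, as the note warns.
    """
    return [files[i:i + mini_batch_size]
            for i in range(0, len(files), mini_batch_size)]


files = [f"image_{n}.png" for n in range(100)]
batches = make_mini_batches(files, 10)
print(len(batches))  # 10 mini-batches of 10 files each
```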
@@ -131,7 +133,7 @@ Refer to [Create a batch deployment](how-to-use-batch-endpoint.md#create-a-batch
 By default, the batch deployment writes the model's predictions in a single file as indicated in the deployment. However, there are some cases where you need to write the predictions in multiple files. For instance, if the input data is partitioned, you typically would want to generate your output partitioned too. On those cases you can [Customize outputs in batch deployments](how-to-deploy-model-custom-output.md) to indicate:
 
 > [!div class="checklist"]
-> * The file format used (CSV, parquet, json, etc).
+> * The file format used (CSV, parquet, json, etc) to write predictions.
 > * The way data is partitioned in the output.
 
 Read the article [Customize outputs in batch deployments](how-to-deploy-model-custom-output.md) for an example about how to achieve it.
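The linked article covers the Azure ML mechanism; as a rough illustration of the idea of partitioned output, the sketch below writes one CSV file per value of a partition column. Every name here (`write_partitioned_csv`, the `region` column) is hypothetical, not an Azure ML API:

```python
import csv
import os
from collections import defaultdict


def write_partitioned_csv(rows, partition_key, output_dir):
    """Write prediction rows to one CSV file per partition value.

    `rows` is a list of dicts; `partition_key` names the column used to
    split the output, mirroring how the input data was partitioned.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_key]].append(row)

    os.makedirs(output_dir, exist_ok=True)
    written = []
    for value, group in groups.items():
        path = os.path.join(output_dir, f"predictions_{value}.csv")
        with open(path, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(group[0].keys()))
            writer.writeheader()
            writer.writerows(group)
        written.append(path)
    return written
```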
@@ -190,7 +192,7 @@ For an example about how to achieve it see [Text processing with batch deploymen
 
 ### Using models that are folders
 
-The environment variable `AZUREML_MODEL_DIR` contains the path to where the selected model is located and it is typically used in the `init()` function to load the model into memory. However, some models may contain its files inside of a folder. When reading the files in this variable, you may need to account for that. You can identify the folder where your MLflow model is placed as follows:
+The environment variable `AZUREML_MODEL_DIR` contains the path to where the selected model is located and it is typically used in the `init()` function to load the model into memory. However, some models may contain their files inside of a folder and you may need to account for that when loading them. You can identify the folder structure of your model as follows:
 
 1. Go to [Azure Machine Learning portal](https://ml.azure.com).

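Besides inspecting the model in the portal, you can list the folder structure from inside the scoring script itself. A small stdlib sketch (the helper name is illustrative):

```python
import os
from pathlib import Path


def describe_model_dir(model_dir: str) -> list:
    """Return the relative paths of every file under the model directory.

    Useful inside init() to see whether the model's files sit at the top
    level of AZUREML_MODEL_DIR or inside a nested folder.
    """
    root = Path(model_dir)
    return sorted(str(p.relative_to(root))
                  for p in root.rglob("*") if p.is_file())


# Typical use inside a scoring script:
# files = describe_model_dir(os.environ["AZUREML_MODEL_DIR"])
```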