Commit d136ac1

Fix wording per feedback.
1 parent a261bf7 commit d136ac1

File tree

1 file changed: 8 additions, 5 deletions

articles/machine-learning/how-to-use-parallel-run-step.md

Lines changed: 8 additions & 5 deletions
@@ -11,13 +11,13 @@ ms.reviewer: trbye, jmartens, larryfr
 ms.author: tracych
 author: tracychms
 ms.date: 04/15/2020
-ms.custom: Build2019
+ms.custom: Build2020
 ---
 
 # Run batch inference on large amounts of data by using Azure Machine Learning
 [!INCLUDE [applies-to-skus](../../includes/aml-applies-to-basic-enterprise-sku.md)]
 
-Learn how to run batch inference on large amounts of data asynchronously and in parallel by using Azure Machine Learning. The ParallelRunStep is a high-performance and high-throughput way to generate inferences and processing data. It provides parallelism capabilities out of the box.
+Learn how to run batch inference on large amounts of data asynchronously and in parallel by using Azure Machine Learning. The ParallelRunStep provides parallelism capabilities out of the box.
 
 With ParallelRunStep, it's straightforward to scale offline inferences to large clusters of machines on terabytes of structured or unstructured data with improved productivity and optimized cost.
 
@@ -55,6 +55,9 @@ from azureml.core import Workspace
 ws = Workspace.from_config()
 ```
 
+> [!IMPORTANT]
+> This code snippet expects the workspace configuration to be saved in the current directory or its parent. For more information on creating a workspace, see [Create and manage Azure Machine Learning workspaces](how-to-manage-workspace.md). For more information on saving the configuration to file, see [Create a workspace configuration file](how-to-configure-environment.md#workspace).
+
 ### Create a compute target
 
 In Azure Machine Learning, *compute* (or *compute target*) refers to the machines or clusters that perform the computational steps in your machine learning pipeline. Run the following code to create a CPU based [AmlCompute](https://docs.microsoft.com/python/api/azureml-core/azureml.core.compute.amlcompute.amlcompute?view=azure-ml-py) target.
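The `[!IMPORTANT]` note added in this hunk says `Workspace.from_config()` expects the configuration file in the current directory or its parent. As a rough sketch of that upward lookup (a hypothetical re-implementation for illustration only, not the azureml-core code; the real SDK also checks an `.azureml` subfolder and accepts an explicit `path` argument):

```python
from pathlib import Path


def find_workspace_config(start: str = ".", name: str = "config.json"):
    """Walk from `start` upward and return the first config file found.

    Hypothetical sketch of the "current directory or its parent" lookup
    behavior described in the note above; returns None when no file exists.
    """
    directory = Path(start).resolve()
    for candidate_dir in [directory, *directory.parents]:
        candidate = candidate_dir / name
        if candidate.is_file():
            return candidate
    return None
```

Because the search also covers parent directories, a notebook several folders below the checkout root can still pick up a single shared `config.json`.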
@@ -64,7 +67,7 @@ from azureml.core.compute import AmlCompute, ComputeTarget
 from azureml.core.compute_target import ComputeTargetException
 
 # choose a name for your cluster
-compute_name = os.environ.get("AML_COMPUTE_CLUSTER_NAME", "e2ecpucluster")
+compute_name = os.environ.get("AML_COMPUTE_CLUSTER_NAME", "cpucluster")
 compute_min_nodes = os.environ.get("AML_COMPUTE_CLUSTER_MIN_NODES", 0)
 compute_max_nodes = os.environ.get("AML_COMPUTE_CLUSTER_MAX_NODES", 4)
 
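One subtlety in the unchanged context lines of this hunk: `os.environ.get` returns a string whenever the variable is set, so `compute_min_nodes` can be an `int` (the default `0`) on one machine and a `str` on another. A small normalizing helper could avoid that (hypothetical helper name, not part of the azureml SDK):

```python
import os


def get_int_env(name: str, default: int) -> int:
    """Read an integer setting from the environment with a typed default.

    os.environ.get returns a str when the variable is set, so cast
    explicitly; otherwise node counts end up with mixed types.
    (Illustrative helper, not part of the azureml SDK.)
    """
    raw = os.environ.get(name)
    return default if raw is None else int(raw)


compute_min_nodes = get_int_env("AML_COMPUTE_CLUSTER_MIN_NODES", 0)
compute_max_nodes = get_int_env("AML_COMPUTE_CLUSTER_MAX_NODES", 4)
```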
@@ -125,7 +128,7 @@ def_data_store = ws.get_default_datastore()
 
 ### Create the data inputs
 
-The inputs for batch inference is the data that you want to partition for parallel processing. A batch inference pipeline accepts data inputs through [`Dataset`](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py).
+The inputs for batch inference are the data that you want to partition for parallel processing. A batch inference pipeline accepts data inputs through [`Dataset`](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py).
 
 `Dataset` is for exploring, transforming, and managing data in Azure Machine Learning. There are two types: [`TabularDataset`](https://docs.microsoft.com/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) and [`FileDataset`](https://docs.microsoft.com/python/api/azureml-core/azureml.data.filedataset?view=azure-ml-py). In this example, you'll use `FileDataset` as the inputs. `FileDataset` provides you with the ability to download or mount the files to your compute. By creating a dataset, you create a reference to the data source location. If you applied any subsetting transformations to the dataset, they will be stored in the dataset as well. The data remains in its existing location, so no extra storage cost is incurred.
 
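The context in this hunk notes that creating a dataset only records a reference to the data source location, so no data is copied and no extra storage cost is incurred. A toy analogy of that lazy-reference behavior (hypothetical class, not the `FileDataset` implementation):

```python
from pathlib import Path


class LazyFileReference:
    """Toy stand-in for the FileDataset idea: keep only the source
    location, and enumerate files on demand rather than copying them."""

    def __init__(self, path, pattern="*"):
        self.path = Path(path)    # reference only; no data is moved here
        self.pattern = pattern

    def to_names(self):
        # Materialized only when requested, loosely like mount/download.
        return sorted(p.name for p in self.path.glob(self.pattern) if p.is_file())
```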
@@ -140,7 +143,7 @@ path_on_datastore = mnist_blob.path('mnist/')
 input_mnist_ds = Dataset.File.from_files(path=path_on_datastore, validate=False)
 ```
 
-In order to use dynamic data inputs when run the batch inference pipeline, you can define the inputs `Dataset` as a [`PipelineParameter`](https://docs.microsoft.com/python/api/azureml-pipeline-core/azureml.pipeline.core.graph.pipelineparameter?view=azure-ml-py). You can specify the inputs dataset each time when you resubmit a batch inference pipeline run.
+In order to use dynamic data inputs when running the batch inference pipeline, you can define the inputs `Dataset` as a [`PipelineParameter`](https://docs.microsoft.com/python/api/azureml-pipeline-core/azureml.pipeline.core.graph.pipelineparameter?view=azure-ml-py). You can specify the inputs dataset each time you resubmit a batch inference pipeline run.
 
 ```python
 from azureml.data.dataset_consumption_config import DatasetConsumptionConfig
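As the corrected sentence in this hunk describes, a `PipelineParameter` gives each resubmitted run a chance to supply a different input dataset; parameters not supplied at submit time keep their defaults. That resolution rule can be sketched as (illustrative helper, not the azureml-pipeline-core API):

```python
def resolve_parameters(defaults, overrides=None):
    """Sketch of pipeline-parameter resolution: values passed at
    resubmission override the defaults; everything else keeps its
    default. (Illustration only, not the azureml implementation.)"""
    resolved = dict(defaults)
    resolved.update(overrides or {})
    return resolved


defaults = {"input_dataset": "mnist/"}
first_run = resolve_parameters(defaults)                                   # defaults only
second_run = resolve_parameters(defaults, {"input_dataset": "mnist_v2/"})  # per-run override
```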
