Commit c3f8850

Author: Larry O'Brien
Rollback edits belonging to another branch
Rolling back unintentional 1693963 WIP.
1 parent fde3dfd commit c3f8850

File tree

1 file changed: +9 −9 lines changed


articles/machine-learning/how-to-create-your-first-pipeline.md

Lines changed: 9 additions & 9 deletions
@@ -27,7 +27,7 @@ The ML pipelines you create are visible to the members of your Azure Machine Lea
 
 ML pipelines use remote compute targets for computation and the storage of the intermediate and final data associated with that pipeline. They can read and write data to and from supported [Azure Storage](https://docs.microsoft.com/azure/storage/) locations.
 
-If you don't have an Azure subscription, create a free account before you begin. Try the [free or paid version of Azure Machine Learning](https://aka.ms/AMLFree).
+If you don’t have an Azure subscription, create a free account before you begin. Try the [free or paid version of Azure Machine Learning](https://aka.ms/AMLFree).
 
 ## Prerequisites
 
@@ -51,7 +51,7 @@ Create the resources required to run an ML pipeline:
 
 * Set up a datastore used to access the data needed in the pipeline steps.
 
-* Configure a `Dataset` object to point to data that lives in, or is accessible in, a datastore.
+* Configure a `DataReference` object to point to data that lives in, or is accessible in, a datastore.
 
 * Set up the [compute targets](concept-azure-machine-learning-architecture.md#compute-targets) on which your pipeline steps will run.
 
@@ -82,13 +82,13 @@ def_blob_store.upload_files(
     overwrite=True)
 ```
 
-A pipeline consists of one or more steps. A step is a unit run on a compute target. Steps might consume data sources and produce "intermediate" data. A step can create data such as a model, a directory with model and dependent files, or temporary data. This data is then available for other steps later in the pipeline.
+A pipeline consists of one or more steps. A step is a unit run on a compute target. Steps might consume data sources and produce “intermediate” data. A step can create data such as a model, a directory with model and dependent files, or temporary data. This data is then available for other steps later in the pipeline.
 
 To learn more about connecting your pipeline to your data, see the articles [How to Access Data](how-to-access-data.md) and [How to Register Datasets](how-to-create-register-datasets.md).
 
-~~### Configure data set~~
+### Configure data reference
 
-~~You just created a data source that can be referenced in a pipeline as an input to a step. The preferred way to provide data to a pipeline is a [Dataset](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.Dataset) object. The `Dataset` object points to data that lives in or is accessible from a datastore or at a Web URL. The `Dataset` class is abstract, so you will create an instance of either a `FileDataset` (referring to one or more files) or a `TabularDataset` that's created by parsing into a table the data in one or more files.~~
+You just created a data source that can be referenced in a pipeline as an input to a step. A data source in a pipeline is represented by a [DataReference](https://docs.microsoft.com/python/api/azureml-core/azureml.data.data_reference.datareference) object. The `DataReference` object points to data that lives in or is accessible from a datastore.
 
 ```python
 from azureml.data.data_reference import DataReference
@@ -99,7 +99,7 @@ blob_input_data = DataReference(
     path_on_datastore="20newsgroups/20news.pkl")
 ```
 
-~~Intermediate data (or output of a step) is represented by a [PipelineData](https://docs.microsoft.com/python/api/azureml-pipeline-core/azureml.pipeline.core.pipelinedata?view=azure-ml-py) object. `output_data1` is produced as the output of a step, and used as the input of one or more future steps. `PipelineData` introduces a data dependency between steps, and creates an implicit execution order in the pipeline. This object will be used later when creating pipeline steps.~~
+Intermediate data (or output of a step) is represented by a [PipelineData](https://docs.microsoft.com/python/api/azureml-pipeline-core/azureml.pipeline.core.pipelinedata?view=azure-ml-py) object. `output_data1` is produced as the output of a step, and used as the input of one or more future steps. `PipelineData` introduces a data dependency between steps, and creates an implicit execution order in the pipeline. This object will be used later when creating pipeline steps.
 
 ```python
 from azureml.pipeline.core import PipelineData
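
The implicit execution order that `PipelineData` creates can be pictured with a small, self-contained sketch. This is not the Azure ML SDK — it is a toy illustration of how declaring one step's output as another step's input induces a run order; all names here are hypothetical.

```python
# Toy illustration (not the Azure ML SDK): declaring one step's output as
# another step's input induces an execution order, the way PipelineData
# links PythonScriptSteps in a real pipeline.

def execution_order(steps):
    """Return a run order in which every step follows the producers of its inputs."""
    produced_by = {out: name for name, (ins, outs) in steps.items() for out in outs}
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        ins, _ = steps[name]
        for data in ins:
            if data in produced_by:          # intermediate data: run its producer first
                visit(produced_by[data])
        order.append(name)

    for name in steps:
        visit(name)
    return order

# Each step maps to (inputs, outputs); "output_data1" links train -> compare,
# mirroring how a PipelineData object links two steps.
steps = {
    "compare": (["output_data1"], ["results"]),
    "train": (["blob_input_data"], ["output_data1"]),
}
print(execution_order(steps))  # ['train', 'compare']
```

Even though `compare` is declared first, the shared `output_data1` dependency forces `train` to run before it — the same ordering the real pipeline infers.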
@@ -109,7 +109,7 @@ output_data1 = PipelineData(
     datastore=def_blob_store,
     output_name="output_data1")
 ```
-~~ KILL ALL ABOVE ~~
+
 ### Configure data using datasets
 
 If you have tabular data stored in a file or set of files, a [TabularDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) is an efficient alternative to a `DataReference`. `TabularDataset` objects support versioning, diffs, and summary statistics. `TabularDataset`s are lazily evaluated (like Python generators) and it's efficient to subset them by splitting or filtering. The `FileDataset` class provides similar lazily-evaluated data representing one or more files.
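
The "like Python generators" comparison can be made concrete with plain Python. This is not `TabularDataset` itself — just a hypothetical sketch of the lazy-evaluation pattern the paragraph describes: filtering builds up a recipe, and rows are only read when the subset is actually consumed.

```python
# Toy illustration (not the Azure ML SDK) of generator-style lazy evaluation:
# nothing is read from "storage" until the subset is iterated.

rows_read = 0

def read_rows(n):
    """Pretend to stream rows from storage, counting how many are actually read."""
    global rows_read
    for i in range(n):
        rows_read += 1
        yield {"id": i, "label": i % 2}

# Build a lazy subset: filter, then take only the first two matches.
lazy = (row for row in read_rows(1_000_000) if row["label"] == 1)
first_two = [next(lazy), next(lazy)]

print(first_two)
print(rows_read)  # 4 -- far fewer than the 1,000,000 rows nominally available
```

Because evaluation stops as soon as two matching rows are found, only four rows are ever pulled from the source — the efficiency claim the article makes for subsetting by splitting or filtering.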
@@ -286,7 +286,7 @@ from azureml.pipeline.steps import PythonScriptStep
 trainStep = PythonScriptStep(
     script_name="train.py",
     arguments=["--input", blob_input_data, "--output", output_data1],
-    inputs=[my_dataset.as_named_input('input')],
+    inputs=[blob_input_data],
     outputs=[output_data1],
     compute_target=compute_target,
     source_directory=project_folder
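
On the script side, the `arguments` list above arrives as ordinary command-line flags: at run time the data objects are replaced with real paths on the compute target. The following is a hypothetical sketch of what the argument handling inside a script like `train.py` might look like — the paths and helper name are illustrative, not from the article.

```python
# Hypothetical sketch of argument handling in a script like train.py.
# Azure ML substitutes real mount/download paths for blob_input_data and
# output_data1, so the script only parses plain command-line flags.
import argparse

def parse_train_args(argv):
    parser = argparse.ArgumentParser(description="pipeline training step")
    parser.add_argument("--input", required=True, help="path to the input data")
    parser.add_argument("--output", required=True, help="path for intermediate output")
    return parser.parse_args(argv)

# Illustrative paths, standing in for what the service injects at run time.
args = parse_train_args(
    ["--input", "/mnt/datastore/20newsgroups/20news.pkl",
     "--output", "/mnt/datastore/output_data1"])
print(args.input, args.output)
```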
@@ -385,7 +385,7 @@ When you first run a pipeline, Azure Machine Learning:
 * Downloads the Docker image for each step to the compute target from the container registry.
 * Mounts the datastore if a `DataReference` object is specified in a step. If mount is not supported, the data is instead copied to the compute target.
 * Runs the step in the compute target specified in the step definition.
-* Creates artifacts, such as logs, stdout and stderr, metrics, and output specified by the step. These artifacts are then uploaded and kept in the user's default datastore.
+* Creates artifacts, such as logs, stdout and stderr, metrics, and output specified by the step. These artifacts are then uploaded and kept in the user’s default datastore.
 
 ![Diagram of running an experiment as a pipeline](./media/how-to-create-your-first-pipeline/run_an_experiment_as_a_pipeline.png)
 