articles/machine-learning/how-to-create-your-first-pipeline.md

@@ -27,7 +27,7 @@ The ML pipelines you create are visible to the members of your Azure Machine Lea
ML pipelines use remote compute targets for computation and the storage of the intermediate and final data associated with that pipeline. They can read and write data to and from supported [Azure Storage](https://docs.microsoft.com/azure/storage/) locations.
If you don’t have an Azure subscription, create a free account before you begin. Try the [free or paid version of Azure Machine Learning](https://aka.ms/AMLFree).
## Prerequisites
@@ -51,7 +51,7 @@ Create the resources required to run an ML pipeline:
* Set up a datastore used to access the data needed in the pipeline steps.
* Configure a `DataReference` object to point to data that lives in, or is accessible in, a datastore.
* Set up the [compute targets](concept-azure-machine-learning-architecture.md#compute-targets) on which your pipeline steps will run. (A combined setup sketch follows this list.)
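Here's a minimal sketch of that setup, assuming a workspace config file has already been downloaded; the cluster name and VM size are illustrative choices, not values from this article:

```python
from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

# Load the workspace from a local config.json and grab its default datastore.
ws = Workspace.from_config()
def_blob_store = ws.get_default_datastore()

# Reuse an existing AmlCompute cluster, or provision one (name and size are illustrative).
compute_name = "cpu-cluster"
if compute_name in ws.compute_targets:
    compute_target = ws.compute_targets[compute_name]
else:
    provisioning_config = AmlCompute.provisioning_configuration(
        vm_size="STANDARD_D2_V2", max_nodes=4)
    compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)
    compute_target.wait_for_completion(show_output=True)
```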
@@ -82,13 +82,13 @@

```python
# Upload local files to the default blob datastore.
# The local path and target folder are illustrative placeholders.
def_blob_store.upload_files(
    ["./data/your_data.pkl"],
    target_path="your_data_folder",
    overwrite=True)
```
A pipeline consists of one or more steps. A step is a unit run on a compute target. Steps might consume data sources and produce “intermediate” data. A step can create data such as a model, a directory with model and dependent files, or temporary data. This data is then available for other steps later in the pipeline.
To learn more about connecting your pipeline to your data, see the articles [How to Access Data](how-to-access-data.md) and [How to Register Datasets](how-to-create-register-datasets.md).
### Configure data reference
You just created a data source that can be referenced in a pipeline as an input to a step. A data source in a pipeline is represented by a [DataReference](https://docs.microsoft.com/python/api/azureml-core/azureml.data.data_reference.datareference) object. The `DataReference` object points to data that lives in or is accessible from a datastore.
```python
from azureml.data.data_reference import DataReference

# The reference name and datastore path below are illustrative placeholders.
blob_input_data = DataReference(
    datastore=def_blob_store,
    data_reference_name="input_data",
    path_on_datastore="your_data_folder/your_data.pkl")
```
Intermediate data (or output of a step) is represented by a [PipelineData](https://docs.microsoft.com/python/api/azureml-pipeline-core/azureml.pipeline.core.pipelinedata?view=azure-ml-py) object. `output_data1` is produced as the output of a step, and used as the input of one or more future steps. `PipelineData` introduces a data dependency between steps, and creates an implicit execution order in the pipeline. This object will be used later when creating pipeline steps.
```python
from azureml.pipeline.core import PipelineData

output_data1 = PipelineData(
    "output_data1",  # required name; assumed to mirror output_name below
    datastore=def_blob_store,
    output_name="output_data1")
```
### Configure data using datasets
If you have tabular data stored in a file or set of files, a [TabularDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) is an efficient alternative to a `DataReference`. `TabularDataset` objects support versioning, diffs, and summary statistics. `TabularDataset`s are lazily evaluated (like Python generators) and it's efficient to subset them by splitting or filtering. The `FileDataset` class provides similar lazily-evaluated data representing one or more files.
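For instance, a minimal sketch (the datastore path and file name here are assumptions, not values from this article):

```python
from azureml.core import Dataset

# Build a TabularDataset from delimited files already on the datastore.
tabular_ds = Dataset.Tabular.from_delimited_files(
    path=[(def_blob_store, "your_data_folder/your_table.csv")])

# Evaluation is lazy; nothing is read until the data is materialized.
preview = tabular_ds.take(3).to_pandas_dataframe()
```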
@@ -286,7 +286,7 @@ from azureml.pipeline.steps import PythonScriptStep
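The step definition this hunk touches isn't shown here, but as a rough sketch of how the pieces connect (the script name, source directory, and argument flags are assumptions), a `PythonScriptStep` can consume the `DataReference` and produce the `PipelineData`:

```python
from azureml.pipeline.steps import PythonScriptStep

# Illustrative step: script name, source_directory, and arguments are assumptions.
train_step = PythonScriptStep(
    script_name="train.py",
    source_directory="./scripts",
    arguments=["--input", blob_input_data, "--output", output_data1],
    inputs=[blob_input_data],
    outputs=[output_data1],
    compute_target=compute_target)
```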
@@ -385,7 +385,7 @@ When you first run a pipeline, Azure Machine Learning:
* Downloads the Docker image for each step to the compute target from the container registry.
* Mounts the datastore if a `DataReference` object is specified in a step. If mount is not supported, the data is instead copied to the compute target.
* Runs the step in the compute target specified in the step definition.
* Creates artifacts, such as logs, stdout and stderr, metrics, and output specified by the step. These artifacts are then uploaded and kept in the user’s default datastore.
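A minimal sketch of submitting that first run (the experiment name and step list are assumptions):

```python
from azureml.core import Experiment
from azureml.pipeline.core import Pipeline

# Assemble the steps into a pipeline and submit it as an experiment run.
pipeline = Pipeline(workspace=ws, steps=[train_step])
pipeline_run = Experiment(ws, "my-first-pipeline").submit(pipeline)
pipeline_run.wait_for_completion(show_output=True)
```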