Commit 900c86a

Merge pull request #107524 from lobrien/1677809-EstimatorStep
Added section on EstimatorStep and DataTransferStep
2 parents 34aba26 + de3b2e9 commit 900c86a

1 file changed: +6 -0 lines changed

articles/machine-learning/concept-ml-pipelines.md

Lines changed: 6 additions & 0 deletions
@@ -200,6 +200,12 @@ The key advantages of using pipelines for your machine learning workflows are:
200200
| **Modularity** | Separating areas of concerns and isolating changes allows software to evolve at a faster rate with higher quality. |
201201
|**Collaboration**|Pipelines allow data scientists to collaborate across all areas of the machine learning design process, while being able to concurrently work on pipeline steps.|
202202

203+
### Choosing the proper PipelineStep subclass
204+
205+
The `PythonScriptStep` is the most flexible subclass of the abstract `PipelineStep`. Other subclasses, such as `EstimatorStep` and `DataTransferStep`, can accomplish specific tasks with less code. For instance, you can create an `EstimatorStep` by passing in only a name for the step, an `Estimator`, and a compute target. You can also override inputs and outputs, pipeline parameters, and arguments. For more information, see [Train models with Azure Machine Learning using estimator](how-to-train-ml-models.md).
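
The following is a minimal sketch of that pattern, not a definitive implementation. It assumes the keyword names of the v1 Python SDK (`estimator_entry_script_arguments`, `inputs`, `outputs`) and the import path `azureml.pipeline.steps`; verify both against the SDK version you have installed. The helper names `estimator_step_kwargs` and `build_estimator_step` are hypothetical and exist only for illustration.

```python
def estimator_step_kwargs(name, estimator, compute_target,
                          script_arguments=None, inputs=None, outputs=None):
    """Collect keyword arguments for an EstimatorStep.

    Hypothetical helper: the keyword names are assumed from the v1 SDK.
    """
    kwargs = {
        "name": name,
        "estimator": estimator,
        "compute_target": compute_target,
        # Assumed parameter name; an empty list means "no script arguments".
        "estimator_entry_script_arguments": list(script_arguments or []),
    }
    # Optional overrides are passed only when the caller supplies them.
    if inputs is not None:
        kwargs["inputs"] = inputs
    if outputs is not None:
        kwargs["outputs"] = outputs
    return kwargs


def build_estimator_step(name, estimator, compute_target, **overrides):
    """Create the step; requires the Azure Machine Learning SDK to be installed."""
    # Imported lazily so the helper above can be inspected without the SDK.
    from azureml.pipeline.steps import EstimatorStep

    return EstimatorStep(
        **estimator_step_kwargs(name, estimator, compute_target, **overrides)
    )
```

In the simplest case, only the name, the `Estimator`, and the compute target are supplied. The optional overrides are left out of the call unless you set them.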
206+
207+
The `DataTransferStep` makes it easy to move data between data sources and sinks. The code to do this manually is straightforward but repetitive. Instead, you can create a `DataTransferStep` with a name, references to a data source and a data sink, and a compute target. The notebook [Azure Machine Learning Pipeline with DataTransferStep](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-data-transfer.ipynb) demonstrates this step.
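
As a rough sketch of the four pieces named above, the step might be assembled as follows. It assumes the v1 Python SDK keyword names `source_data_reference` and `destination_data_reference` and the import path `azureml.pipeline.steps`; check both against your installed SDK. The helper names `data_transfer_step_kwargs` and `build_data_transfer_step` are hypothetical.

```python
def data_transfer_step_kwargs(name, source, destination, compute_target):
    """Map the four inputs onto DataTransferStep keywords.

    Hypothetical helper: the keyword names are assumed from the v1 SDK.
    """
    return {
        "name": name,
        "source_data_reference": source,            # where the data is read from
        "destination_data_reference": destination,  # where the data is written to
        "compute_target": compute_target,           # compute that performs the copy
    }


def build_data_transfer_step(name, source, destination, compute_target):
    """Create the step; requires the Azure Machine Learning SDK to be installed."""
    # Imported lazily so the helper above can be inspected without the SDK.
    from azureml.pipeline.steps import DataTransferStep

    return DataTransferStep(
        **data_transfer_step_kwargs(name, source, destination, compute_target)
    )
```

The linked notebook shows how the source and sink references and the compute target are created in a real workspace.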
208+
203209
## Modules
204210

205211
While pipeline steps allow the reuse of the results of a previous run, in many cases the construction of the step assumes that the scripts and dependent files required must be locally available. If a data scientist wants to build on top of existing code, the scripts and dependencies often must be cloned from a separate repository.
