articles/machine-learning/concept-ml-pipelines.md (8 additions & 7 deletions)
@@ -35,19 +35,20 @@ Learn how to [create your first pipeline](how-to-create-your-first-pipeline.md).
The Azure cloud provides several other pipelines, each with a different purpose. The following table lists the different pipelines and what they are used for:
-| Pipeline | What it does | Canonical pipe |
-| ---- | ---- | ---- |
-| Azure Machine Learning pipelines | Defines reusable machine learning workflows that can be used as a template for your machine learning scenarios. | Data -> model |
-|[Azure Data Factory pipelines](https://docs.microsoft.com/azure/data-factory/concepts-pipelines-activities)| Groups data movement, transformation, and control activities needed to perform a task. | Data -> data |
-|[Azure Pipelines](https://azure.microsoft.com/services/devops/pipelines/)| Continuous integration and delivery of your application to any platform/any cloud | Code -> app/service |
+| Scenario | Primary persona | Azure offering | OSS offering | Canonical pipe | Strengths |
+| ---- | ---- | ---- | ---- | ---- | ---- |
+| Model orchestration (Machine learning) | Data scientist | Azure Machine Learning Pipelines | Kubeflow Pipelines | Data -> Model | Distribution, caching, code-first, reuse |
+| Data orchestration (Data prep) | Data engineer | [Azure Data Factory pipelines](https://docs.microsoft.com/azure/data-factory/concepts-pipelines-activities) | Apache Airflow | Data -> Data | Strongly typed movement, data-centric activities |
+| Code & app orchestration (CI/CD) | App developer / Ops | [Azure DevOps Pipelines](https://azure.microsoft.com/services/devops/pipelines/) | Jenkins | Code + Model -> App/Service | Most open and flexible activity support, approval queues, phases with gating |
## What can Azure ML pipelines do?
An Azure Machine Learning pipeline is an independently executable workflow of a complete machine learning task. Subtasks are encapsulated as a series of steps within the pipeline. An Azure Machine Learning pipeline can be as simple as one that calls a Python script, so it _may_ do just about anything. Pipelines _should_ focus on machine learning tasks such as:
+ Data preparation including importing, validating and cleaning, munging and transformation, normalization, and staging
+ Training configuration including parameterizing arguments, filepaths, and logging / reporting configurations
-+ Training and validating efficiently and repeatedly, which might include specifying specific data subsets, different hardware compute resources, distributed processing, and progress monitoring
++ Training and validating efficiently and repeatedly. Efficiency might come from specifying specific data subsets, different hardware compute resources, distributed processing, and progress monitoring
+ Deployment, including versioning, scaling, provisioning, and access control
Independent steps allow multiple data scientists to work on the same pipeline at the same time without over-taxing compute resources. Separate steps also make it easy to use different compute types/sizes for each step.
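For illustration, a two-step pipeline along these lines might look like the following minimal sketch, which assumes the v1 `azureml` Python SDK (`azureml-pipeline-core` and `azureml-pipeline-steps`); the script names, folder paths, and compute target names are hypothetical placeholders, not values from this article.

```python
# Minimal sketch only: a two-step Azure ML pipeline (azureml SDK v1).
# prep.py, train.py, cpu-cluster, and gpu-cluster are hypothetical names.
from azureml.core import Workspace
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()  # assumes a local config.json for an existing workspace

# Each step is a discrete action; each can target its own compute type/size.
prep_step = PythonScriptStep(
    name="prepare_data",
    script_name="prep.py",         # hypothetical data-preparation script
    source_directory="./prep",
    compute_target="cpu-cluster",  # hypothetical CPU cluster for data prep
    allow_reuse=True,              # skip re-running when code and inputs are unchanged
)

train_step = PythonScriptStep(
    name="train_model",
    script_name="train.py",        # hypothetical training script
    source_directory="./train",
    compute_target="gpu-cluster",  # hypothetical GPU cluster for training
)
train_step.run_after(prep_step)    # make the step ordering explicit

pipeline = Pipeline(workspace=ws, steps=[prep_step, train_step])
```

Because the steps are separate, `allow_reuse` lets an unchanged step be satisfied from cache rather than re-run, which is one way a pipeline avoids over-taxing compute resources.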
@@ -64,7 +65,7 @@ In short, all of the complex tasks of the machine learning lifecycle can be help
An Azure ML pipeline performs a complete logical workflow with an ordered sequence of steps. Each step is a discrete processing action. Pipelines run in the context of an Azure Machine Learning [Experiment](https://docs.microsoft.com/python/api/azureml-core/azureml.core.experiment.experiment?view=azure-ml-py).
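Continuing the hypothetical sketch above, submitting the pipeline creates a run inside an Experiment (the experiment name here is a placeholder):

```python
# Minimal sketch: pipeline runs live inside an Experiment (azureml SDK v1).
from azureml.core import Experiment

experiment = Experiment(workspace=ws, name="pipeline-demo")  # hypothetical name
pipeline_run = experiment.submit(pipeline)                   # returns a PipelineRun
pipeline_run.wait_for_completion(show_output=True)           # stream step logs until done
```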
-In the very early stages of an ML project, it's fine to have a single Jupyter notebook or Python script that does all the work of Azure workspace and resource configuration, data preparation, run configuration, training, and validation. But just as functions and classes quickly become preferable to a single imperative block of code, ML workflows quickly become preferable to a monolithic notebook or script.
+In the early stages of an ML project, it's fine to have a single Jupyter notebook or Python script that does all the work of Azure workspace and resource configuration, data preparation, run configuration, training, and validation. But just as functions and classes quickly become preferable to a single imperative block of code, ML workflows quickly become preferable to a monolithic notebook or script.
By modularizing ML tasks, pipelines support the Computer Science imperative that a component should "do (only) one thing well." Modularity is clearly vital to project success when programming in teams, but even when working alone, even a small ML project involves separate tasks, each with a good amount of complexity. Tasks include: workspace configuration and data access, data preparation, model definition and configuration, and deployment. While the outputs of one or more tasks form the inputs to another, the exact implementation details of any one task are, at best, irrelevant distractions in the next. At worst, the computational state of one task can cause a bug in another.
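As one hedged illustration of how the outputs of one task can form the inputs to another without the steps sharing internal state, the sketch below wires two hypothetical steps together with `PipelineData` (azureml SDK v1); the script and compute names are placeholders.

```python
# Minimal sketch: an intermediate dataset passed between steps (azureml SDK v1).
from azureml.core import Workspace
from azureml.pipeline.core import PipelineData
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()
prepared = PipelineData("prepared_data", datastore=ws.get_default_datastore())

prep_step = PythonScriptStep(
    name="prepare_data",
    script_name="prep.py",         # hypothetical script that writes to --out
    compute_target="cpu-cluster",  # hypothetical compute name
    arguments=["--out", prepared],
    outputs=[prepared],            # declares what this step produces
)

train_step = PythonScriptStep(
    name="train_model",
    script_name="train.py",        # hypothetical script that reads from --in
    compute_target="gpu-cluster",  # hypothetical compute name
    arguments=["--in", prepared],
    inputs=[prepared],             # the only thing this step sees from the previous one
)
```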