Commit 7cc689c

Merge pull request #104334 from lobrien/1670266-Pipeline-Conceptual-Table
Update pipelines conceptual article w improved 'what pipeline tech?' table
2 parents a7e49e4 + 715e10f commit 7cc689c

1 file changed, 8 additions and 7 deletions


articles/machine-learning/concept-ml-pipelines.md

@@ -35,19 +35,20 @@ Learn how to [create your first pipeline](how-to-create-your-first-pipeline.md).

 The Azure cloud provides several other pipelines, each with a different purpose. The following table lists the different pipelines and what they are used for:

-| Pipeline | What it does | Canonical pipe |
-| ---- | ---- | ---- |
-| Azure Machine Learning pipelines | Defines reusable machine learning workflows that can be used as a template for your machine learning scenarios. | Data -> model |
-| [Azure Data Factory pipelines](https://docs.microsoft.com/azure/data-factory/concepts-pipelines-activities) | Groups data movement, transformation, and control activities needed to perform a task. | Data -> data |
-| [Azure Pipelines](https://azure.microsoft.com/services/devops/pipelines/) | Continuous integration and delivery of your application to any platform/any cloud | Code -> app/service |
+| Scenario | Primary persona | Azure offering | OSS offering | Canonical pipe | Strengths |
+| -------- | --------------- | -------------- | ------------ | -------------- | --------- |
+| Model orchestration (Machine learning) | Data scientist | Azure Machine Learning Pipelines | Kubeflow Pipelines | Data -> Model | Distribution, caching, code-first, reuse |
+| Data orchestration (Data prep) | Data engineer | [Azure Data Factory pipelines](https://docs.microsoft.com/azure/data-factory/concepts-pipelines-activities) | Apache Airflow | Data -> Data | Strongly-typed movement. Data-centric activities. |
+| Code & app orchestration (CI/CD) | App Developer / Ops | [Azure DevOps Pipelines](https://azure.microsoft.com/services/devops/pipelines/) | Jenkins | Code + Model -> App/Service | Most open and flexibile activity support, approval queues, phases with gating |
+

 ## What can Azure ML pipelines do?

 An Azure Machine Learning pipeline is an independently executable workflow of a complete machine learning task. Subtasks are encapsulated as a series of steps within the pipeline. An Azure Machine Learning pipeline can be as simple as one that calls a Python script, so _may_ do just about anything. Pipelines _should_ focus on machine learning tasks such as:

 + Data preparation including importing, validating and cleaning, munging and transformation, normalization, and staging
 + Training configuration including parameterizing arguments, filepaths, and logging / reporting configurations
-+ Training and validating efficiently and repeatedly, which might include specifying specific data subsets, different hardware compute resources, distributed processing, and progress monitoring
++ Training and validating efficiently and repeatedly. Efficiency might come from specifying specific data subsets, different hardware compute resources, distributed processing, and progress monitoring
 + Deployment, including versioning, scaling, provisioning, and access control

 Independent steps allow multiple data scientists to work on the same pipeline at the same time without over-taxing compute resources. Separate steps also make it easy to use different compute types/sizes for each step.
@@ -64,7 +65,7 @@ In short, all of the complex tasks of the machine learning lifecycle can be help

 An Azure ML pipeline performs a complete logical workflow with an ordered sequence of steps. Each step is a discrete processing action. Pipelines run in the context of an Azure Machine Learning [Experiment](https://docs.microsoft.com/python/api/azureml-core/azureml.core.experiment.experiment?view=azure-ml-py).

-In the very early stages of an ML project, it's fine to have a single Jupyter notebook or Python script that does all the work of Azure workspace and resource configuration, data preparation, run configuration, training, and validation. But just as functions and classes quickly become preferable to a single imperative block of code, ML workflows quickly become preferable to a monolithic notebook or script.
+In the early stages of an ML project, it's fine to have a single Jupyter notebook or Python script that does all the work of Azure workspace and resource configuration, data preparation, run configuration, training, and validation. But just as functions and classes quickly become preferable to a single imperative block of code, ML workflows quickly become preferable to a monolithic notebook or script.

 By modularizing ML tasks, pipelines support the Computer Science imperative that a component should "do (only) one thing well." Modularity is clearly vital to project success when programming in teams, but even when working alone, even a small ML project involves separate tasks, each with a good amount of complexity. Tasks include: workspace configuration and data access, data preparation, model definition and configuration, and deployment. While the outputs of one or more tasks form the inputs to another, the exact implementation details of any one task are, at best, irrelevant distractions in the next. At worst, the computational state of one task can cause a bug in another.
