description: In this exercise you'll use the automated machine learning feature in Azure Machine Learning to train and evaluate a machine learning model. You'll then deploy and test the trained model.
-  - content: "Which compute would you recommend to train the model?"
-    choices:
-    - content: "Spark cluster"
-      isCorrect: false
-      explanation: "Incorrect. Since the data scientists aren't familiar with Spark, it's not the preferred solution right now."
-    - content: "CPU"
-      isCorrect: true
-      explanation: "Correct. CPU compute is the best option for training the model. There can be a need for GPU or Spark when model training takes too long, but for now it's best to start with CPUs."
-    - content: "GPU"
-      isCorrect: false
-      explanation: "Incorrect. There's no indication yet that GPUs are needed to train the model, and using GPUs over CPUs now, only incurs unnecessary extra costs."
-  - content: "What type of predictions does the mobile application need to generate?"
-    choices:
-    - content: "Real-time predictions"
-      isCorrect: true
-      explanation: "Correct. There's a need for immediate predictions for individual patients."
-    - content: "Batch predictions"
-      isCorrect: false
-      explanation: "Incorrect. Batch predictions would take too long. We need immediate predictions."
-    - content: "Local predictions"
-      isCorrect: false
-      explanation: "Incorrect. There's a need for immediate predictions for individual patients so we need real-time predictions."
+  - content: "What does automated machine learning in Azure Machine Learning enable you to do?"
+    choices:
+    - content: "Automatically deploy new versions of a model as they're trained"
+      isCorrect: false
+      explanation: "Incorrect. Automated machine learning does not automatically deploy models."
+    - content: "Automatically provision Azure Machine Learning workspaces for new data scientists in an organization"
+      isCorrect: false
+      explanation: "Incorrect. Automated machine learning does not automate workspace provisioning."
+    - content: "Automatically run multiple training jobs using different algorithms and parameters to find the best model"
+      isCorrect: true
+      explanation: "Correct. Automated machine learning runs multiple training jobs, varying algorithms and parameters, to find the best model for your data."
learn-pr/wwl-data-ai/design-machine-learning-model-training-solution/includes/1-introduction.md
Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 Thoughtfully designed machine learning solutions form the foundation of today's AI applications. From predictive analytics to personalized recommendations and beyond, machine learning solutions support the latest technological advances in society by using existing data to produce new insights.

-As a data scientist, you can make decisions to tackle machine learning problems in different ways. The decisions you make affect the cost, speed, quality, and longevity of the solution.
+Data scientists make decisions to tackle machine learning problems in different ways. The decisions they make affect the cost, speed, quality, and longevity of the solution.

 In this module, you learn how to design an end-to-end machine learning solution with Microsoft Azure that can be used in an enterprise setting. Using the following six steps as a framework, we explore how to plan, train, deploy, and monitor machine learning solutions.
learn-pr/wwl-data-ai/design-machine-learning-model-training-solution/includes/2a-prepare-data.md
Lines changed: 0 additions & 10 deletions
@@ -19,16 +19,6 @@ First, you need to identify your data source and its current data format.

 Then, you need to decide what data you need to train your model, and in what format you want that data to be served to the model.

-## Choose how to serve data
-
-To access data when training machine learning models, you want to serve the data by storing it in a cloud data service. By storing data *separately from your compute*, you minimize costs and are more flexible. It’s a best practice to store your data in one tool, which is separate from another tool you use to train your models.
-
-Which tool or service is best to store your data depends on the data you have and the service you use for model training. Some commonly used options on Azure are:
-
-- **Azure Blob Storage**: Cheapest option for storing data as unstructured data. Ideal for storing files like images, text, and JSON. Often also used to store data as CSV files, as data scientists prefer working with CSV files.
-- **Azure Data Lake Storage** (Gen 2): A more advanced version of the Azure Blob Storage. Also stores files like CSV files and images as unstructured data. A data lake also implements a hierarchical namespace, which means it’s easier to give someone access to a specific file or folder. Storage capacity is virtually limitless so ideal for storing large data.
-- **Azure SQL Database**: Stores data as structured data. Data is read as a table and schema is defined when a table in the database is created. Ideal for data that doesn’t change over time.
-
 ## Design a data ingestion solution

 In general, it’s a best practice to extract data from its source before analyzing it. Whether you’re using the data for data engineering, data analysis, or data science, you want to extract the data from its source, transform it, and load it into a serving layer. Such a process is also referred to as **Extract**, **Transform**, and **Load** (**ETL**) or **Extract**, **Load**, and **Transform** (**ELT**). The serving layer makes your data available for the service you use for further data processing like training machine learning models.
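
The extract, transform, and load steps described in the context line above can be sketched in a few lines of Python. This is a minimal illustration, not part of the commit or of any Azure service: the CSV text, the column names (`patient_id`, `glucose`, `bmi`), and the dictionary standing in for a serving layer are all made-up assumptions.

```python
import csv
import io

# Hypothetical raw export; the column names and values are made up for illustration.
RAW_CSV = """patient_id,glucose,bmi
1,148,33.6
2,,26.6
3,183,23.3
"""


def extract(source_text):
    """Extract: read raw rows from the source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(source_text)))


def transform(rows):
    """Transform: drop rows with missing values and cast fields to numbers."""
    cleaned = []
    for row in rows:
        if any(value == "" for value in row.values()):
            continue  # skip incomplete records
        cleaned.append({
            "patient_id": int(row["patient_id"]),
            "glucose": float(row["glucose"]),
            "bmi": float(row["bmi"]),
        })
    return cleaned


def load(rows, serving_layer):
    """Load: write the prepared rows to the serving layer (a dict stands in for it)."""
    serving_layer["training_data"] = rows


serving_layer = {}
load(transform(extract(RAW_CSV)), serving_layer)
print(serving_layer["training_data"])
```

In an ELT variant, `load` would run on the raw rows first and `transform` would run inside the serving layer.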
learn-pr/wwl-data-ai/design-machine-learning-model-training-solution/includes/3-choose-service-train.md
Lines changed: 15 additions & 23 deletions
@@ -15,33 +15,25 @@ Within Azure, there are several services available for training machine learning
 ||**Microsoft Fabric** is an integrated analytics platform designed to streamline data workflows between data analysts, data engineers, and data scientists. With Microsoft Fabric, you can prepare data, train a model, use the trained model to generate predictions, and visualize the data in Power BI reports. Learn more about [Microsoft Fabric](/fabric/get-started/microsoft-fabric-overview?azure-portal=true), and specifically about [the data science features in Microsoft Fabric](/fabric/data-science/?azure-portal=true).|
 ||**Azure AI Services** is a collection of prebuilt machine learning models you can use for common machine learning tasks such as object detection in images. The models are offered as an application programming interface (API), so you can easily integrate a model with your application. Some models can be customized with your own training data, saving time and resources to train a new model from scratch. Learn more about [Azure AI Services](/azure/cognitive-services/what-are-cognitive-services?azure-portal=true).|

-## Understand the difference between services
+## Features and capabilities of Azure Machine Learning

-Choosing a service to use for training your machine learning models can be challenging. Often, multiple services would fit your scenario. There are some general guidelines to help you:
+Let's focus on **Azure Machine Learning**. Microsoft Azure Machine Learning is a cloud service for training, deploying, and managing machine learning models. It's designed to be used by data scientists, software engineers, devops professionals, and others to manage the end-to-end lifecycle of machine learning projects.

-- Use Azure AI Services whenever one of the customizable prebuilt models suits your requirements, to **save time and effort**.
-- Use Microsoft Fabric or Azure Databricks if you want to **keep all data-related** (data analytics, data engineering, and data science) **projects within the same service**.
-- Use Microsoft Fabric or Azure Databricks if you need **distributed compute** for working with large datasets (datasets are large when you experience capacity constraints with standard compute). You need to work with [PySpark](https://spark.apache.org/docs/latest/api/python?azure-portal=true) to use the distributed compute.
-- Use Azure Machine Learning or Azure Databricks when you want **full control** over model training and management.
-- Use Azure Machine Learning when **Python** is your preferred programming language.
-- Use Azure Machine Learning when you want an **intuitive user interface** to manage your machine learning lifecycle.
+Azure Machine Learning supports tasks including:

-> [!Important]
-> There are many factors which may influence your choice of service. Ultimately, it is up to you and your organization to decide what’s the best fit. These are simply guidelines to help you understand how to differentiate between services.
+- Exploring data and preparing it for modeling.
+- Training and evaluating machine learning models.
+- Registering and managing trained models.
+- Deploying trained models for use by applications and services.
+- Reviewing and applying responsible AI principles and practices.

-## Decide between compute options
+Azure Machine Learning provides the following features and capabilities to support machine learning workloads:

-Every time you train a model, you should monitor how long it takes to train the model and how much compute is used to execute your code. By monitoring the compute utilization, you know whether to scale your compute up or down.
+- Centralized storage and management of datasets for model training and evaluation.
+- On-demand compute resources on which you can run machine learning jobs, such as training a model.
+- Automated machine learning (AutoML), which makes it easy to run multiple training jobs with different algorithms and parameters to find the best model for your data.
+- Visual tools to define orchestrated *pipelines* for processes such as model training or inferencing.
+- Integration with common machine learning frameworks such as MLflow, which make it easier to manage model training, evaluation, and deployment at scale.
+- Built-in support for visualizing and evaluating metrics for responsible AI, including model explainability, fairness assessment, and others.

-When you choose to work with Azure instead of training a model on a local device, you have access to scalable and cost-effective compute.

-| Compute options | Considerations |
-|---|---|
-|**Central Processing Unit** (**CPU**) or a **Graphics Processing Unit** (**GPU**) | For smaller tabular datasets, a CPU is sufficient and cost-effective. For unstructured data like images or text, GPUs are more powerful and efficient. GPUs can also be used for larger tabular datasets, if CPU compute is proving to be insufficient.|
-|**General purpose** or **memory optimized**| Use general purpose to have a balanced CPU-to-memory ratio, which is ideal for testing and development with smaller datasets. Use memory optimized to have a high memory-to-CPU ratio. Great for in-memory analytics, which is ideal when you have larger datasets or when you're working in notebooks. |
-|**Spark**| A Spark cluster consists of a driver node and worker nodes. Your code initially communicates with the driver node. The work is then distributed across the worker nodes. When you use a service, like Azure Databricks, that distributes the work, parts of the workload can be executed in parallel, reducing the processing time.|
-
-> [!Important]
-> To make optimal use of a Spark cluster, your code needs to be written in a Spark-friendly language like Scala, SQL, RSpark, or PySpark in order to distribute the workload. If you write in Python, you only use the driver node and leave the worker nodes unused.
-
-Which compute options best fit your needs is often a case of trial and error. When running code, you should monitor the compute utilization to understand how much compute resources you're using. If training your model takes too long, even with the largest compute size, you can use GPUs instead of CPUs. Alternatively, you can choose to distribute model training by using Spark compute which require you to rewrite your training scripts.
+Let's see how we can get started with Azure Machine Learning in a user interface. You can use **Azure Machine Learning studio**, a browser-based portal for managing your machine learning resources and jobs, to access many types of machine learning capabilities.
+
+In Azure Machine Learning studio, you can (among other things):
+
+- Import and explore data.
+- Create and use compute resources.
+- Run code in notebooks.
+- Use visual tools to create jobs and pipelines.
+- Use automated machine learning to train models.
+- View details of trained models, including evaluation metrics, responsible AI information, and training parameters.
+- Deploy trained models for on-request and batch inferencing.
+- Import and manage models from a comprehensive model catalog.
+
+
+
+## Provisioning Azure Machine Learning resources
+
+The primary resource required for Azure Machine Learning is an *Azure Machine Learning workspace*, which you can provision in an Azure subscription. Other supporting resources, including storage accounts, container registries, virtual machines, and others are created automatically as needed. You can create an Azure Machine Learning workspace in the *Azure portal*.
+
+#### Decide between compute options
+
+When you use Azure Machine Learning to train a model, you need to select **compute**. Compute refers to the computational resources required to perform the training process. Every time you train a model, you should monitor how long it takes to train the model and how much compute is used to execute your code. By monitoring the compute utilization, you know whether to scale your compute up or down.
+
+When you choose to work with Azure instead of training a model on a local device, you have access to scalable and cost-effective compute.
+
+| Compute options | Considerations |
+|---|---|
+|**Central Processing Unit** (**CPU**) or a **Graphics Processing Unit** (**GPU**) | For smaller tabular datasets, a CPU is sufficient and cost-effective. For unstructured data like images or text, GPUs are more powerful and efficient. GPUs can also be used for larger tabular datasets, if CPU compute is proving to be insufficient.|
+|**General purpose** or **memory optimized**| Use general purpose to have a balanced CPU-to-memory ratio, which is ideal for testing and development with smaller datasets. Use memory optimized to have a high memory-to-CPU ratio. Great for in-memory analytics, which is ideal when you have larger datasets or when you're working in notebooks. |
+
+Which compute options best fit your needs is often a case of trial and error. When running code, you should monitor the compute utilization to understand how much compute resources you're using. If training your model takes too long, even with the largest compute size, you can use GPUs instead of CPUs. Alternatively, you can choose to distribute model training by using Spark compute which require you to rewrite your training scripts.
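
The guidance in the compute options table can be expressed as a small rule-of-thumb helper. This is only a sketch of the table's considerations: the function name `recommend_compute` and its parameters are hypothetical, it does not call any Azure API, and real sizing should still come from monitoring compute utilization.

```python
def recommend_compute(data_kind, large_dataset=False, cpu_insufficient=False,
                      uses_notebooks=False):
    """Return a (processor, VM family) starting point based on the table above.

    data_kind: "tabular" or "unstructured" (images, text).
    All parameter names are illustrative assumptions, not Azure settings.
    """
    if data_kind not in ("tabular", "unstructured"):
        raise ValueError("data_kind must be 'tabular' or 'unstructured'")

    # GPUs suit unstructured data, or larger tabular data once CPU falls short.
    if data_kind == "unstructured" or (large_dataset and cpu_insufficient):
        processor = "GPU"
    else:
        processor = "CPU"

    # Memory optimized suits larger datasets and notebook work;
    # general purpose suits testing and development with smaller datasets.
    if large_dataset or uses_notebooks:
        family = "memory optimized"
    else:
        family = "general purpose"

    return processor, family


print(recommend_compute("tabular"))
print(recommend_compute("unstructured", large_dataset=True))
```

Treat the result as a first guess to refine after observing how long training takes and how much of the compute is actually used.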
+
+## Azure Automated Machine Learning
+
+When you use Azure Machine Learning's Automated Machine Learning capabilities, you are automatically assigned compute. **Azure Automated Machine Learning**, also referred to as automated ML or AutoML, is the process of automating the time-consuming, iterative tasks of machine learning model development.
+
+In Azure Machine Learning Studio, you can use AutoML to design and run your automated ML training experiments with the same steps described in this module, without needing to write code. Azure AutoML provides a step-by-step wizard that helps you run machine learning training jobs. The automated training can be used for many machine learning tasks, including regression, time-series forecasting, classification, computer vision, and natural language processing tasks. Within AutoML, you have access to your own datasets. Your trained machine learning models can be deployed as services.
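
The core idea behind automated machine learning (train several candidate models, score each on held-out data, keep the best) can be illustrated with the standard library alone. This is a conceptual sketch only: it is not the Azure AutoML API, and the two candidate "algorithms", the toy data, and the names `auto_select`, `fit_mean`, and `fit_linear` are assumptions made for illustration.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline candidate: always predict the mean of the training targets."""
    target_mean = mean(ys)
    return lambda x: target_mean


def fit_linear(xs, ys):
    """Candidate: ordinary least-squares line through the training points."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept


def mean_squared_error(model, xs, ys):
    """Score a fitted model on held-out data (lower is better)."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def auto_select(candidates, train, valid):
    """Train every candidate, score it on validation data, and pick the best."""
    scores = {}
    for name, fit in candidates.items():
        model = fit(*train)
        scores[name] = mean_squared_error(model, *valid)
    best_name = min(scores, key=scores.get)
    return best_name, scores


# Toy data, made up for illustration.
train = ([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
valid = ([5, 6], [10.1, 11.9])

best, scores = auto_select(
    {"mean_baseline": fit_mean, "linear_regression": fit_linear},
    train,
    valid,
)
print(best, scores)
```

A real AutoML run also varies preprocessing and hyperparameters and supports many more task types; the selection loop above shows only the "try several, keep the best" principle.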