articles/machine-learning/how-to-train-scikit-learn.md
ms.subservice: training
ms.author: sgilley
author: sdgilley
ms.reviewer: balapv
ms.date: 03/06/2025
ms.topic: how-to
ms.custom: sdkv2, update-code
#Customer intent: As a Python scikit-learn developer, I need to combine open-source with a cloud platform to train, evaluate, and deploy my machine learning models at scale.
We're using `DefaultAzureCredential` to get access to the workspace. This credential should handle most Azure SDK authentication scenarios.
If `DefaultAzureCredential` doesn't work for you, see [`azure-identity reference documentation`](/python/api/azure-identity/azure.identity) or [`Set up authentication`](how-to-setup-authentication.md?tabs=sdk) for more available credentials.
The result of running this script is a workspace handle that you use to manage other resources and jobs.
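The connection script itself isn't included in this excerpt. A minimal sketch, assuming placeholder subscription, resource group, and workspace values, looks like this:

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Get a handle to the workspace; replace the placeholder values with your own.
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<AML_WORKSPACE_NAME>",
)
```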
> [!NOTE]
> Creating `MLClient` won't connect the client to the workspace. The client initialization is lazy and waits for the first time it needs to make a call. In this article, this happens during compute creation.
### Create a compute resource
Azure Machine Learning needs a compute resource to run a job. This resource can be a single-node or multi-node machine with a Linux or Windows OS, or a specific compute fabric like Spark.
In the following example script, we provision a Linux [`compute cluster`](./how-to-create-attach-compute-cluster.md?tabs=python). For the full list of VM sizes and prices, see the [`Azure Machine Learning pricing`](https://azure.microsoft.com/pricing/details/machine-learning/) page. Because we only need a basic cluster for this example, we pick a Standard_DS3_v2 model with 2 vCPU cores and 7 GB of RAM to create an Azure Machine Learning compute.
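The provisioning code isn't reproduced in this excerpt. A minimal sketch, assuming a hypothetical cluster name of `cpu-cluster`, looks like this:

```python
from azure.ai.ml.entities import AmlCompute

# Provision a small CPU cluster that scales back down to zero nodes when idle.
cpu_cluster = AmlCompute(
    name="cpu-cluster",               # hypothetical name; referenced again when submitting jobs
    type="amlcompute",
    size="STANDARD_DS3_V2",           # 2 vCPU cores, 7 GB RAM
    min_instances=0,
    max_instances=4,
    idle_time_before_scale_down=180,  # seconds
    tier="Dedicated",
)
cpu_cluster = ml_client.compute.begin_create_or_update(cpu_cluster).result()
```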
Azure Machine Learning allows you to either use a curated (or ready-made) environment or create a custom environment.
#### Create a custom environment
To create your custom environment, you define your Conda dependencies in a YAML file. First, create a directory for storing the file. In this example, the directory is named `env`.
The specification contains some common packages (such as numpy and pip) that you use in your job.
Next, use the YAML file to create and register this custom environment in your workspace. The environment is packaged into a Docker container at runtime.
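The registration code isn't shown in this excerpt. A minimal sketch, assuming the Conda specification is saved as `env/conda.yaml` and using illustrative names, looks like this:

```python
from azure.ai.ml.entities import Environment

# Register a custom environment built from the Conda file; the names here are illustrative.
custom_env = Environment(
    name="sklearn-env",
    description="Custom environment for the scikit-learn training example",
    conda_file="./env/conda.yaml",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
)
custom_env = ml_client.environments.create_or_update(custom_env)
```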
For more information on creating and using environments, see [Create and use software environments in Azure Machine Learning](how-to-use-environments.md).
##### [Optional] Create a custom environment with Intel® Extension for Scikit-Learn
Want to speed up your scikit-learn scripts on Intel hardware? Try adding [Intel® Extension for Scikit-Learn](https://www.intel.com/content/www/us/en/developer/tools/oneapi/scikit-learn.html) to your conda YAML file and then follow the steps detailed above. You'll see how to enable these optimizations later in this example.
#### [Optional] Enable Intel® Extension for Scikit-Learn optimizations for more performance on Intel hardware
If you installed Intel® Extension for Scikit-Learn (as demonstrated in the previous section), you can enable the performance optimizations by adding two lines of code to the top of the script file, as shown below.
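For reference, the patching call comes from the `sklearnex` package; call `patch_sklearn()` before importing the scikit-learn modules you want accelerated:

```python
# Enable Intel® Extension for Scikit-Learn optimizations for supported estimators.
from sklearnex import patch_sklearn

patch_sklearn()
```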
To learn more about Intel® Extension for Scikit-Learn, visit the package's [documentation](https://intel.github.io/scikit-learn-intelex/).
You use the general purpose `command` to run the training script and perform your desired tasks.
- configure the command line action itself—in this case, the command is `python train_iris.py`. You can access the inputs and outputs in the command via the `${{ ... }}` notation; and
- configure the metadata, such as the display name and experiment name, where an experiment is a container for all the iterations you do on a certain project. All the jobs submitted under the same experiment name are listed next to each other in Azure Machine Learning studio. A sketch of such a `command` appears after this list.
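The following minimal sketch assumes the compute cluster and custom environment names used in the earlier sketches:

```python
from azure.ai.ml import command

# Configure the training job; the input names match the arguments that train_iris.py parses.
job = command(
    inputs=dict(kernel="linear", penalty=1.0),
    compute="cpu-cluster",                # assumed cluster name
    environment="sklearn-env@latest",     # assumed custom environment name
    code="./src/",                        # folder containing train_iris.py
    command="python train_iris.py --kernel ${{inputs.kernel}} --penalty ${{inputs.penalty}}",
    experiment_name="sklearn-iris-flowers",
    display_name="sklearn-classify-iris-flower-species",
)

# Submit the job to the workspace; the call returns once the job is created in the cloud.
returned_job = ml_client.create_or_update(job)
```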
Once completed, the job registers a model in your workspace (as a result of training) and outputs a link for viewing the job in Azure Machine Learning studio.
> [!WARNING]
> Azure Machine Learning runs training scripts by copying the entire source directory. If you have sensitive data that you don't want to upload, use a [.ignore file](concept-train-machine-learning-model.md#understand-what-happens-when-you-submit-a-training-job) or don't include it in the source directory.
As the job is executed, it goes through several stages.
Now that you've seen how to do a simple Scikit-learn training run using the SDK, let's see if you can further improve the accuracy of your model. You can tune and optimize your model's hyperparameters using Azure Machine Learning's [`sweep`](/python/api/azure-ai-ml/azure.ai.ml.sweep) capabilities.
To tune the model's hyperparameters, define the parameter space in which to search during training. You do so by replacing some of the parameters (`kernel` and `penalty`) passed to the training job with special inputs from the `azure.ai.ml.sweep` package.
Then, you configure a sweep on the command job, using sweep-specific parameters such as the primary metric to watch and the sampling algorithm to use.
In the following code, we use random sampling to try different configuration sets of hyperparameters in an attempt to maximize our primary metric, `Accuracy`.
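A minimal sketch of this configuration, building on the `command` job from the earlier sketch (the trial limits and cluster name are assumptions), looks like this:

```python
from azure.ai.ml.sweep import Choice

# Replace the fixed hyperparameters with a search space.
job_for_sweep = job(
    kernel=Choice(values=["linear", "rbf", "poly", "sigmoid"]),
    penalty=Choice(values=[0.5, 1, 1.5]),
)

# Configure the sweep: random sampling, maximizing the Accuracy metric logged by the script.
sweep_job = job_for_sweep.sweep(
    compute="cpu-cluster",
    sampling_algorithm="random",
    primary_metric="Accuracy",
    goal="Maximize",
    max_total_trials=12,
    max_concurrent_trials=4,
)

returned_sweep_job = ml_client.create_or_update(sweep_job)
```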
After you register your model, you can deploy it the same way as any other registered model in Azure Machine Learning. For more information about deployment, see [Deploy and score a machine learning model with managed online endpoint using Python SDK v2](how-to-deploy-managed-online-endpoint-sdk-v2.md).
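As a rough illustration only, and assuming the registered model is in MLflow format (so no scoring script or inference environment is needed) with hypothetical endpoint, deployment, and model names, a managed online deployment could look like this:

```python
from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment

# Create a managed online endpoint (hypothetical name).
endpoint = ManagedOnlineEndpoint(name="sklearn-iris-endpoint", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Deploy a registered MLflow-format model to the endpoint.
deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="sklearn-iris-endpoint",
    model="azureml:sklearn-iris-model:1",  # hypothetical registered model name and version
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()
```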