Commit 18c2ce9

Merge pull request #80558 from nibaccam/scikit-learn
New article: Scikit learn
2 parents 7ab037f + 75fc7e2 commit 18c2ce9

2 files changed

+195
-0
lines changed

Lines changed: 193 additions & 0 deletions
@@ -0,0 +1,193 @@
1+
---
2+
title: Train and register scikit-learn models
3+
titleSuffix: Azure Machine Learning service
4+
description: This article shows you how to train and register a scikit-learn model using Azure Machine Learning service.
5+
services: machine-learning
6+
ms.service: machine-learning
7+
ms.subservice: core
8+
ms.topic: conceptual
9+
ms.author: minxia
10+
author: mx-iao
11+
ms.date: 06/30/2019
12+
ms.custom: seodec18
13+
---
14+
15+
# Train and register Scikit-learn models at scale with Azure Machine Learning service
16+
17+
This article shows you how to train and register a Scikit-learn model using Azure Machine Learning service. It uses the popular [Iris dataset](https://archive.ics.uci.edu/ml/datasets/iris) to classify iris flower species with the custom [`SKLearn`](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.sklearn.sklearn?view=azure-ml-py) estimator class.
18+
19+
Scikit-learn is an open-source computational framework commonly used for machine learning. With Azure Machine Learning service, you can rapidly scale out open-source training jobs using elastic cloud compute resources. You can also track your training runs, version models, deploy models, and much more.
20+
21+
Whether you're developing a Scikit-learn model from the ground up or bringing an existing model into the cloud, Azure Machine Learning service can help you build production-ready models.
22+
23+
## Prerequisites
24+
25+
Run this code on either of these environments:
26+
- Azure Machine Learning Notebook VM - no downloads or installation necessary
27+
28+
    - Complete the [cloud-based notebook quickstart](quickstart-run-cloud-notebook.md) to create a dedicated notebook server pre-loaded with the SDK and the sample repository.
    - In the samples folder on the notebook server, find a completed and expanded notebook by navigating to the **how-to-use-azureml > training > train-hyperparameter-tune-deploy-with-sklearn** folder.
30+
31+
- Your own Jupyter Notebook server
32+
33+
    - [Install the Azure Machine Learning SDK for Python](setup-create-workspace.md#sdk)
    - [Create a workspace configuration file](setup-create-workspace.md#write-a-configuration-file)
    - [Download the sample script file](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/training/train-hyperparameter-tune-deploy-with-sklearn) `train_iris.py`
36+
37+
You can also find a completed [Jupyter Notebook version](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training/train-hyperparameter-tune-deploy-with-sklearn/train-hyperparameter-tune-deploy-with-sklearn.ipynb) of this guide on the GitHub samples page. The notebook includes an expanded section covering intelligent hyperparameter tuning and retrieving the best model by primary metric.
38+
39+
## Set up the experiment
40+
41+
This section sets up the training experiment by loading the required Python packages, initializing a workspace, creating an experiment, and uploading the training data and training scripts.
42+
43+
### Import packages
44+
45+
First, import the necessary Python libraries.
46+
47+
```Python
import os
import urllib
import shutil
import azureml

from azureml.core import Experiment
from azureml.core import Workspace, Run

from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
```
59+
60+
### Initialize a workspace
61+
62+
The [Azure Machine Learning service workspace](concept-workspace.md) is the top-level resource for the service. It provides you with a centralized place to work with all the artifacts you create. In the Python SDK, you can access the workspace artifacts by creating a [`Workspace`](https://docs.microsoft.com/python/api/azureml-core/azureml.core.workspace.workspace?view=azure-ml-py) object.
63+
64+
Create a workspace object from the `config.json` file created in the [prerequisites section](#prerequisites).
65+
66+
```Python
# Load the workspace details from config.json
ws = Workspace.from_config()
```
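If `Workspace.from_config()` fails, a common cause is an incomplete `config.json`. The following is a minimal sketch, not part of the SDK, that checks a configuration file for the three keys the workspace configuration file normally contains. The helper name `check_workspace_config` and the placeholder values are assumptions made for illustration.

```python
import json
import os
import tempfile

# Keys normally present in a workspace config.json.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def check_workspace_config(path):
    """Return the parsed config, raising ValueError if a required key is missing or empty."""
    with open(path) as handle:
        config = json.load(handle)
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError("config.json is missing: " + ", ".join(missing))
    return config


# Demonstration with placeholder values; replace them with your own identifiers.
with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, "config.json")
    with open(config_path, "w") as handle:
        json.dump({
            "subscription_id": "<subscription-id>",
            "resource_group": "<resource-group>",
            "workspace_name": "<workspace-name>",
        }, handle)
    checked = check_workspace_config(config_path)
```

Running a check like this before calling `Workspace.from_config()` can make a missing value easier to spot than the SDK error alone.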
69+
70+
### Create an experiment
71+
72+
Create an experiment and a folder to hold your training scripts. In this example, create an experiment called "sklearn-iris".
73+
74+
```Python
75+
project_folder = './sklearn-iris'
76+
os.makedirs(project_folder, exist_ok=True)
77+
78+
exp = Experiment(workspace=ws, name='sklearn-iris')
79+
```
80+
81+
### Upload dataset and scripts
82+
83+
The [datastore](how-to-access-data.md) is a place where data can be stored and accessed by mounting or copying the data to the compute target. Each workspace provides a default datastore. Upload the data to the datastore so that it can be easily accessed during training, and copy the training script into the project folder.
84+
85+
1. Download the Iris dataset locally.
86+
87+
```Python
# Create the local folder that holds the Iris data files
os.makedirs('./data/iris', exist_ok=True)
```
90+
91+
1. Upload the iris dataset to the default datastore.
92+
93+
```Python
ds = ws.get_default_datastore()
ds.upload(src_dir='./data/iris', target_path='iris', overwrite=True, show_progress=True)
```
97+
98+
1. Copy the Scikit-learn training script, `train_iris.py`, into the project folder.
99+
100+
```Python
shutil.copy('./train_iris.py', project_folder)
```
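The first step above creates the local `./data/iris` folder but does not show the download itself. The following is a minimal sketch of one way to fetch a file into that folder with the standard library, assuming you supply the dataset URL yourself. The helper name `download_to_folder`, the file name `iris.csv`, and the placeholder header row are hypothetical and not taken from the sample repository.

```python
import os
import tempfile
import urllib.request
from pathlib import Path


def download_to_folder(url, folder, filename):
    """Fetch `url` and save it as folder/filename, creating the folder if needed."""
    os.makedirs(folder, exist_ok=True)
    destination = os.path.join(folder, filename)
    urllib.request.urlretrieve(url, destination)
    return destination


# Demonstration with a local file:// URL so the sketch runs without network access.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.csv"
    source.write_text("sepal_length,sepal_width\n")  # placeholder content
    saved = download_to_folder(source.as_uri(),
                               os.path.join(tmp, "data", "iris"),
                               "iris.csv")
    copied_text = Path(saved).read_text()
```

In the article's setup you would call the helper with your dataset URL and `'./data/iris'` as the folder, and then upload that folder to the datastore.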
103+
104+
## Create a compute target
105+
106+
Create a compute target for your Scikit-learn job to run on. Scikit-learn supports only single-node CPU computing.
107+
108+
```Python
cluster_name = "cpu-cluster"

try:
    # Reuse the cluster if it already exists in the workspace
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                           max_nodes=4)

    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)
```
123+
124+
For more information on compute targets, see the [what is a compute target](concept-compute-target.md) article.
125+
126+
## Create a Scikit-learn estimator
127+
128+
The [Scikit-learn estimator](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.sklearn.sklearn?view=azure-ml-py) provides a simple way of launching a Scikit-learn training job on a compute target. It is implemented through the [`SKLearn`](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.sklearn.sklearn?view=azure-ml-py) class, which supports single-node CPU training.
129+
130+
If your training script needs additional pip or conda packages to run, you can have the packages installed on the resulting Docker image by passing their names through the `pip_packages` and `conda_packages` arguments.
131+
132+
```Python
from azureml.train.sklearn import SKLearn

# Command-line arguments passed to the entry script
script_params = {
    '--kernel': 'linear',
    '--penalty': 1.0,
}

estimator = SKLearn(source_directory=project_folder,
                    script_params=script_params,
                    compute_target=compute_target,
                    entry_script='train_iris.py',
                    pip_packages=['joblib']
                   )
```
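The entries in `script_params` reach the entry script as command-line arguments, so `train_iris.py` needs to parse them. The training script is not shown in this article; the following is a minimal sketch of how such a script might read `--kernel` and `--penalty` with `argparse`. The function name `parse_training_args`, the defaults, and the help strings are assumptions for illustration.

```python
import argparse


def parse_training_args(argv=None):
    """Parse the arguments that the estimator's script_params would supply."""
    parser = argparse.ArgumentParser(description="Hypothetical train_iris.py arguments")
    parser.add_argument("--kernel", type=str, default="linear",
                        help="Kernel type for the classifier")
    parser.add_argument("--penalty", type=float, default=1.0,
                        help="Penalty parameter of the error term")
    return parser.parse_args(argv)


# Simulate the command line built from script_params in the estimator above.
args = parse_training_args(["--kernel", "linear", "--penalty", "1.0"])
```

Declaring `--penalty` with `type=float` converts the command-line string back to a number before it is handed to the model.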
147+
148+
## Submit a run
149+
150+
The [Run object](https://docs.microsoft.com/python/api/azureml-core/azureml.core.run%28class%29?view=azure-ml-py) provides the interface to the run history while the job is running and after it has completed.
151+
152+
```Python
# Submit the estimator to the experiment created earlier
run = exp.submit(estimator)
run.wait_for_completion(show_output=True)
```
156+
157+
As the Run is executed, it goes through the following stages:
158+
159+
- **Preparing**: A Docker image is created according to the Scikit-learn estimator. The image is uploaded to the workspace's container registry and cached for later runs. Logs are also streamed to the run history and can be viewed to monitor progress.
160+
161+
- **Scaling**: The cluster attempts to scale up if it requires more nodes to execute the run than are currently available.
162+
163+
- **Running**: All scripts in the script folder are uploaded to the compute target, datastores are mounted or copied, and the `entry_script` is executed. Outputs from stdout and the `./logs` folder are streamed to the run history and can be used to monitor the run.
164+
165+
- **Post-Processing**: The ./outputs folder of the run is copied over to the run history.
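
Because the post-processing stage copies the `./outputs` folder to the run history, a training script typically creates that folder before it writes any artifacts. The following is a minimal sketch of that pattern, using `pickle` from the standard library as a stand-in for `joblib`. The helper name `save_to_outputs` and the example object are hypothetical.

```python
import os
import pickle
import tempfile


def save_to_outputs(obj, filename, base_dir="."):
    """Serialize `obj` to base_dir/outputs/filename, creating the folder if needed."""
    outputs_dir = os.path.join(base_dir, "outputs")
    os.makedirs(outputs_dir, exist_ok=True)
    path = os.path.join(outputs_dir, filename)
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)
    return path


# Demonstration in a temporary directory; a real script would use its working directory.
with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_to_outputs({"kernel": "linear"}, "model.pkl", base_dir=tmp)
    with open(saved_path, "rb") as handle:
        restored = pickle.load(handle)
    relative_path = os.path.relpath(saved_path, tmp)
```

The path relative to the working directory, here `outputs/model.pkl`, is the kind of path you would later reference when looking up the file in the run history.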
166+
167+
## Save and register the model
168+
169+
Once you've trained the model, you can save and register it to your workspace. Model registration lets you store and version your models in your workspace to simplify [model management and deployment](concept-model-management-and-deployment.md).
170+
171+
Add the following code to your training script, `train_iris.py`, to save the model.
172+
173+
```Python
import os
import joblib

# Files written to ./outputs are copied to the run history
os.makedirs('outputs', exist_ok=True)
joblib.dump(svm_model_linear, 'outputs/model.joblib')
```

Register the model to your workspace with the following code.

```Python
model = run.register_model(model_name='sklearn-iris', model_path='outputs/model.joblib')
```
184+
185+
## Next steps
186+
187+
In this article, you trained and registered a Scikit-learn model on Azure Machine Learning service.
188+
189+
* To learn how to deploy a model, continue on to our [model deployment](how-to-deploy-and-where.md) article.
190+
191+
* [Tune hyperparameters](how-to-tune-hyperparameters.md)
192+
193+
* [Track run metrics during training](how-to-track-experiments.md)

articles/machine-learning/service/toc.yml

Lines changed: 2 additions & 0 deletions
@@ -174,6 +174,8 @@
174174
href: how-to-set-up-training-targets.md
175175
- name: Create estimators in training
176176
href: how-to-train-ml-models.md
177+
- name: Use Scikit-learn
178+
href: how-to-train-scikit-learn.md
177179
- name: Use TensorFlow
178180
href: how-to-train-tensorflow.md
179181
- name: Use Keras
