-Additional advanced preprocessing and featurization are also available, such as data guardrails, encoding, and transforms. [Learn more about what featurization is included](how-to-create-portal-experiments.md#preprocess). Enable this setting with:
+Additional advanced preprocessing and featurization are also available, such as data guardrails, encoding, and transforms. [Learn more about what featurization is included](how-to-create-portal-experiments.md#featurization).
+Enable this setting with:

-+ Azure Machine Learning studio: Selecting the **View featurization settings** in the **Configuration Run** section [with these steps](how-to-create-portal-experiments.md).
++ Azure Machine Learning studio: Enable **Automatic featurization** in the **View additional configuration** section [with these steps](how-to-create-portal-experiments.md#create-and-run-experiment).

-+ Python SDK: Specifying `"feauturization": auto' / 'off' / FeaturizationConfig` for the [`AutoMLConfig` class](/python/api/azureml-train-automl-client/azureml.train.automl.automlconfig.automlconfig).
++ Python SDK: Specifying `"feauturization": 'auto' / 'off' / 'FeaturizationConfig'` for the [`AutoMLConfig` class](/python/api/azureml-train-automl-client/azureml.train.automl.automlconfig.automlconfig).
articles/machine-learning/how-to-configure-auto-train.md
Lines changed: 11 additions & 5 deletions
@@ -184,14 +184,20 @@ Learn about the specific definitions of these metrics in [Understand automated m

 ### Data featurization

-In every automated machine learning experiment, your data is [automatically scaled and normalized](concept-automated-ml.md#preprocess) to help *certain* algorithms that are sensitive to features that are on different scales. However, you can also enable additional featurization, such as missing values imputation, encoding, and transforms. [Learn more about what featurization is included](how-to-create-portal-experiments.md#preprocess).
+In every automated machine learning experiment, your data is [automatically scaled and normalized](concept-automated-ml.md#preprocess) to help *certain* algorithms that are sensitive to features that are on different scales. However, you can also enable additional featurization, such as missing values imputation, encoding, and transforms. [Learn more about what featurization is included](how-to-create-portal-experiments.md#featurization).

-To enable this featurization, specify `"featurization": 'auto'` for the [`AutoMLConfig` class](https://docs.microsoft.com/python/api/azureml-train-automl/azureml.train.automl.automlconfig?view=azure-ml-py).
+When configuring your experiments, you can enable the advanced setting `featurization`. The following table shows the accepted settings for featurization in the [`AutoMLConfig` class](https://docs.microsoft.com/python/api/azureml-train-automl/azureml.train.automl.automlconfig?view=azure-ml-py).
+
+|Featurization Configuration | Description |
+|-------------|-------------|
+|`"featurization":` `'FeaturizationConfig'`| Indicates customized featurization step should be used. [Learn how to customize featurization](how-to-configure-auto-train.md#customize-feature-engineering).|
+|`"featurization": 'off'`| Indicates featurization step should not be done automatically.|
+|`"featurization": 'auto'`| Indicates that as part of preprocessing, [data guardrails and featurization steps](how-to-create-portal-experiments.md#advanced-featurization-options) are performed automatically.|
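As a rough illustration of the three accepted settings in the table above, the sketch below builds the keyword arguments one might pass to `AutoMLConfig`. It deliberately does not import the Azure ML SDK so that it runs anywhere. `build_automl_kwargs` is a hypothetical helper, not part of the SDK, and the `task` and `primary_metric` values are placeholders chosen for the example.

```python
from typing import Any, Dict

# String settings named in the table above. Any non-string value is treated as
# a customized FeaturizationConfig-style object (an assumption of this sketch).
STRING_SETTINGS = {"auto", "off"}


def build_automl_kwargs(featurization: Any = "auto") -> Dict[str, Any]:
    """Return keyword arguments carrying a validated `featurization` setting.

    Hypothetical helper for illustration only; it is not part of the SDK.
    """
    if featurization is None:
        raise ValueError("featurization must not be None")
    if isinstance(featurization, str) and featurization not in STRING_SETTINGS:
        raise ValueError(
            "featurization must be 'auto', 'off', or a config object, "
            f"got {featurization!r}"
        )
    return {
        "task": "classification",          # placeholder task type
        "primary_metric": "AUC_weighted",  # placeholder metric
        "featurization": featurization,
    }


if __name__ == "__main__":
    print(build_automl_kwargs("auto")["featurization"])
    print(build_automl_kwargs("off")["featurization"])
```

With the SDK installed, a dictionary like this could be unpacked into `AutoMLConfig(**kwargs)` together with the data arguments for your workspace, which are omitted here because they depend on your setup.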
articles/machine-learning/how-to-create-portal-experiments.md
Lines changed: 19 additions & 23 deletions
@@ -10,7 +10,7 @@ ms.author: nibaccam
 author: tsikiksr
 manager: cgronlun
 ms.reviewer: nibaccam
-ms.date: 11/04/2019
+ms.date: 02/04/2020

 ---

@@ -43,7 +43,7 @@ Otherwise, you'll see a list of your recent automated machine learning experimen

 ## Create and run experiment

-1. Select **+ Create Experiment** and populate the form.
+1. Select **+ New automated ML run** and populate the form.

 1. Select a dataset from your storage container, or create a new dataset. Datasets can be created from local files, web urls, datastores, or Azure open datasets.

@@ -109,16 +109,19 @@ Otherwise, you'll see a list of your recent automated machine learning experimen

 1. Select forecast horizon: Indicate how many time units (minutes/hours/days/weeks/months/years) will the model be able to predict to the future. The further the model is required to predict into the future, the less accurate it will become. [Learn more about forecasting and forecast horizon](how-to-auto-train-forecast.md).

-1. (Optional) Addition configurations: additional settings you can use to better control the training job. Otherwise, defaults are applied based on experiment selection and data.
+1. (Optional) View addition configuration settings: additional settings you can use to better control the training job. Otherwise, defaults are applied based on experiment selection and data.

 Additional configurations|Description
 ------|------
 Primary metric| Main metric used for scoring your model. [Learn more about model metrics](how-to-configure-auto-train.md#explore-model-metrics).
-Automatic featurization| Select to enable or disable the preprocessing done by automated machine learning. Preprocessing includes automatic data cleansing, preparing, and transformation to generate synthetic features. [Learn more about preprocessing](#preprocess).
+Automatic featurization| Select to enable or disable the preprocessing done by automated machine learning. Preprocessing includes automatic data cleansing, preparing, and transformation to generate synthetic features. Not supported for the time series forecasting task type. [Learn more about preprocessing](#featurization).
+Explain best model | Select to enable or disable to show explainability of the recommended best model
 Blocked algorithm| Select algorithms you want to exclude from the training job.
 Exit criterion| When any of these criteria are met, the training job is stopped. <br> *Training job time (hours)*: How long to allow the training job to run. <br> *Metric score threshold*: Minimum metric score for all pipelines. This ensures that if you have a defined target metric you want to reach, you do not spend more time on the training job than necessary.
 Validation| Select one of the cross validation options to use in the training job. [Learn more about cross validation](how-to-configure-auto-train.md).
-Concurrency| *Max concurrent iterations*: Maximum number of pipelines (iterations) to test in the training job. The job will not run more than the specified number of iterations. <br> *Max cores per iteration*: Select the multi-core limits you would like to use when using multi-core compute.
+Concurrency| *Max concurrent iterations*: Maximum number of pipelines (iterations) to test in the training job. The job will not run more than the specified number of iterations.
+
+1. (Optional) View featurization settings: if you choose to enable **Automatic featurization** in the **Additional configuration settings** form, this form is where you specify which columns to perform those featurizations on, and select which statistical value to use for missing value imputations.

 <a name="profile"></a>

@@ -147,17 +150,13 @@ Skewness| Measure of how different this column's data is from a normal distribut
 Kurtosis| Measure of how heavily tailed this column's data is compared to a normal distribution.


-<a name="preprocess"></a>
+<a name="featurization"></a>

 ## Advanced featurization options

-When configuring your experiments, you can enable the advanced setting `feauturization`.
+Automated machine learning offers preprocessing and data guardrails automatically, to help you identify and manage potential issues with your data.

-|Featurization Configuration | Description |
-| ------------- | ------------- |
-|"feauturization" = 'FeaturizationConfig'| Indicates customized featurization step should be used. [Learn how to customize featurization](how-to-configure-auto-train.md#customize-feature-engineering).|
-|"feauturization" = 'off'| Indicates featurization step should not be done automatically.|
-|"feauturization" = 'auto'| Indicates that as part of preprocessing the following data guardrails and featurization steps are performed automatically.|
+### Preprocessing

 |Preprocessing steps| Description |
 | ------------- | ------------- |
@@ -173,7 +172,7 @@ When configuring your experiments, you can enable the advanced setting `feauturi

 ### Data guardrails

-Automated machine learning offers data guardrails to help you identify potential issues with your data (e.g., missing values, class imbalance) and help take corrective actions for improved results. There are many best practices that are available and can be applied to achieve reliable results.
+Data guardrails are applied automatically to help you identify potential issues with your data (e.g., missing values, class imbalance) and help take corrective actions for improved results. There are many best practices that are available and can be applied to achieve reliable results.

 The following table describes the currently supported data guardrails, and the associated statuses that users may come across when submitting their experiment.

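To make the two example issues named above concrete (missing values, class imbalance), here is a minimal sketch of the kind of check a guardrail might run. It is not the automated ML implementation. The function name `check_guardrails`, the `imbalance_ratio` threshold of 0.2, and the `"passed"`/`"detected"` status strings are all assumptions made for this example.

```python
from collections import Counter
from typing import Dict, Optional, Sequence


def check_guardrails(
    rows: Sequence[Dict[str, Optional[object]]],
    label: str,
    imbalance_ratio: float = 0.2,  # assumed threshold, not an AutoML default
) -> Dict[str, str]:
    """Flag missing feature values and class imbalance in a small dataset."""
    if not rows:
        raise ValueError("rows must not be empty")

    # Missing values: any feature cell that is None counts as missing.
    has_missing = any(
        value is None
        for row in rows
        for key, value in row.items()
        if key != label
    )

    # Class imbalance: the smallest class is much rarer than the largest one.
    counts = Counter(row[label] for row in rows)
    smallest, largest = min(counts.values()), max(counts.values())
    imbalanced = smallest / largest < imbalance_ratio

    return {
        "missing_values": "detected" if has_missing else "passed",
        "class_imbalance": "detected" if imbalanced else "passed",
    }


if __name__ == "__main__":
    data = [{"age": 30, "y": "no"}] * 9 + [{"age": None, "y": "yes"}]
    print(check_guardrails(data, label="y"))
```

A real guardrail would also apply a corrective action, such as imputing the missing values; this sketch only reports what it finds.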
@@ -187,14 +186,11 @@ Time-series data consistency|**Passed** <br><br><br><br> **Fixed** |<br> The sel

 ## Run experiment and view results

-Select **Start** to run your experiment. The experiment preparing process can take up to 10 minutes. Training jobs can take an additional 2-3 minutes more for each pipeline to finish running.
+Select **Finish** to run your experiment. The experiment preparing process can take up to 10 minutes. Training jobs can take an additional 2-3 minutes more for each pipeline to finish running.

 ### View experiment details

->[!NOTE]
-> Select **Refresh** periodically to view the status of the run.
-
-The **Run Detail** screen opens to the **Details** tab. This screen shows you a summary of the experiment run including the **Run status**.
+The **Run Detail** screen opens to the **Details** tab. This screen shows you a summary of the experiment run including a status bar at the top next to the run number.

 The **Models** tab contains a list of the models created ordered by the metric score. By default, the model that scores the highest based on the chosen metric is at the top of the list. As the training job tries out more models, they are added to the list. Use this to get a quick comparison of the metrics for the models produced so far.

@@ -214,18 +210,18 @@ Automated ML helps you with deploying the model without writing code:

 1. You have a couple options for deployment.

-+ Option 1: To deploy the best model (according to the metric criteria you defined), select Deploy Best Model from the Details tab.
++ Option 1: To deploy the best model (according to the metric criteria you defined), select the **Deploy best model** button on the **Details** tab.

-+ Option 2: To deploy a specific model iteration from this experiment, drill down on the model to open its Model details tab and select Deploy Model.
++ Option 2: To deploy a specific model iteration from this experiment, drill down on the model to open its **Model details** tab and select **Deploy model**.

-1. Populate the **Deploy Model** pane.
+1. Populate the **Deploy model** pane.

 Field| Value
 ----|----
 Name| Enter a unique name for your deployment.
 Description| Enter a description to better identify what this deployment is for.
 Compute type| Select the type of endpoint you want to deploy: *Azure Kubernetes Service (AKS)* or *Azure Container Instance (ACI)*.
-Name| *Applies to AKS only:* Select the name of the AKS cluster you wish to deploy to.
+Compute name| *Applies to AKS only:* Select the name of the AKS cluster you wish to deploy to.
 Enable authentication | Select to allow for token-based or key-based authentication.
 Use custom deployment assets| Enable this feature if you want to upload your own scoring script and environment file. [Learn more about scoring scripts](how-to-deploy-and-where.md#script).

@@ -240,7 +236,7 @@ Now you have an operational web service to generate predictions! You can test th

 ## Next steps

-* Try the end to end [tutorial for creating your first automated ML experiment with Azure Machine Learning](tutorial-first-experiment-automated-ml.md).
+* Try the end to end [tutorial for creating your first automated ML experiment with Azure Machine Learning studio](tutorial-first-experiment-automated-ml.md).
 * [Learn more about automated machine learning](concept-automated-ml.md) and Azure Machine Learning.
articles/machine-learning/tutorial-first-experiment-automated-ml.md
Lines changed: 13 additions & 9 deletions
@@ -9,7 +9,7 @@ ms.topic: tutorial
 ms.author: tzvikei
 author: tsikiksr
 ms.reviewer: nibaccam
-ms.date: 11/04/2019
+ms.date: 02/04/2020

 # Customer intent: As a non-coding data scientist, I want to use automated machine learning techniques so that I can build a classification model.
 ---
@@ -66,12 +66,16 @@ You complete the following experiment set-up and run steps in Azure Machine Lear

 1. Create a new dataset by selecting **From local files** from the **+Create dataset** drop-down.

+1. On the **Basic info** form, give your dataset a name and provide an optional description. Automated ML in Azure Machine Learning studio currently only supports tabular datasets, so the dataset type should default to Tabular.
+
+1. Select **Next** on the bottom left
+
+1. On the **Datastore and file selection** form, select the default datastore that was automatically set up during your workspace creation, **workspaceblobstore (Azure Blob Storage)**. This is where you'll upload your data file to make it available to your workspace.
+
 1. Select **Browse**.

 1. Choose the **bankmarketing_train.csv** file on your local computer. This is the file you downloaded as a [prerequisite](https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_train.csv).

-1. Select **Tabular** as your dataset type.
-
 1. Give your dataset a unique name and provide an optional description.

 1. Select **Next** on the bottom left, to upload it to the default container that was automatically set up during your workspace creation.
@@ -133,18 +137,18 @@ You complete the following experiment set-up and run steps in Azure Machine Lear
 Blocked algorithms | Algorithms you want to exclude from the training job| None
 Exit criterion| If a criteria is met, the training job is stopped. |Training job time (hours): 1 <br> Metric score threshold: None
 Validation | Choose a cross-validation type and number of tests.|Validation type:<br> k-fold cross-validation <br> <br> Number of validations: 2
-Concurrency| The maximum number of parallel iterations executed and cores used per iteration| Max concurrent iterations: 5<br> Max cores per iteration: None
+Concurrency| The maximum number of parallel iterations executed per iteration| Max concurrent iterations: 5

 Select **Save**.

-1. Select **Finish** to run the experiment. The **Run Detail** screen opens with the **Run status** as the experiment preparation begins.
+1. Select **Finish** to run the experiment. The **Run Detail** screen opens with the **Run status** at the top as the experiment preparation begins.

 >[!IMPORTANT]
 > Preparation takes **10-15 minutes** to prepare the experiment run.
 > Once running, it takes **2-3 minutes more for each iteration**.
 > Select **Refresh** periodically to see the status of the run as the experiment progresses.
 >
-> In production, you'd likely walk away for a bit. But for this tutorial, we suggest you start exploring the tested algorithms on the Models tab as they complete while the others are still running.
+> In production, you'd likely walk away for a bit. But for this tutorial, we suggest you start exploring the tested algorithms on the **Models** tab as they complete while the others are still running.

 ## Explore models

@@ -162,11 +166,11 @@ Automated machine learning in Azure Machine Learning studio allows you to deploy

 For this experiment, deployment to a web service means that the financial institution now has an iterative and scalable web solution for identifying potential fixed term deposit customers.

-Once the run is complete, navigate back to the **Run Detail** page and select the **Models** tab. Select **Refresh**.
+Once the run is complete, navigate back to the **Run Detail** page and select the **Models** tab.

 In this experiment context, **VotingEnsemble** is considered the best model, based on the **AUC_weighted** metric. We deploy this model, but be advised, deployment takes about 20 minutes to complete. The deployment process entails several steps including registering the model, generating resources, and configuring them for the web service.

-1. Select the **Deploy Best Model** button in the bottom-left corner.
+1. Select the **Deploy best model** button in the bottom-left corner.

 1. Populate the **Deploy a model** pane as follows:

@@ -214,7 +218,7 @@ In this automated machine learning tutorial, you used Azure Machine Learning stu
 > [!div class="nextstepaction"]
 > [Consume a web service](how-to-consume-web-service.md#consume-the-service-from-power-bi)

-+ Learn more about [preprocessing](how-to-create-portal-experiments.md#preprocess).
++ Learn more about [featurization](how-to-create-portal-experiments.md#featurization).
 + Learn more about [data profiling](how-to-create-portal-experiments.md#profile).
 + Learn more about [automated machine learning](concept-automated-ml.md).
 + For more information on classification metrics and charts, see the [Understand automated machine learning results](how-to-understand-automated-ml.md#classification) article.