Automated machine learning supports data that resides on your local desktop or in the cloud, such as in Azure Blob Storage. The data can be read into scikit-learn-supported data formats. You can read the data into:
-
* Numpy arrays X (features) and y (target variable or also known as label)
+
+
* NumPy arrays X (features) and y (target variable, also known as the label)
* pandas DataFrame

>[!Important]
@@ -88,55 +90,25 @@ Examples:
```python
import pandas as pd
from sklearn.model_selection import train_test_split
For remote executions, training data must be accessible from the remote compute. The [`Dataset`](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py) class in the SDK exposes functionality to:
-
Define X and y as dprep reference, which will be passed to automated machine learning `AutoMLConfig` object similar to below:
+
* easily transfer data from static files or URL sources into your workspace
+
* make your data available to training scripts when running on cloud compute resources
-
```python
-
-
X = dprep.auto_read_file(path=ds.path('digitsdata/X_train.csv'))
-
y = dprep.auto_read_file(path=ds.path('digitsdata/y_train.csv'))
See the [how-to](how-to-train-with-datasets.md#option-2--mount-files-to-a-remote-compute-target) for an example of using the `Dataset` class to mount data to your compute target.
## Train and validation data
-
You can specify separate train and validation set directly in the `AutoMLConfig` method.
+
You can specify separate train and validation sets directly in the `AutoMLConfig` constructor.
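
One way to produce separate training and validation sets before passing them to the configuration is a simple hold-out split. The following is a minimal, dependency-free sketch rather than the SDK's own method: the helper `train_validation_split` and its parameters are hypothetical, and plain lists stand in for NumPy arrays or pandas DataFrames.

```python
import random


def train_validation_split(X, y, validation_fraction=0.2, seed=0):
    """Shuffle row indices and split features/labels into train and validation parts.

    Hypothetical helper for illustration only; not part of the Azure ML SDK.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # seeded shuffle for reproducibility
    n_valid = max(1, int(len(X) * validation_fraction))
    valid_idx, train_idx = indices[:n_valid], indices[n_valid:]
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_valid = [X[i] for i in valid_idx]
    y_valid = [y[i] for i in valid_idx]
    return X_train, y_train, X_valid, y_valid


# Toy data: 10 rows with 2 features each, and a binary label.
X = [[i, i * 2] for i in range(10)]
y = [i % 2 for i in range(10)]

X_train, y_train, X_valid, y_valid = train_validation_split(X, y)
print(len(X_train), len(X_valid))
```

The snippet earlier in this article imports scikit-learn's `train_test_split`, which performs the same kind of split on arrays and DataFrames.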
### K-Folds Cross Validation
@@ -170,7 +142,7 @@ There are several options that you can use to configure your automated machine l
Some examples include:
-
1. Classification experiment using AUC weighted as the primary metric with a max time of 12,000 seconds per iteration, with the experiment to end after 50 iterations and 2 crossvalidation folds.
+
1. A classification experiment that uses AUC weighted as the primary metric, allows a maximum of 12,000 seconds per iteration, and ends after 50 iterations, with 2 cross-validation folds.
```python
automl_classifier = AutoMLConfig(
@@ -197,12 +169,10 @@ Some examples include:
n_cross_validations=5)
```
-
The three different `task` parameter values determine the list of models to apply. Use the `whitelist` or `blacklist` parameters to further modify iterations with the available models to include or exclude. The list of supported models can be found on [SupportedModels Class](https://docs.microsoft.com/en-us/python/api/azureml-train-automl/azureml.train.automl.constants.supportedmodels?view=azure-ml-py).
+
The three `task` parameter values (`classification`, `regression`, and `forecasting`, which uses the same algorithm pool as `regression`) determine the list of models to apply. Use the `whitelist` or `blacklist` parameters to include or exclude specific models from the iterations. The list of supported models can be found in the [SupportedModels class](https://docs.microsoft.com/en-us/python/api/azureml-train-automl/azureml.train.automl.constants.supportedmodels?view=azure-ml-py).
### Primary Metric
-
The primary metric; as shown in the examples above determines the metric to be used during model training for optimization. The primary metric you can select is determined by the task type you choose. Below is a list of available metrics.
-
-
Learn about the specific definitions of these in [Understand automated machine learning results](how-to-understand-automated-ml.md).
+
The primary metric is the metric that model training optimizes. The metrics you can select are determined by the task type you choose; the following table shows the valid primary metrics for each task type.
|Classification | Regression | Time Series Forecasting
|-- |-- |--
@@ -212,9 +182,11 @@ Learn about the specific definitions of these in [Understand automated machine l
Learn about the specific definitions of these metrics in [Understand automated machine learning results](how-to-understand-automated-ml.md).
+
### Data preprocessing & featurization
-
In every automated machine learning experiment, your data is [automatically scaled and normalized](concept-automated-ml.md#preprocess) to help algorithms perform well. However, you can also enable additional preprocessing/featurization, such as missing values imputation, encoding, and transforms. [Learn more about what featurization is included](how-to-create-portal-experiments.md#preprocess).
+
In every automated machine learning experiment, your data is [automatically scaled and normalized](concept-automated-ml.md#preprocess) to help *certain* algorithms that are sensitive to features on different scales. However, you can also enable additional preprocessing/featurization, such as missing-value imputation, encoding, and transforms. [Learn more about what featurization is included](how-to-create-portal-experiments.md#preprocess).
To enable this featurization, specify `"preprocess": True` for the [`AutoMLConfig` class](https://docs.microsoft.com/python/api/azureml-train-automl/azureml.train.automl.automlconfig?view=azure-ml-py).
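
The quoted-key form `"preprocess": True` suggests that the setting is supplied as a dictionary entry that is then unpacked into keyword arguments. The sketch below illustrates that pattern under this assumption. `build_config` is a hypothetical stand-in for the real constructor so that the example runs without the SDK, and the `task` entry is included only for illustration.

```python
# Settings dictionary; only "preprocess" is taken from the text above.
automl_settings = {
    "task": "classification",  # illustrative value
    "preprocess": True,        # enable the additional featurization
}


def build_config(**settings):
    """Hypothetical stand-in for AutoMLConfig(**settings).

    It only returns the keyword arguments it received, so the unpacking
    behaviour can be inspected without installing the SDK.
    """
    return dict(settings)


config_kwargs = build_config(**automl_settings)
print(config_kwargs["preprocess"])
```

With the SDK installed, the same dictionary would typically be unpacked into the real `AutoMLConfig` class in the same way; consult the class reference for the exact parameters it accepts.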
@@ -225,12 +197,13 @@ To enable this featurization, specify `"preprocess": True` for the [`AutoMLConfi
> your input data automatically.
### Time Series Forecasting
-
For time series forecasting task type you have additional parameters to define.
-
1. time_column_name - This is a required parameter which defines the name of the column in your training data containing date/time series.
-
1. max_horizon - This defines the length of time you want to predict out based on the periodicity of the training data. For example if you have training data with daily time grains, you define how far out in days you want the model to train for.
-
1. grain_column_names - This defines the name of columns which contain individual time series data in your training data. For example, if you are forecasting sales of a particular brand by store, you would define store and brand columns as your grain columns.
+
The time series `forecasting` task requires additional parameters in the configuration object:
+
+
1. `time_column_name`: Required. Defines the name of the column in your training data that contains a valid time series.
+
1. `max_horizon`: Defines how far ahead you want to forecast, in units of the training data's periodicity. For example, if your training data has daily time grains, you define how many days out you want the model to forecast.
+
1. `grain_column_names`: Defines the names of the columns that identify individual time series in your training data. For example, if you are forecasting sales of a particular brand by store, you would define the store and brand columns as your grain columns. Separate time series and forecasts are created for each grain/grouping.
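
To make the effect of grain columns concrete, the following sketch groups a handful of rows into one time series per store and brand combination. It is a plain-Python illustration of the grouping idea, not the SDK's implementation; the sample rows and the helper `split_by_grain` are hypothetical.

```python
from collections import defaultdict

# Hypothetical training rows: (date, store, brand, quantity).
rows = [
    ("2019-01-01", "store_1", "brand_a", 10),
    ("2019-01-02", "store_1", "brand_a", 12),
    ("2019-01-01", "store_1", "brand_b", 7),
    ("2019-01-01", "store_2", "brand_a", 5),
    ("2019-01-02", "store_2", "brand_a", 6),
]


def split_by_grain(rows):
    """Group rows into one time series per (store, brand) grain."""
    series = defaultdict(list)
    for date, store, brand, quantity in rows:
        series[(store, brand)].append((date, quantity))
    # Order each series by its time column (ISO dates sort correctly as strings).
    return {grain: sorted(points) for grain, points in series.items()}


series_by_grain = split_by_grain(rows)
for grain, points in series_by_grain.items():
    print(grain, points)
```

Here the store and brand columns play the role of `grain_column_names` and the date column plays the role of `time_column_name`; each resulting group would receive its own forecast.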
-
See example of these settings being used below, notebook example is available [here](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb).
+
For examples of the settings used below, see the [sample notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb).
```python
# Setting Store and Brand as grains for training.
@@ -339,11 +312,11 @@ run = experiment.submit(automl_config, show_output=True)
>Setting `show_output` to `True` results in output being shown on the console.
### Exit Criteria
-
There a few options you can define to complete your experiment.
-
1. No Criteria - If you do not define any exit parameters the experiment will continue until no further progress is made on your primary metric.
-
1. Number of iterations - You define the number of iterations for the experiment to run. You can optional add iteration_timeout_minutes to define a time limit in minutes per each iteration.
-
1. Exit after a length of time - Using experiment_timeout_minutes in your settings you can define how long in minutes should an experiment continue in run.
-
1. Exit after a score has been reached - Using experiment_exit_score you can choose to complete the experiment after a score based on your primary metric has been reached.
+
There are a few options you can define to end your experiment.
+
1. No criteria: If you do not define any exit parameters, the experiment continues until no further progress is made on your primary metric.
+
1. Number of iterations: You define the number of iterations for the experiment to run. You can optionally add `iteration_timeout_minutes` to define a time limit in minutes for each iteration.
+
1. Exit after a length of time: Use `experiment_timeout_minutes` in your settings to define how long, in minutes, the experiment should continue to run.
+
1. Exit after a score has been reached: Use `experiment_exit_score` to complete the experiment after a specified primary metric score has been reached.
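
The options above can be summarized with a small helper that reads a settings dictionary and reports which exit criteria it defines. This is an illustrative sketch only: `describe_exit_criteria` is hypothetical, and the key name `iterations` is an assumption, since the text above does not name the parameter that sets the number of iterations.

```python
def describe_exit_criteria(settings):
    """Return human-readable descriptions of the exit criteria in a settings dict.

    Hypothetical helper for illustration; the "iterations" key is an assumed name.
    """
    criteria = []
    if "iterations" in settings:
        text = f"stop after {settings['iterations']} iterations"
        if "iteration_timeout_minutes" in settings:
            text += f" (each limited to {settings['iteration_timeout_minutes']} minutes)"
        criteria.append(text)
    if "experiment_timeout_minutes" in settings:
        criteria.append(
            f"stop after {settings['experiment_timeout_minutes']} minutes in total"
        )
    if "experiment_exit_score" in settings:
        criteria.append(
            f"stop once the primary metric reaches {settings['experiment_exit_score']}"
        )
    if not criteria:
        criteria.append("no exit criteria: run until the primary metric stops improving")
    return criteria


example_settings = {
    "iterations": 50,
    "iteration_timeout_minutes": 10,
    "experiment_exit_score": 0.95,
}
for line in describe_exit_criteria(example_settings):
    print(line)
```

Combining several criteria in one dictionary, as in `example_settings`, means the experiment ends as soon as any one of them is met, under the assumption that the criteria act independently.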