Commit 09b086c

Merge pull request #91587 from cartacioS/patch-17
New AutoML Config Inputs
2 parents 4b63946 + 9a8bb33 commit 09b086c

1 file changed: +35 -36 lines changed

articles/machine-learning/service/how-to-configure-auto-train.md

Lines changed: 35 additions & 36 deletions
@@ -60,14 +60,14 @@ Use the `task` parameter in the `AutoMLConfig` constructor to specify your exper
 from azureml.train.automl import AutoMLConfig

 # task can be one of classification, regression, forecasting
-automl_config = AutoMLConfig(task="classification")
+automl_config = AutoMLConfig(task = "classification")
 ```

 ## Data source and format

-Automated machine learning supports data that resides on your local desktop or in the cloud such as Azure Blob Storage. The data can be read into scikit-learn supported data formats. You can read the data into:
+Automated machine learning supports data that resides on your local desktop or in the cloud such as Azure Blob Storage. The data can be read into a Pandas DataFrame or an Azure Machine Learning dataset. The following code examples demonstrate how to store the data in these formats. [Learn more about datatsets](https://github.com/MicrosoftDocs/azure-docs-pr/pull/how-to-create-register-datasets.md).

-* Numpy arrays X (features) and y (target variable, also known as label)
+* TabularDataset
 * Pandas dataframe

 >[!Important]
@@ -77,13 +77,14 @@ Automated machine learning supports data that resides on your local desktop or i

 Examples:

-* Numpy arrays
+* TabularDataset
+```python
+from azureml.core.dataset import Dataset

-```python
-digits = datasets.load_digits()
-X_digits = digits.data
-y_digits = digits.target
-```
+tabular_dataset = Dataset.Tabular.from_delimited_files("https://automldemods.blob.core.windows.net/datasets/PlayaEvents2016,_1.6MB,_3.4k-rows.cleaned.2.tsv")
+train_dataset, test_dataset = tabular_dataset.random_split(percentage = 0.1, seed = 42)
+label = "Label"
+```

 * Pandas dataframe

@@ -92,9 +93,8 @@ Examples:
 from sklearn.model_selection import train_test_split

 df = pd.read_csv("https://automldemods.blob.core.windows.net/datasets/PlayaEvents2016,_1.6MB,_3.4k-rows.cleaned.2.tsv", delimiter="\t", quotechar='"')
-y_df = df["Label"]
-x_df = df.drop(["Label"], axis=1)
-x_train, x_test, y_train, y_test = train_test_split(x_df, y_df, test_size=0.1, random_state=42)
+train_data, test_data = train_test_split(df, test_size = 0.1, random_state = 42)
+label = "Label"
 ```

 ## Fetch data for running experiment on remote compute
@@ -145,14 +145,14 @@ Some examples include:
 1. Classification experiment using AUC weighted as the primary metric with a max time of 12,000 seconds per iteration, with the experiment to end after 50 iterations and 2 cross-validation folds.

 ```python
-automl_classifier = AutoMLConfig(
+automl_classifier=AutoMLConfig(
 task='classification',
 primary_metric='AUC_weighted',
 max_time_sec=12000,
 iterations=50,
 blacklist_models='XGBoostClassifier',
-X=X,
-y=y,
+training_data=train_data,
+label_column_name=label,
 n_cross_validations=2)
 ```
 2. Below is an example of a regression experiment set to end after 100 iterations, with each iteration lasting up to 600 seconds with 5 validation cross folds.
@@ -164,8 +164,8 @@ Some examples include:
 iterations=100,
 whitelist_models='kNN regressor'
 primary_metric='r2_score',
-X=X,
-y=y,
+training_data=train_data,
+label_column_name=label,
 n_cross_validations=5)
 ```

@@ -222,12 +222,12 @@ time_series_settings = {
 'max_horizon': n_test_periods
 }

-automl_config = AutoMLConfig(task='forecasting',
+automl_config = AutoMLConfig(task = 'forecasting',
 debug_log='automl_oj_sales_errors.log',
 primary_metric='normalized_root_mean_squared_error',
 iterations=10,
-X=X_train,
-y=y_train,
+training_data=train_data,
+label_column_name=label,
 n_cross_validations=5,
 path=project_folder,
 verbosity=logging.INFO,
@@ -263,8 +263,8 @@ automl_classifier = AutoMLConfig(
 task='classification',
 primary_metric='AUC_weighted',
 iterations=20,
-X=X_train,
-y=y_train,
+training_data=train_data,
+label_column_name=label,
 n_cross_validations=5,
 **ensemble_settings
 )
@@ -277,8 +277,8 @@ automl_classifier = AutoMLConfig(
 task='classification',
 primary_metric='AUC_weighted',
 iterations=20,
-X=X_train,
-y=y_train,
+training_data=data_train,
+label_column_name=label,
 n_cross_validations=5,
 enable_voting_ensemble=False,
 enable_stack_ensemble=False
@@ -469,7 +469,7 @@ LogisticRegression

 ## Explain the model (interpretability)

-Automated machine learning allows you to understand feature importance. During the training process, you can get global feature importance for the model. For classification scenarios, you can also get class-level feature importance. You must provide a validation dataset (X_valid) to get feature importance.
+Automated machine learning allows you to understand feature importance. During the training process, you can get global feature importance for the model. For classification scenarios, you can also get class-level feature importance. You must provide a validation dataset (validation_data) to get feature importance.

 There are two ways to generate feature importance.

@@ -479,7 +479,7 @@ There are two ways to generate feature importance.
 from azureml.train.automl.automlexplainer import explain_model

 shap_values, expected_values, overall_summary, overall_imp, per_class_summary, per_class_imp = \
-explain_model(fitted_model, X_train, X_test)
+explain_model(fitted_model, train_data, test_data)

 #Overall feature importance
 print(overall_imp)
@@ -493,16 +493,15 @@ There are two ways to generate feature importance.
 * To view feature importance for all iterations, set `model_explainability` flag to `True` in AutoMLConfig.

 ```python
-automl_config = AutoMLConfig(task = 'classification',
-debug_log = 'automl_errors.log',
-primary_metric = 'AUC_weighted',
-max_time_sec = 12000,
-iterations = 10,
-verbosity = logging.INFO,
-X = X_train,
-y = y_train,
-X_valid = X_test,
-y_valid = y_test,
+automl_config = AutoMLConfig(task='classification',
+debug_log='automl_errors.log',
+primary_metric='AUC_weighted',
+max_time_sec=12000,
+iterations=10,
+verbosity=logging.INFO,
+training_data=train_data,
+label_column_name=y_train,
+validation_data=test_data,
 model_explainability=True,
 path=project_folder)
 ```
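
The hunks above replace the separate `X`, `y`, `X_valid`, and `y_valid` arguments with `training_data`, `label_column_name`, and `validation_data`. As a rough aid for reviewing other snippets against that change, the sketch below flags old-style keyword names in a plain dictionary of config arguments. The helper `find_old_style_inputs` and the `OLD_TO_NEW` table are hypothetical names made up for illustration and are not part of the Azure ML SDK. The mapping describes the role each argument plays in this diff, not a value conversion.

```python
# Hypothetical review helper; it does not import or call the Azure ML SDK.
# The mapping mirrors the renames visible in this commit's diff: features and
# labels now travel in one table (training_data / validation_data), and the
# label is identified by its column name (label_column_name).
OLD_TO_NEW = {
    "X": "training_data",
    "y": "label_column_name",
    "X_valid": "validation_data",
    "y_valid": "validation_data",
}


def find_old_style_inputs(config_kwargs):
    """Return {old_name: suggested_new_name} for each old-style key present."""
    return {
        name: OLD_TO_NEW[name]
        for name in config_kwargs
        if name in OLD_TO_NEW
    }


# Example: keyword arguments in the pre-change style (values are placeholders).
old_kwargs = {
    "task": "classification",
    "X": "X_train",
    "y": "y_train",
    "n_cross_validations": 5,
}
print(find_old_style_inputs(old_kwargs))
```

A snippet that already uses the new argument names yields an empty dictionary, so the helper can be run over both sides of a change like this one.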
