
Commit c71b935 (1 parent: 4380aff): Update how-to-auto-train-remote.md

1 file changed: +20 −18 lines changed

articles/machine-learning/service/how-to-auto-train-remote.md

Lines changed: 20 additions & 18 deletions
````diff
@@ -67,35 +67,39 @@ Cluster name restrictions include:
 
 ## Access data using TabularDataset function
 
-Defined X and y as `TabularDataset`s, which are passed to Automated ML in the AutoMLConfig. `from_delimited_files` by default sets the `infer_column_types` to true, which will infer the columns type automatically.
+Define `training_data` as a `TabularDataset` and name its label column; both are passed to automated ML in the `AutoMLConfig`. `from_delimited_files` sets `infer_column_types` to true by default, which infers the column types automatically.
 
 If you wish to set the column types manually, you can use the `set_column_types` argument to set the type of each column. In the following code sample, the data comes from the sklearn package.
 
 ```python
+from sklearn import datasets
+from azureml.core.dataset import Dataset
+from scipy import sparse
+import numpy as np
+import pandas as pd
+import os
+
 # Create a project_folder if it doesn't exist
 if not os.path.isdir('data'):
     os.mkdir('data')
 
 if not os.path.exists(project_folder):
     os.makedirs(project_folder)
 
-from sklearn import datasets
-from azureml.core.dataset import Dataset
-from scipy import sparse
-import numpy as np
-import pandas as pd
-
 data_train = datasets.load_digits()
+X = pd.DataFrame(data_train.data[100:,:])
+y = pd.DataFrame(data_train.target[100:])
 
-pd.DataFrame(data_train.data[100:,:]).to_csv("data/X_train.csv", index=False)
-pd.DataFrame(data_train.target[100:]).to_csv("data/y_train.csv", index=False)
+# Merge X and y into one frame with a named label column
+label = "digit"
+X[label] = y
+training_data = X
 
+training_data.to_csv('data/digits.csv', index=False)
 ds = ws.get_default_datastore()
 ds.upload(src_dir='./data', target_path='digitsdata', overwrite=True, show_progress=True)
 
-X = Dataset.Tabular.from_delimited_files(path=ds.path('digitsdata/X_train.csv'))
-y = Dataset.Tabular.from_delimited_files(path=ds.path('digitsdata/y_train.csv'))
-
+training_data = Dataset.Tabular.from_delimited_files(path=ds.path('digitsdata/digits.csv'))
 ```
````
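The key change in this hunk is consolidating features and label into a single file with a named label column, instead of separate `X_train.csv`/`y_train.csv` files. A minimal, self-contained sketch of that pattern follows; the tiny hand-made frame and local temp directory are illustrative stand-ins for the sklearn digits data and the workspace datastore:

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for the digits features: two feature columns, three rows.
X = pd.DataFrame({"pixel_0": [0.0, 1.0, 2.0], "pixel_1": [3.0, 4.0, 5.0]})
y = pd.DataFrame([7, 1, 4])

# Merge the label into the feature frame under a named column,
# mirroring the `X[label] = y` step in the updated article.
label = "digit"
X[label] = y
training_data = X

# Write a single CSV holding both features and label. index=False keeps
# the row index out of the file, so no spurious column appears on re-read.
out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "digits.csv")
training_data.to_csv(csv_path, index=False)

round_trip = pd.read_csv(csv_path)
print(list(round_trip.columns))  # ['pixel_0', 'pixel_1', 'digit']
```

Reading the merged file back yields exactly one column per feature plus the label column, which is what `label_column_name` later points at.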

## Create run configuration
````diff
@@ -128,22 +132,20 @@ import logging
 
 automl_settings = {
     "name": "AutoML_Demo_Experiment_{0}".format(time.time()),
+    "experiment_timeout_minutes": 20,
+    "enable_early_stopping": True,
     "iteration_timeout_minutes": 10,
-    "iterations": 20,
     "n_cross_validations": 5,
     "primary_metric": 'AUC_weighted',
-    "preprocess": False,
     "max_concurrent_iterations": 10,
-    "verbosity": logging.INFO
 }
 
 automl_config = AutoMLConfig(task='classification',
                              debug_log='automl_errors.log',
                              path=project_folder,
                              compute_target=compute_target,
-                             run_configuration=run_config,
-                             X = X,
-                             y = y,
+                             training_data=training_data,
+                             label_column_name=label,
                              **automl_settings,
                              )
 ```
````
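The settings dictionary is expanded into keyword arguments with `**automl_settings`, so each dict entry arrives at the constructor as a named parameter alongside `training_data` and `label_column_name`. A toy sketch of that unpacking pattern, where `DummyConfig` is an illustrative stand-in and not an azureml API:

```python
import time

# Same shape as the settings dict in the updated article.
automl_settings = {
    "name": "AutoML_Demo_Experiment_{0}".format(time.time()),
    "experiment_timeout_minutes": 20,
    "enable_early_stopping": True,
    "iteration_timeout_minutes": 10,
    "n_cross_validations": 5,
    "primary_metric": "AUC_weighted",
    "max_concurrent_iterations": 10,
}


class DummyConfig:
    """Illustrative stand-in for AutoMLConfig: collects keyword args."""

    def __init__(self, task, **kwargs):
        self.task = task
        self.settings = kwargs


# `**automl_settings` unpacks every dict entry as a keyword argument,
# exactly as in the AutoMLConfig call above.
config = DummyConfig(task="classification", **automl_settings)
print(config.settings["primary_metric"])  # AUC_weighted
```

Because the unpacked keys become ordinary keyword arguments, a key that duplicates an explicit parameter (for example passing `task` in the dict as well) would raise a `TypeError`, which is why run-level settings and data arguments are kept separate.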
