 |**Larger than 20,000 rows**| Train/validation data split is applied. The default is to take 10% of the initial training data set as the validation set. In turn, that validation set is used for metrics calculation.
-|**Smaller than 20,000 rows**| Cross-validation approach is applied. The default number of folds depends on the number of rows. <br> **If the dataset is less than 1,000 rows**, 10 folds are used. <br> **If the rows are between 1,000 and 20,000**, then three folds are used.
+|**Smaller than or equal to 20,000 rows**| Cross-validation approach is applied. The default number of folds depends on the number of rows. <br> **If the dataset has fewer than 1,000 rows**, 10 folds are used. <br> **If the dataset has between 1,000 and 20,000 rows, inclusive**, three folds are used.
 
-### Large data
-
-Automated ML supports a limited number of algorithms for training on large data that can successfully build models for big data on small virtual machines. Automated ML heuristics depend on properties such as data size, virtual machine memory size, experiment timeout and featurization settings to determine if these large data algorithms should be applied. [Learn more about what models are supported in automated ML](#supported-algorithms).
-
-* For regression, [Online Gradient Descent Regressor](/python/api/nimbusml/nimbusml.linear_model.onlinegradientdescentregressor?preserve-view=true&view=nimbusml-py-latest) and
-[Fast Linear Regressor](/python/api/nimbusml/nimbusml.linear_model.fastlinearregressor?preserve-view=true&view=nimbusml-py-latest)
-
-* For classification, [Averaged Perceptron Classifier](/python/api/nimbusml/nimbusml.linear_model.averagedperceptronbinaryclassifier?preserve-view=true&view=nimbusml-py-latest) and [Linear SVM Classifier](/python/api/nimbusml/nimbusml.linear_model.linearsvmbinaryclassifier?preserve-view=true&view=nimbusml-py-latest); where the Linear SVM classifier has both large data and small data versions.
-
-If you want to override these heuristics, apply the following settings:
-
-Task | Setting | Notes
-|---|---|---
-Block data streaming algorithms | Use the `blocked_algorithms` parameter in the `set_training()` function and list the model(s) you don't want to use. | Results in either run failure or long run time
-Use data streaming algorithms| Use the `allowed_algorithms` parameter in the `set_training()` function and list the model(s) you want to use.|
-Use data streaming algorithms <br> [(studio UI experiments)](how-to-use-automated-ml-for-ml-models.md#create-and-run-experiment)|Block all models except the big data algorithms you want to use. |
 
 ## Compute to run experiment
 
@@ -206,15 +190,33 @@ Automated machine learning tries different models and algorithms during the auto
 The task method determines the list of algorithms or models to apply. Use the `allowed_algorithms` or `blocked_algorithms` parameters in the `set_training()` setter function to further modify iterations by including or excluding available models.
 
 Use the following links to explore the supported algorithms for each machine learning task.
-
-* Classification Algorithms (Tabular Data)
-* Regression Algorithms (Tabular Data)
-* Time Series Forecasting Algorithms (Tabular Data)
+
+Classification | Regression | Time Series Forecasting
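
The boundary change in the first hunk (a dataset of exactly 20,000 rows now falls under cross-validation, and exactly 1,000 rows uses three folds) can be sketched as a small pure-Python function. This is only an illustration of the defaults stated in the table, not code from the Azure ML SDK; the function name `default_validation_strategy` and the dictionary keys are hypothetical.

```python
def default_validation_strategy(n_rows: int) -> dict:
    """Illustrative only: mirrors the default validation rules in the table.

    The function name and the returned keys are hypothetical and are not
    part of any Azure ML API.
    """
    if n_rows < 0:
        raise ValueError("n_rows must be non-negative")

    # Larger than 20,000 rows: hold out 10% of the training data for validation.
    if n_rows > 20_000:
        return {"method": "train_validation_split", "validation_fraction": 0.10}

    # Fewer than 1,000 rows: 10-fold cross-validation.
    if n_rows < 1_000:
        return {"method": "cross_validation", "folds": 10}

    # Between 1,000 and 20,000 rows, inclusive: 3-fold cross-validation.
    return {"method": "cross_validation", "folds": 3}
```

Calling the function at the boundaries (999, 1,000, 20,000, and 20,001 rows) is a quick way to check that the revised wording and the intended behavior agree.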