Based on: AutoGluon
AutoGluon wraps the AutoGluon `TabularPredictor` to provide fully automated model selection, hyperparameter tuning, and stacking/ensembling within an Octopus workflow. Unlike Octo, which exposes fine-grained control over optimization, AutoGluon aims for a hands-off experience: you configure a quality preset and a time budget, and AutoGluon handles the rest.
- **Initialize the TabularPredictor.** A `TabularPredictor` is created with the target column, evaluation metric (mapped from Octopus metric names to AutoGluon scorers), and verbosity level.
- **Fit on training data.** AutoGluon's `fit()` method is called with the combined feature + target DataFrame. Internally, AutoGluon:
  - Performs automatic feature engineering (type inference, missing value handling, encoding).
  - Trains a portfolio of model types (controlled by `included_model_types` or the full default set).
  - Tunes hyperparameters using the strategy defined by the `presets`.
  - Builds multi-layer stacking ensembles when using higher-quality presets (`"good_quality"` and above).
  - Uses `num_bag_folds` for bagging/cross-validation within each model.
- **Evaluate performance.** After training, the module evaluates on the train, dev (out-of-fold), and test partitions. Scores are computed using both AutoGluon's built-in metrics and Octopus's metric implementations for cross-comparison.
- **Feature importance.** Permutation feature importance is computed on the test set using AutoGluon's `feature_importance()` method with confidence bands (15 shuffle sets, 95% confidence). If feature groups are defined, group-level importances are also calculated.
- **Sklearn-compatible model.** The fitted AutoGluon predictor is wrapped in a sklearn-compatible class (`SklearnClassifier` or `SklearnRegressor`) so that downstream Octopus code (e.g., feature importance methods) can use it seamlessly.
- **No feature selection.** AutoGluon does not perform feature selection; it returns all input features. To select features, place AutoGluon after a feature-selection module in the workflow.
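Under the hood, the steps above correspond to plain AutoGluon calls. A minimal sketch, assuming pandas DataFrames `train_data`/`test_data` that already contain a `"target"` column — the variable names and the `"roc_auc"` metric are illustrative, not the exact Octopus wrapper code:

```python
# Illustrative fit arguments, matching the parameters described in this section.
fit_kwargs = {
    "presets": ["medium_quality"],
    "time_limit": 600,   # total training budget in seconds
    "num_bag_folds": 5,  # bagging/cross-validation folds within each model
}

def run_autogluon(train_data, test_data, label="target"):
    """Initialize, fit, evaluate, and compute permutation feature importance."""
    # Requires the optional `autogluon` dependency.
    from autogluon.tabular import TabularPredictor

    # Step 1: predictor with target column, evaluation metric, and verbosity.
    predictor = TabularPredictor(label=label, eval_metric="roc_auc", verbosity=2)
    # Step 2: fit on the combined feature + target DataFrame.
    predictor.fit(train_data, **fit_kwargs)
    # Step 3: evaluate on held-out data.
    scores = predictor.evaluate(test_data)
    # Step 4: permutation importance with confidence bands
    # (15 shuffle sets, 95% confidence, as described above).
    importance = predictor.feature_importance(
        test_data, num_shuffle_sets=15, confidence_level=0.95
    )
    return predictor, scores, importance
```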
When `included_model_types` is not set, AutoGluon considers all available model families:

| Code | Model |
|---|---|
| `GBM` | LightGBM |
| `CAT` | CatBoost |
| `XGB` | XGBoost |
| `RF` | Random Forest |
| `XT` | Extra Trees |
| `KNN` | K-Nearest Neighbors |
| `LR` | Linear/Logistic Regression |
| `NN_TORCH` | PyTorch Neural Network |
| `FASTAI` | FastAI Neural Network |
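To restrict the portfolio, pass a subset of the codes above as `included_model_types`. A short sketch (the chosen subset is just an example):

```python
# Restrict AutoGluon to the three gradient-boosting families from the
# table above; all other model types are skipped.
included_model_types = ["GBM", "CAT", "XGB"]

# Codes must come from the supported set.
SUPPORTED = {"GBM", "CAT", "XGB", "RF", "XT", "KNN", "LR", "NN_TORCH", "FASTAI"}
assert set(included_model_types) <= SUPPORTED
```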
| Parameter | Default | Description |
|---|---|---|
| `presets` | `["medium_quality"]` | Quality presets: `"best_quality"`, `"high_quality"`, `"good_quality"`, `"medium_quality"` |
| `time_limit` | `None` | Total training time in seconds |
| `num_bag_folds` | `5` | Bagging folds |
| `included_model_types` | `None` | Restrict to specific model types (see table above) |
| `fit_strategy` | `"sequential"` | `"sequential"` or `"parallel"` |
| `verbosity` | `2` | Logging level (0--4) |
| `num_cpus` | `"auto"` | CPUs to allocate |
| `memory_limit` | `"auto"` | Memory limit in GB |
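Collected in one place, the defaults above can be sketched as a plain dictionary — the dict form is illustrative, not a specific Octopus config format:

```python
# Default configuration of the AutoGluon module, per the parameter table above.
DEFAULTS = {
    "presets": ["medium_quality"],
    "time_limit": None,            # no time budget unless set
    "num_bag_folds": 5,
    "included_model_types": None,  # None = all model families
    "fit_strategy": "sequential",
    "verbosity": 2,                # logging level, 0-4
    "num_cpus": "auto",
    "memory_limit": "auto",        # GB
}

# Overriding a subset, e.g. a one-hour best-quality run:
config = {**DEFAULTS, "presets": ["best_quality"], "time_limit": 3600}
```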
AutoGluon is ideal when:
- You want a fully automated baseline with minimal configuration effort.
- You want to compare Octo's manually-configured pipeline against an AutoML approach.
- You need access to model types not available in Octo (e.g., neural networks, KNN, linear models, LightGBM).
- You are in a time-constrained scenario where setting a `time_limit` and a `presets` level is sufficient.
- AutoGluon does not perform feature selection. All input features are passed through. Combine it with upstream feature-selection modules if needed.
- Requires the `autogluon` optional dependency (`pip install octopus[autogluon]`).
- Higher-quality presets (`"best_quality"`, `"high_quality"`) use multi-layer stacking, which is memory-intensive and can be slow.
- The module integrates with Ray for resource management, which can conflict with Octo's own Ray usage if not configured carefully.