3 changes: 2 additions & 1 deletion docs/concepts/concepts.md
Expand Up @@ -5,12 +5,13 @@ This section explains the core concepts and methodologies used in Octopus to hel
## What You'll Learn

- **Nested Cross-Validation** - Understand the nested CV approach that makes Octopus suitable for small datasets
- **Core Concepts** - Key terms, architecture, and how Octopus works internally
- **Workflow & Modules** - How to chain feature selection and ML modules into multi-step pipelines
- **Understanding Results** - How to interpret and use the predictions and metrics from Octopus

## Quick Navigation

- [Nested Cross-Validation](nested_cv.md) - Learn about the unique CV strategy that prevents overfitting
- [Workflow & Modules](workflow/index.md) - Build pipelines that progressively reduce features and train models
- [Understanding Results](understanding_results.md) - How to read and use Octopus outputs

If you're new to Octopus, we recommend starting with "Nested Cross-Validation" to understand why this tool is different.
12 changes: 12 additions & 0 deletions docs/concepts/workflow/SUMMARY.md
@@ -0,0 +1,12 @@
- [Workflow & Modules](index.md)
- Feature Selection
- [Boruta](boruta.md)
- [EFS](efs.md)
- [MRMR](mrmr.md)
- [RFE](rfe.md)
- [RFE2](rfe2.md)
- [ROC](roc.md)
- [SFS](sfs.md)
- Machine Learning
- [AutoGluon](autogluon.md)
- [Octo](octo.md)
98 changes: 98 additions & 0 deletions docs/concepts/workflow/autogluon.md
@@ -0,0 +1,98 @@
# AutoGluon

*Based on: [AutoGluon](https://github.com/autogluon/autogluon)*

AutoGluon wraps the [AutoGluon TabularPredictor](https://auto.gluon.ai/) to
provide fully automated model selection, hyperparameter tuning, and
stacking/ensembling within an Octopus workflow. Unlike Octo, which exposes
fine-grained control over optimization, AutoGluon aims for a hands-off
experience: you configure a quality preset and a time budget, and AutoGluon
handles the rest.

## How it works

1. **Initialize the TabularPredictor.** A `TabularPredictor` is created with the
target column, evaluation metric (mapped from Octopus metric names to
AutoGluon scorers), and verbosity level.

2. **Fit on training data.** AutoGluon's `fit()` method is called with the
combined feature + target DataFrame. Internally, AutoGluon:
- Performs automatic feature engineering (type inference, missing value
handling, encoding).
- Trains a portfolio of model types (controlled by `included_model_types` or
the full default set).
- Tunes hyperparameters using the strategy defined by the `presets`.
- Builds multi-layer stacking ensembles when using higher-quality presets
(`"good_quality"` and above).
- Uses `num_bag_folds` for bagging/cross-validation within each model.

3. **Evaluate performance.** After training, the module evaluates on train, dev
(out-of-fold), and test partitions. Scores are computed using both
AutoGluon's built-in metrics and Octopus's metric implementations for
cross-comparison.

4. **Feature importance.** Permutation feature importance is computed on the test
set using AutoGluon's `feature_importance()` method with confidence bands
(15 shuffle sets, 95% confidence). If feature groups are defined, group-level
importances are also calculated.

5. **Sklearn-compatible model.** The fitted AutoGluon predictor is wrapped in a
sklearn-compatible class (`SklearnClassifier` or `SklearnRegressor`) so that
downstream Octopus code (e.g., feature importance methods) can use it
seamlessly.

6. **No feature selection.** AutoGluon does not perform feature selection -- it
returns all input features. To select features, place AutoGluon after a
feature-selection module in the workflow.

## Supported model types

When `included_model_types` is not set, AutoGluon considers all available
model families:

| Code | Model |
|------|-------|
| `GBM` | LightGBM |
| `CAT` | CatBoost |
| `XGB` | XGBoost |
| `RF` | Random Forest |
| `XT` | Extra Trees |
| `KNN` | K-Nearest Neighbors |
| `LR` | Linear/Logistic Regression |
| `NN_TORCH` | PyTorch Neural Network |
| `FASTAI` | FastAI Neural Network |

## Key parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `presets` | `["medium_quality"]` | Quality presets: `"best_quality"`, `"high_quality"`, `"good_quality"`, `"medium_quality"` |
| `time_limit` | `None` | Total training time in seconds |
| `num_bag_folds` | `5` | Bagging folds |
| `included_model_types` | `None` | Restrict to specific model types (see table above) |
| `fit_strategy` | `"sequential"` | `"sequential"` or `"parallel"` |
| `verbosity` | `2` | Logging level (0--4) |
| `num_cpus` | `"auto"` | CPUs to allocate |
| `memory_limit` | `"auto"` | Memory limit in GB |

## When to use

AutoGluon is ideal when:

- You want a **fully automated baseline** with minimal configuration effort.
- You want to **compare** Octo's manually configured pipeline against an
  AutoML approach.
- You need access to model types not available in Octo (e.g., neural networks,
  KNN, linear models, LightGBM).
- You are **time-constrained** and a `time_limit` plus a `presets` level is
  all the configuration you need.

## Limitations

- AutoGluon **does not perform feature selection**. All input features are passed
through. Combine it with upstream feature-selection modules if needed.
- Requires the `autogluon` optional dependency (`pip install octopus[autogluon]`).
- Higher-quality presets (`"best_quality"`, `"high_quality"`) use multi-layer
stacking which is memory-intensive and can be slow.
- The module integrates with Ray for resource management, which can conflict with
Octo's own Ray usage if not configured carefully.
72 changes: 72 additions & 0 deletions docs/concepts/workflow/boruta.md
@@ -0,0 +1,72 @@
# Boruta -- Shadow-Feature Statistical Test


Boruta is a statistically principled, "all-relevant" feature selection method.
Unlike most other modules that select a fixed-size subset, Boruta asks a
different question: *which features are genuinely more important than random
noise?* It answers this by creating "shadow" copies of every feature, training a
model on both real and shadow features, and using a statistical test to decide
which real features carry true signal.

## How it works

1. **Hyperparameter optimization.** A `GridSearchCV` tunes the tree-based model
(RandomForest, ExtraTrees, or XGBoost) on the full feature set. Only
tree-based models are supported because Boruta relies on
`feature_importances_` from the trained model.

2. **Shadow feature generation.** For every real feature, a "shadow" copy is
created by randomly permuting its values across samples. This destroys any
relationship with the target while preserving the marginal distribution.

3. **Iterative importance comparison.** Over multiple rounds:
- A model is trained on the combined real + shadow feature set.
- The maximum importance among all shadow features in this round is recorded
(the "shadow max").
- Each real feature's importance is compared to the shadow max.
- A hit counter tracks how often each real feature exceeds the shadow max.

4. **Statistical testing.** After all rounds, a binomial test (with Bonferroni
correction for multiple testing) is applied to each real feature's hit count:
- **Confirmed**: The feature is significantly more important than random
noise at the `alpha` significance level.
- **Tentative**: The evidence is inconclusive.
- **Rejected**: The feature is not significantly better than noise.

Only *Confirmed* features are returned.

5. **Post-selection evaluation.** The selected features are evaluated on dev
(cross-validated) and test sets using both a refit and a grid-search + refit
strategy, matching the pattern used by RFE and SFS.
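
Steps 2--4 can be sketched end to end with scikit-learn and scipy (a simplified stand-in for the BorutaPy-based implementation; the dataset and round count are illustrative):

```python
import numpy as np
from scipy.stats import binomtest
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Features 0-3 carry signal, 4-9 are pure noise (shuffle=False keeps order).
X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           n_redundant=0, n_repeated=0, shuffle=False,
                           random_state=0)

rng = np.random.default_rng(0)
n_rounds = 30
hits = np.zeros(X.shape[1], dtype=int)

for r in range(n_rounds):
    # Step 2: shadow copies -- permute each column independently.
    shadows = rng.permuted(X, axis=0)
    # Step 3: train on real + shadow features, compare importances.
    model = RandomForestClassifier(n_estimators=100, random_state=r, n_jobs=-1)
    model.fit(np.hstack([X, shadows]), y)
    real = model.feature_importances_[:X.shape[1]]
    shadow_max = model.feature_importances_[X.shape[1]:].max()
    hits += (real > shadow_max).astype(int)  # hit counter per real feature

# Step 4: one-sided binomial test per feature, Bonferroni-corrected.
alpha = 0.05 / X.shape[1]
confirmed = [i for i, h in enumerate(hits)
             if binomtest(int(h), n_rounds, 0.5,
                          alternative="greater").pvalue < alpha]
```

Features that neither reach significance nor clearly fail it would be marked *Tentative* in the full implementation; this sketch only reports *Confirmed*.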

## Key parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `model` | `""` (auto) | Tree-based model only (`RandomForest`, `ExtraTrees`, or `XGB`) |
| `cv` | `5` | Cross-validation folds for hyperparameter tuning |
| `perc` | `100` | Percentile threshold for shadow-feature comparison (100 = max shadow importance) |
| `alpha` | `0.05` | Significance level for the statistical test |

## When to use

Boruta is particularly well-suited when:

- You want to find **all relevant features** rather than a fixed-size subset.
This is valuable for interpretability or when downstream models benefit from
having every informative feature available.
- The dataset has many noise features and you want a principled way to separate
signal from noise.
- You are uncertain about how many features to keep and prefer letting a
statistical test decide.

## Limitations

- Only supports tree-based models (RandomForest, ExtraTrees, XGBoost). CatBoost
is not supported because the BorutaPy implementation requires sklearn-style
`feature_importances_`.
- Runtime grows with the number of features (shadow features double the feature
space) and the number of Boruta iterations.
- The `perc` parameter (percentile of shadow importances) affects sensitivity:
  lowering it below 100 lowers the comparison threshold, making the test *less*
  conservative (more features are confirmed).
- Does not support time-to-event targets.
75 changes: 75 additions & 0 deletions docs/concepts/workflow/efs.md
@@ -0,0 +1,75 @@
# EFS -- Ensemble Feature Selection


EFS takes a fundamentally different approach to feature selection: instead of
evaluating features individually, it trains many models on random feature
subsets, then uses ensemble optimization to find the best *combination* of
models. Features that appear in the winning ensemble are selected. This
diversity-driven approach is especially effective for high-dimensional datasets
where individual feature rankings may be unstable.

## How it works

1. **Generate random feature subsets.** EFS creates `n_subsets` (default 100)
random subsets, each containing `subset_size` (default 30) features drawn
from the full feature set.

2. **Train a model per subset.** For each subset, a `GridSearchCV` tunes and
trains the chosen model (CatBoost, XGBoost, RandomForest, or ExtraTrees).
Cross-validated predictions are collected for every training sample.

3. **Build a model table.** Each trained model is recorded along with its CV
performance, the features it used (excluding those with zero importance), and
its out-of-fold predictions. Models are sorted by performance.

4. **Ensemble scan (hill-climbing).** Starting from the single best model, the
module incrementally adds the next-best model and computes the ensemble
performance (averaged predictions across models). This scan identifies the
number of top models that, when ensembled, give the best combined score.

5. **Ensemble optimization with replacement.** Starting from the models found in
the scan, the optimizer iteratively tests adding each of the top
`max_n_models` models (with replacement) to the ensemble. At each iteration,
the model that improves ensemble performance the most is added. The process
repeats for up to `max_n_iterations` or until no improvement is found.

6. **Feature aggregation.** The final optimized ensemble is a weighted
collection of models (weights = number of times each model appears). The
union of all features used by the ensemble models becomes the selected
feature set. Feature importance is reported as *counts* (how many times a
feature appeared across ensemble models) and *relative counts* (counts /
total models).
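
The scan and optimization in steps 4--5 amount to a greedy ensemble search with replacement (Caruana-style selection). A self-contained sketch on mock out-of-fold predictions (the model count and data are synthetic):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_models, n_samples = 12, 300
y = rng.integers(0, 2, n_samples)
# One row of mock out-of-fold predictions per trained subset model:
# true label plus noise, so models vary in quality.
preds = np.clip(y[None, :] * 0.6
                + rng.normal(0.2, 0.35, (n_models, n_samples)), 0, 1)

def ensemble_score(indices):
    """Ensemble score = metric on the averaged member predictions."""
    return roc_auc_score(y, preds[list(indices)].mean(axis=0))

# Step 4 -- scan: rank models by solo score, find the best top-k ensemble.
order = np.argsort([-ensemble_score([i]) for i in range(n_models)])
k_best = max(range(1, n_models + 1), key=lambda k: ensemble_score(order[:k]))
ensemble = list(order[:k_best])

# Step 5 -- greedy optimization with replacement: repeatedly add whichever
# model (duplicates allowed) improves the averaged score the most.
for _ in range(50):                       # max_n_iterations
    base = ensemble_score(ensemble)
    gains = [ensemble_score(ensemble + [m]) - base for m in range(n_models)]
    if max(gains) <= 0:                   # no improvement -> stop
        break
    ensemble.append(int(np.argmax(gains)))

# Step 6 -- weights = how often each model appears in the final ensemble.
weights = np.bincount(ensemble, minlength=n_models)
```

In the real module each "prediction row" comes from a grid-searched model on a random feature subset, and the selected features are the union of the features used by models with nonzero weight.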

## Key parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `model` | `""` (auto) | Model name -- `CatBoost`, `XGB`, `RandomForest`, or `ExtraTrees` |
| `subset_size` | `30` | Number of features per random subset |
| `n_subsets` | `100` | Number of random subsets to create |
| `cv` | `5` | Cross-validation folds |
| `max_n_iterations` | `50` | Iterations for ensemble optimization |
| `max_n_models` | `30` | Maximum models to consider in optimization |

## When to use

EFS is ideal when:

- The dataset is **high-dimensional** (hundreds to thousands of features) and
individual feature rankings are noisy or inconsistent.
- You want a **diversity-driven** selection that captures complementary sets of
features rather than just the top-ranked ones.
- Compute resources are available for training many models in parallel.

## Limitations

- Computationally heavy: `n_subsets` models are trained, each with a grid
search. With 100 subsets and a 4-parameter grid this can mean thousands of
model fits.
- The random subset generation means results are seed-dependent. Different seeds
may produce different feature sets, though the ensemble optimization helps
stabilize this.
- Does not produce scores or predictions in the standard format (scores and
predictions DataFrames are empty); it primarily returns feature counts as
importance measures.
- Does not support time-to-event targets.
116 changes: 116 additions & 0 deletions docs/concepts/workflow/index.md
@@ -0,0 +1,116 @@
# Workflow & Modules

## Overview

Real-world datasets often contain many columns, but only a subset of them actually helps
a machine-learning model make accurate predictions. Finding that subset -- **feature selection** --
is a core goal of Octopus.

A **workflow** is an ordered list of **tasks** that are executed one after another.
Each task wraps a module, and each module either selects features, trains models, or both.
By chaining tasks together you build a pipeline that progressively narrows the feature set:
start with cheap, fast filters to discard obvious noise, then hand the reduced set to more
expensive methods for further refinement.

### Module types

Octopus ships two kinds of modules:

| Type | Purpose | Examples |
|------|---------|----------|
| **Feature Selection** | Reduce the number of features | [ROC](roc.md), [MRMR](mrmr.md), [RFE](rfe.md), [RFE2](rfe2.md), [SFS](sfs.md), [Boruta](boruta.md), [EFS](efs.md) |
| **Machine Learning** | Train models, optimize hyperparameters, and optionally select features | [Octo](octo.md), [AutoGluon](autogluon.md) |

Both types return a list of **selected features** that the next task in the workflow can consume.

### How tasks are connected

Every task has a `task_id` (starting at 0) and an optional `depends_on` parameter pointing to
the `task_id` of a prior task.

- The **first task** (`depends_on=None`) receives all columns listed in `feature_cols`.
- A **dependent task** (`depends_on=N`) receives only the features selected by task *N*,
plus any scores, predictions, and feature-importance tables that task *N* produced.

### Example workflow

A typical three-step pipeline looks like this:

```
Task 0 (Octo) all 30 features
|
v selected_features (e.g. 20)
Task 1 (MRMR) receives 20 features from Task 0
|
v selected_features (e.g. 15)
Task 2 (Octo) receives 15 features from Task 1
```

In Python this translates to:

```python
from octopus import OctoClassification
from octopus.modules import Mrmr, Octo

study = OctoClassification(
...,
workflow=[
Octo(
task_id=0,
depends_on=None,
description="step1_octo_full",
models=["ExtraTreesClassifier"],
n_trials=100,
n_folds_inner=5,
max_features=30,
),
Mrmr(
task_id=1,
depends_on=0,
description="step2_mrmr",
n_features=15,
),
Octo(
task_id=2,
depends_on=1,
description="step3_octo_reduced",
models=["ExtraTreesClassifier"],
n_trials=100,
n_folds_inner=5,
ensemble_selection=True,
),
],
)

study.fit(data=df)
```

!!! tip
Ordering matters: tasks with `depends_on=None` must appear before tasks that reference
them, and `task_id` values must form a contiguous sequence starting at 0.

---

## Feature Selection Modules

The table below lists all feature-selection modules roughly ordered from cheapest to most
expensive:

| Module | Wraps | Description |
|--------|-------|-------------|
| **[ROC](roc.md)** | scipy, networkx (custom) | Removes correlated features using graph-based grouping |
| **[MRMR](mrmr.md)** | Custom implementation | Maximum Relevance Minimum Redundancy filter |
| **[RFE](rfe.md)** | sklearn `RFECV` | Recursive Feature Elimination with cross-validation |
| **[RFE2](rfe2.md)** | Extends Octo (custom) | RFE using Octo's Optuna-based models |
| **[SFS](sfs.md)** | mlxtend / sklearn | Sequential forward/backward selection |
| **[Boruta](boruta.md)** | Custom (based on BorutaPy) | Shadow-feature statistical test |
| **[EFS](efs.md)** | Custom implementation | Ensemble of models on random feature subsets |

---

## Machine Learning Modules

| Module | Description |
|--------|-------------|
| **[Octo](octo.md)** | Core ML module with HPO, ensembling, and feature importance |
| **[AutoGluon](autogluon.md)** | AutoGluon TabularPredictor wrapper |