Fix Analyse/Predict, fix root cause of sklearn warnings, transformer issue by anwurl · Pull Request #364 · emdgroup/octopus

anwurl · 2026-03-13T07:28:13Z

[done]

Handle changing (timestamped) study output folders
Fix warning messages that show up in the notebook (new sklearn behaviour)
Fix training preprocessing bug

Copilot

Pull request overview

This PR addresses recent analyse/predict regressions and notebook friction by improving how studies are located on disk and by fixing Octo’s preprocessing output labeling so feature columns remain correctly ordered (especially for mixed numeric/categorical inputs).

Changes:

Add find_latest_study() to locate the newest timestamped study directory by name prefix and update the example notebook to use it.
Fix/prevent ColumnTransformer-induced feature column reordering by relabeling + reordering transformed outputs back to feature_cols, and tighten FI partition validation/reproducibility behavior.
Add a dedicated regression test suite covering mixed-type column ordering and FI label correctness.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
`tests/modules/octo/test_column_ordering.py`	New tests validating processed column ordering and FI labels for mixed numeric/categorical data.
`octopus/predict/notebook_utils.py`	Adds `find_latest_study()` and exports it for notebook usage.
`octopus/modules/octo/training.py`	Centralizes transform→DataFrame relabeling/reordering; improves FI partition validation and RNG scoping; adjusts SHAP prediction wrapper.
`octopus/modules/octo/bag.py`	Removes unnecessary input conversion and makes `_estimator_type` an sklearn-compatible property.
`examples/analyse_study_classification.ipynb`	Updates notebook to select the latest timestamped study run automatically.



Copilot

Pull request overview

This PR addresses regressions and UX issues in the analyse/predict workflow by making study selection robust to timestamped output folders and fixing preprocessing/prediction edge cases (notably ColumnTransformer column ordering and sklearn feature-name warnings).

Changes:

Add find_latest_study() helper and update the example analysis notebook to automatically pick the newest timestamped study output.
Fix Octo Training preprocessing output relabeling/reordering so processed feature columns match feature_cols across train/dev/test and prediction paths.
Adjust sklearn-compatibility behavior in Bag and SHAP/predict paths to reduce warnings and improve robustness.
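The study-discovery idea can be sketched as follows. This is not the `find_latest_study()` implementation from `octopus/predict/notebook_utils.py`; the function name `find_latest_study_sketch`, its signature, and the `<name>_YYYYMMDD_HHMMSS` directory naming are assumptions made for illustration only.

```python
import tempfile
from pathlib import Path


def find_latest_study_sketch(root: Path, name_prefix: str) -> Path:
    """Return the newest directory under root whose name starts with name_prefix.

    Assumes the timestamp suffix sorts lexicographically (e.g. YYYYMMDD_HHMMSS),
    so the largest name is also the most recent run.
    """
    candidates = [
        p for p in Path(root).iterdir()
        if p.is_dir() and p.name.startswith(name_prefix)
    ]
    if not candidates:
        raise FileNotFoundError(
            f"No study directory starting with {name_prefix!r} in {root}"
        )
    return max(candidates, key=lambda p: p.name)


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ["demo_20260312_101500", "demo_20260313_072813", "other_20260314_000000"]:
        (root / name).mkdir()
    latest_name = find_latest_study_sketch(root, "demo").name
```

A plain prefix match would also pick up a sibling study such as `demo2_...`, so a real helper may want to match the separator and timestamp pattern explicitly.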

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
tests/modules/octo/test_column_ordering.py	Adds regression tests covering mixed-type ColumnTransformer ordering + FI label correctness.
octopus/predict/notebook_utils.py	Introduces `find_latest_study()` for timestamped study directory discovery.
octopus/modules/octo/training.py	Centralizes processed-output relabeling/reordering; tightens predict/FI partition validation; adjusts SHAP call paths for sklearn compatibility.
octopus/modules/octo/bag.py	Removes DataFrame→numpy conversion and fixes `_estimator_type` to be sklearn-compatible.
examples/analyse_study_classification.ipynb	Uses `find_latest_study()` to select the latest run automatically.


tests/modules/octo/test_column_ordering.py

+def _get_default_model_params(model_name: str) -> dict:
+    """Get default parameters for a model from its hyperparameter configuration."""
+    model_config = Models.get_config(model_name)
+    params = {}
+
+    for hp in model_config.hyperparameters:
+        if isinstance(hp, FixedHyperparameter):
+            params[hp.name] = hp.value
+        elif isinstance(hp, CategoricalHyperparameter):
+            params[hp.name] = hp.choices[0] if hp.choices else None
+        elif isinstance(hp, IntHyperparameter):
+            params[hp.name] = int((hp.low + hp.high) / 2)
+        elif isinstance(hp, FloatHyperparameter):
+            if hp.log:
+                params[hp.name] = np.sqrt(hp.low * hp.high)
+            else:
+                params[hp.name] = (hp.low + hp.high) / 2
+        else:
+            raise AssertionError(f"Unsupported Hyperparameter type: {type(hp)}.")
+
+    if model_config.n_jobs:
+        params[model_config.n_jobs] = 1
+    if model_config.model_seed:
+        params[model_config.model_seed] = 42
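A note on the log-scale branch in the hunk above: `np.sqrt(hp.low * hp.high)` is the geometric mean, which is the midpoint of the range on a logarithmic axis. The standalone sketch below checks that equivalence with illustrative bounds; the `midpoint` helper is hypothetical and is not part of the test module.

```python
import math


def midpoint(low: float, high: float, log: bool = False) -> float:
    """Midpoint of [low, high] on a linear or logarithmic scale."""
    if log:
        return math.sqrt(low * high)  # geometric mean
    return (low + high) / 2  # arithmetic mean


low, high = 1e-4, 1e-2  # illustrative bounds, e.g. a learning-rate range
geo = midpoint(low, high, log=True)
ari = midpoint(low, high)

# The same log-scale midpoint, computed by averaging in log space.
log_mid = math.exp((math.log(low) + math.log(high)) / 2)
```

For a range spanning two decades, the arithmetic mean sits close to the upper bound, while the geometric mean lands one decade from each end.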


anwurl marked this pull request as draft March 13, 2026 07:28

anwurl self-assigned this Mar 13, 2026

anwurl added Analysis Predict labels Mar 13, 2026

anwurl requested a review from Copilot March 13, 2026 09:58

Copilot started reviewing on behalf of anwurl March 13, 2026 09:58

Copilot AI reviewed Mar 13, 2026

tests/modules/octo/test_column_ordering.py (resolved)

octopus/modules/octo/training.py (resolved)

octopus/modules/octo/training.py (outdated, resolved)

octopus/predict/notebook_utils.py (resolved)

anwurl marked this pull request as ready for review March 13, 2026 10:34

anwurl requested a review from Copilot March 13, 2026 10:36

Copilot started reviewing on behalf of anwurl March 13, 2026 10:36

Copilot AI reviewed Mar 13, 2026

anwurl changed the title ~~Fix and Revise Analyse/Predict~~ Fix Analyse/Predict, fix root cause of sklearn warnings, transformer issue Mar 13, 2026

anwurl requested review from dasmy, kalama-ai and nihaase March 13, 2026 10:52

anwurl added 5 commits March 14, 2026 20:43

find latest wf directory

8e39d1d

fix sklearn warnings

2d6682d

fix train transformer

3bab23c

several fixes

986efb5

copilot fixes

eb90dc0

anwurl force-pushed the fix/analyze_predict branch from 8c979b1 to eb90dc0 March 14, 2026 19:43

anwurl wants to merge 5 commits into main from fix/analyze_predict


2 participants
