Fix Analyse/Predict, fix root cause of sklearn warnings, transformer issue #364
Pull request overview
This PR addresses recent analyse/predict regressions and notebook friction by improving how studies are located on disk and by fixing Octo’s preprocessing output labeling so feature columns remain correctly ordered (especially for mixed numeric/categorical inputs).
Changes:
- Add `find_latest_study()` to locate the newest timestamped study directory by name prefix, and update the example notebook to use it.
- Fix/prevent ColumnTransformer-induced feature column reordering by relabeling and reordering transformed outputs back to `feature_cols`, and tighten FI partition validation/reproducibility behavior.
- Add a dedicated regression test suite covering mixed-type column ordering and FI label correctness.
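The directory lookup described in the first bullet can be sketched as follows. This is not the PR's `find_latest_study()` implementation, which is not shown here. The function name `find_latest_study_sketch`, its parameters, and the `YYYYMMDD_HHMMSS` suffix format are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def find_latest_study_sketch(root, prefix):
    """Return the newest directory under `root` whose name starts with `prefix`.

    Hypothetical helper: assumes each run directory ends in a zero-padded
    timestamp (e.g. YYYYMMDD_HHMMSS), so sorting names sorts by time.
    """
    root = Path(root)
    candidates = [
        p for p in root.iterdir() if p.is_dir() and p.name.startswith(prefix)
    ]
    if not candidates:
        raise FileNotFoundError(
            f"No study directory starting with {prefix!r} in {root}"
        )
    # Lexicographic maximum equals chronological maximum under the assumption above.
    return max(candidates, key=lambda p: p.name)


# Usage: build a throwaway layout with two runs of one study and one unrelated study.
with tempfile.TemporaryDirectory() as tmp:
    for name in (
        "demo-20240101_120000",
        "demo-20240301_090000",
        "other-20250101_000000",
    ):
        (Path(tmp) / name).mkdir()
    latest_name = find_latest_study_sketch(tmp, "demo-").name

print(latest_name)
```

Sorting by name rather than by modification time keeps the result stable when directories are copied or touched after the run, at the cost of depending on the timestamp format.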
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/modules/octo/test_column_ordering.py | New tests validating processed column ordering and FI labels for mixed numeric/categorical data. |
| octopus/predict/notebook_utils.py | Adds find_latest_study() and exports it for notebook usage. |
| octopus/modules/octo/training.py | Centralizes transform→DataFrame relabeling/reordering; improves FI partition validation and RNG scoping; adjusts SHAP prediction wrapper. |
| octopus/modules/octo/bag.py | Removes unnecessary input conversion and makes _estimator_type an sklearn-compatible property. |
| examples/analyse_study_classification.ipynb | Updates notebook to select the latest timestamped study run automatically. |
Pull request overview
This PR addresses regressions and UX issues in the analyse/predict workflow by making study selection robust to timestamped output folders and fixing preprocessing/prediction edge cases (notably ColumnTransformer column ordering and sklearn feature-name warnings).
Changes:
- Add `find_latest_study()` helper and update the example analysis notebook to automatically pick the newest timestamped study output.
- Fix Octo `Training` preprocessing output relabeling/reordering so processed feature columns match `feature_cols` across train/dev/test and prediction paths.
- Adjust sklearn-compatibility behavior in `Bag` and SHAP/predict paths to reduce warnings and improve robustness.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/modules/octo/test_column_ordering.py | Adds regression tests covering mixed-type ColumnTransformer ordering + FI label correctness. |
| octopus/predict/notebook_utils.py | Introduces find_latest_study() for timestamped study directory discovery. |
| octopus/modules/octo/training.py | Centralizes processed-output relabeling/reordering; tightens predict/FI partition validation; adjusts SHAP call paths for sklearn compatibility. |
| octopus/modules/octo/bag.py | Removes DataFrame→numpy conversion and fixes _estimator_type to be sklearn-compatible. |
| examples/analyse_study_classification.ipynb | Uses find_latest_study() to select the latest run automatically. |
    def _get_default_model_params(model_name: str) -> dict:
        """Get default parameters for a model from its hyperparameter configuration."""
        model_config = Models.get_config(model_name)
        params = {}

        for hp in model_config.hyperparameters:
            if isinstance(hp, FixedHyperparameter):
                params[hp.name] = hp.value
            elif isinstance(hp, CategoricalHyperparameter):
                params[hp.name] = hp.choices[0] if hp.choices else None
            elif isinstance(hp, IntHyperparameter):
                params[hp.name] = int((hp.low + hp.high) / 2)
            elif isinstance(hp, FloatHyperparameter):
                if hp.log:
                    params[hp.name] = np.sqrt(hp.low * hp.high)
                else:
                    params[hp.name] = (hp.low + hp.high) / 2
            else:
                raise AssertionError(f"Unsupported Hyperparameter type: {type(hp)}.")

        if model_config.n_jobs:
            params[model_config.n_jobs] = 1
        if model_config.model_seed:
            params[model_config.model_seed] = 42
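The range-based defaults in the excerpt above can be checked with a small standalone example. It mirrors only the arithmetic: the arithmetic midpoint for linear ranges, the geometric mean (the midpoint on a log scale) for log-scaled ranges, and truncation via `int()` for integer ranges. The function `midpoint_default` and the sample ranges are hypothetical and use `math.sqrt` in place of `np.sqrt` to stay dependency-free.

```python
import math


def midpoint_default(low, high, log=False, as_int=False):
    """Pick a default value from the middle of a hyperparameter range.

    Hypothetical helper that reproduces the arithmetic of the excerpt above.
    """
    if log:
        # Geometric mean: equals exp of the midpoint of log(low) and log(high).
        value = math.sqrt(low * high)
    else:
        value = (low + high) / 2
    # int() truncates toward zero, so a midpoint of 5.5 becomes 5.
    return int(value) if as_int else value


int_default = midpoint_default(1, 10, as_int=True)
log_default = midpoint_default(1e-4, 1e-2, log=True)
linear_default = midpoint_default(0.0, 1.0)
print(int_default, log_default, linear_default)
```

For a log-scaled range such as 1e-4 to 1e-2, the geometric mean lands near 1e-3, whereas the arithmetic midpoint would sit near 5e-3, close to the upper end of the range in log terms.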
8c979b1 to eb90dc0