Conversation
There was a problem hiding this comment.
Pull request overview
Enables mypy type-checking for examples/ and tests/ (per #359) by removing those directories from mypy/pre-commit exclusions and applying typing-focused adjustments across tests and a few core typing surfaces.
Changes:
- Enable mypy on
examples/andtests/via.pre-commit-config.yamlandpyproject.toml. - Update tests/examples with additional annotations,
type: ignoremarkers, and small refactors to satisfy mypy. - Add
Models.get_registered_models()and adjust some core typing (e.g., workflow/TrainingConfig types) to support stricter checking.
Reviewed changes
Copilot reviewed 36 out of 36 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/workflows/test_roc_octo_roc_workflow.py | Typing-driven refactors and added None-guards for models. |
| tests/workflows/test_octo_t2e.py | Adjust assertions/typing and add models is not None guard. |
| tests/workflows/test_octo_regression.py | Make make_regression return shape explicit and add models guards. |
| tests/workflows/test_octo_multiclass.py | Add models is not None guards. |
| tests/workflows/test_octo_classification.py | Add models is not None guards. |
| tests/workflows/test_ag_workflows.py | Make make_regression return shape explicit. |
| tests/study/test_core.py | Refactor kwargs construction to satisfy mypy call typing. |
| tests/predict/test_predict.py | Add (currently inaccurate) type annotation + ignore for sklearn dataset. |
| tests/modules/test_training_feature_importances.py | Add stronger typing, use model registry accessor, restructure configs. |
| tests/modules/test_mrmr.py | Tighten correlation-type typing and adjust make_regression usage. |
| tests/modules/test_module_result.py | Add None-guards and use UPath.open(). |
| tests/modules/roc/test_roc.py | Add mypy ignore for intentionally invalid arg type in negative test. |
| tests/modules/octo/test_model_fitted_validation.py | Switch to ModelName, use registry accessor, typing cleanups. |
| tests/modules/octo/test_ensemble_selection_unit.py | Use ModelName for config_training. |
| tests/modules/octo/test_ensemble_selection.py | Add typing for model map and ModelName mapping. |
| tests/modules/octo/test_classes_attribute.py | Use Models.get_registered_models() instead of private registry. |
| tests/models/test_model_parameter_compatibility.py | Type model names as ModelName and adjust sorting. |
| tests/models/test_model_config.py | Use BaseModel-derived dummy model to satisfy typing expectations. |
| tests/models/test_hyperparameter.py | Switch to ModelName (introduces a runtime issue for "TestModel"). |
| tests/metrics/test_metrics_uniqueness.py | Minor typing improvements and cleanup. |
| tests/manager/test_core.py | Add mypy ignore for intentional frozen-instance mutation. |
| tests/infrastructure/test_file_io.py | Adjust categorical dtype construction for typing (introduces nondeterminism risk). |
| pyproject.toml | Remove examples/tests from mypy exclude. |
| octopus/study/validation.py | Widen workflow validator type to Sequence[Task] and message formatting. |
| octopus/study/core.py | Change workflow annotation to Sequence[Task] (validator still enforces list). |
| octopus/modules/octo/training.py | Make TrainingConfig TypedDict(total=False) to relax typing constraints. |
| octopus/models/core.py | Add Models.get_registered_models() and error-message formatting. |
| octopus/manager/workflow_runner.py | Change workflow annotation to Sequence[Task] (validator still enforces list). |
| octopus/manager/core.py | Change workflow annotation to Sequence[Task] (validator still enforces list). |
| examples/wf_roc_octo.py | Add Bunch annotation and mypy ignore for sklearn dataset loader typing. |
| examples/wf_multiclass_wine.py | Add Bunch annotation and mypy ignore for sklearn dataset loader typing. |
| examples/use_own_hyperparameters.py | Add Bunch annotation and mypy ignore for sklearn dataset loader typing. |
| examples/multi_workflow.py | Add Bunch annotation and mypy ignore for sklearn dataset loader typing. |
| examples/basic_regression.py | Add Bunch annotation and mypy ignore for sklearn dataset loader typing. |
| examples/basic_classification.py | Add Bunch annotation and mypy ignore for sklearn dataset loader typing. |
| .pre-commit-config.yaml | Enable mypy hook for examples/ and tests/ by narrowing exclusions. |
Comments suppressed due to low confidence (1)
octopus/modules/octo/training.py:43
- Changing
TrainingConfigtoTypedDict(total=False)makes all keys optional, butTrainingstill indexes required keys directly (e.g.self.config_training["outl_reduction"],ml_model_type,ml_model_params), which can now slip past type-checking and fail withKeyError. Prefer aTypedDictwith required keys andNotRequired[...]only for truly optional entries (e.g.positive_class).
class TrainingConfig(TypedDict, total=False):
"""Training configuration type."""
outl_reduction: int
n_input_features: int
ml_model_type: ModelName
ml_model_params: dict
positive_class: int | None
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
…ed synchronously on different branches
| return decorator | ||
|
|
||
| @classmethod | ||
| def get_registered_models(cls) -> list[ModelName]: |
There was a problem hiding this comment.
Do we want this function? In the code you normally only want models that fit to your ml_type. Can be a potential error if used somewhere.
There was a problem hiding this comment.
Should we make it private by adding a leading underscore, @nihaase ?
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 39 out of 39 changed files in this pull request and generated 6 comments.
Comments suppressed due to low confidence (1)
octopus/modules/octo/training.py:43
- Changing
TrainingConfigtoTypedDict(total=False)makes all keys optional, butTrainingaccesses several of them with[...](e.g.outl_reduction,ml_model_type,ml_model_params), which are required at runtime. This weakens type safety and can hide missing-key bugs. Prefer keeping required keys as required (useRequired[...]/NotRequired[...]from PEP 655, or split into required/optional TypedDicts) and update tests to provide the required keys.
class TrainingConfig(TypedDict, total=False):
"""Training configuration type."""
outl_reduction: int
n_input_features: int
ml_model_type: ModelName
ml_model_params: dict
positive_class: int | None
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| IntHyperparameter, | ||
| ) | ||
| from octopus.models.model_name import ModelName | ||
| from octopus.types import MLType |
| bc: Bunch = load_breast_cancer(as_frame=True) # type: ignore[assignment] | ||
| df = bc["frame"].reset_index() |
| breast_cancer: Bunch = load_breast_cancer(as_frame=True) # type: ignore[assignment] | ||
|
|
||
| df = breast_cancer["frame"].reset_index() |
| diabetes: Bunch = load_diabetes(as_frame=True) # type: ignore[assignment] | ||
|
|
||
| df = diabetes["frame"].reset_index() | ||
| features = [str(feature) for feature in diabetes["feature_names"]] |
| wine: Bunch = load_wine(as_frame=True) # type: ignore[assignment] | ||
|
|
||
| df = wine["frame"].reset_index() | ||
| df.columns = df.columns.str.replace(" ", "_") |
| max_features=4, | ||
| feature_groups={}, | ||
| config_training={"ml_model_type": "RidgeRegressor"}, | ||
| config_training={"ml_model_type": ModelName.RidgeRegressor}, |
| workflow: list[Task] = field(validator=[validators.instance_of(list)]) | ||
| workflow: Sequence[Task] = field(validator=[validators.instance_of(list)]) |
There was a problem hiding this comment.
I don't understand why we have this mismatch: Sequence -- list
| def load_breast_cancer_data() -> tuple[pd.DataFrame, list[str], list[str]]: | ||
| """Load the breast cancer dataset and return pandas dataframe, feature list, and target list.""" | ||
| breast_cancer: Bunch = load_breast_cancer(as_frame=True) # type: ignore[assignment] | ||
|
|
||
| df = breast_cancer["frame"].reset_index() | ||
| df.columns = df.columns.str.replace(" ", "_") | ||
| features = [feature.replace(" ", "_") for feature in breast_cancer["feature_names"]] | ||
| targets = [str(target) for target in breast_cancer["target_names"]] | ||
|
|
||
| return df, features, targets |
There was a problem hiding this comment.
Should we put data code in a separate folder?
Fixes #359