Skip to content

enable mypy in examples and tests dirs#361

Open
dasmy wants to merge 3 commits intomainfrom
fix/359
Open

enable mypy in examples and tests dirs#361
dasmy wants to merge 3 commits intomainfrom
fix/359

Conversation

@dasmy
Copy link
Collaborator

@dasmy dasmy commented Mar 12, 2026

Fixes #359

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Enables mypy type-checking for examples/ and tests/ (per #359) by removing those directories from mypy/pre-commit exclusions and applying typing-focused adjustments across tests and a few core typing surfaces.

Changes:

  • Enable mypy on examples/ and tests/ via .pre-commit-config.yaml and pyproject.toml.
  • Update tests/examples with additional annotations, type: ignore markers, and small refactors to satisfy mypy.
  • Add Models.get_registered_models() and adjust some core typing (e.g., workflow/TrainingConfig types) to support stricter checking.

Reviewed changes

Copilot reviewed 36 out of 36 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
tests/workflows/test_roc_octo_roc_workflow.py Typing-driven refactors and added None-guards for models.
tests/workflows/test_octo_t2e.py Adjust assertions/typing and add models is not None guard.
tests/workflows/test_octo_regression.py Make make_regression return shape explicit and add models guards.
tests/workflows/test_octo_multiclass.py Add models is not None guards.
tests/workflows/test_octo_classification.py Add models is not None guards.
tests/workflows/test_ag_workflows.py Make make_regression return shape explicit.
tests/study/test_core.py Refactor kwargs construction to satisfy mypy call typing.
tests/predict/test_predict.py Add (currently inaccurate) type annotation + ignore for sklearn dataset.
tests/modules/test_training_feature_importances.py Add stronger typing, use model registry accessor, restructure configs.
tests/modules/test_mrmr.py Tighten correlation-type typing and adjust make_regression usage.
tests/modules/test_module_result.py Add None-guards and use UPath.open().
tests/modules/roc/test_roc.py Add mypy ignore for intentionally invalid arg type in negative test.
tests/modules/octo/test_model_fitted_validation.py Switch to ModelName, use registry accessor, typing cleanups.
tests/modules/octo/test_ensemble_selection_unit.py Use ModelName for config_training.
tests/modules/octo/test_ensemble_selection.py Add typing for model map and ModelName mapping.
tests/modules/octo/test_classes_attribute.py Use Models.get_registered_models() instead of private registry.
tests/models/test_model_parameter_compatibility.py Type model names as ModelName and adjust sorting.
tests/models/test_model_config.py Use BaseModel-derived dummy model to satisfy typing expectations.
tests/models/test_hyperparameter.py Switch to ModelName (introduces a runtime issue for "TestModel").
tests/metrics/test_metrics_uniqueness.py Minor typing improvements and cleanup.
tests/manager/test_core.py Add mypy ignore for intentional frozen-instance mutation.
tests/infrastructure/test_file_io.py Adjust categorical dtype construction for typing (introduces nondeterminism risk).
pyproject.toml Remove examples/tests from mypy exclude.
octopus/study/validation.py Widen workflow validator type to Sequence[Task] and message formatting.
octopus/study/core.py Change workflow annotation to Sequence[Task] (validator still enforces list).
octopus/modules/octo/training.py Make TrainingConfig TypedDict(total=False) to relax typing constraints.
octopus/models/core.py Add Models.get_registered_models() and error-message formatting.
octopus/manager/workflow_runner.py Change workflow annotation to Sequence[Task] (validator still enforces list).
octopus/manager/core.py Change workflow annotation to Sequence[Task] (validator still enforces list).
examples/wf_roc_octo.py Add Bunch annotation and mypy ignore for sklearn dataset loader typing.
examples/wf_multiclass_wine.py Add Bunch annotation and mypy ignore for sklearn dataset loader typing.
examples/use_own_hyperparameters.py Add Bunch annotation and mypy ignore for sklearn dataset loader typing.
examples/multi_workflow.py Add Bunch annotation and mypy ignore for sklearn dataset loader typing.
examples/basic_regression.py Add Bunch annotation and mypy ignore for sklearn dataset loader typing.
examples/basic_classification.py Add Bunch annotation and mypy ignore for sklearn dataset loader typing.
.pre-commit-config.yaml Enable mypy hook for examples/ and tests/ by narrowing exclusions.
Comments suppressed due to low confidence (1)

octopus/modules/octo/training.py:43

  • Changing TrainingConfig to TypedDict(total=False) makes all keys optional, but Training still indexes required keys directly (e.g. self.config_training["outl_reduction"], ml_model_type, ml_model_params), which can now slip past type-checking and fail with KeyError. Prefer a TypedDict with required keys and NotRequired[...] only for truly optional entries (e.g. positive_class).
class TrainingConfig(TypedDict, total=False):
    """Training configuration type."""

    outl_reduction: int
    n_input_features: int
    ml_model_type: ModelName
    ml_model_params: dict
    positive_class: int | None


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

@dasmy dasmy marked this pull request as ready for review March 12, 2026 15:00
return decorator

@classmethod
def get_registered_models(cls) -> list[ModelName]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want this function? In the code you normally only want models that fit to your ml_type. Can be a potential error if used somewhere.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make it private by adding a leading underscore, @nihaase ?

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 39 out of 39 changed files in this pull request and generated 6 comments.

Comments suppressed due to low confidence (1)

octopus/modules/octo/training.py:43

  • Changing TrainingConfig to TypedDict(total=False) makes all keys optional, but Training accesses several of them with [...] (e.g. outl_reduction, ml_model_type, ml_model_params), which are required at runtime. This weakens type safety and can hide missing-key bugs. Prefer keeping required keys as required (use Required[...]/NotRequired[...] from PEP 655, or split into required/optional TypedDicts) and update tests to provide the required keys.
class TrainingConfig(TypedDict, total=False):
    """Training configuration type."""

    outl_reduction: int
    n_input_features: int
    ml_model_type: ModelName
    ml_model_params: dict
    positive_class: int | None


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

IntHyperparameter,
)
from octopus.models.model_name import ModelName
from octopus.types import MLType
Comment on lines +70 to 71
bc: Bunch = load_breast_cancer(as_frame=True) # type: ignore[assignment]
df = bc["frame"].reset_index()
Comment on lines +10 to +12
breast_cancer: Bunch = load_breast_cancer(as_frame=True) # type: ignore[assignment]

df = breast_cancer["frame"].reset_index()
Comment on lines +22 to +25
diabetes: Bunch = load_diabetes(as_frame=True) # type: ignore[assignment]

df = diabetes["frame"].reset_index()
features = [str(feature) for feature in diabetes["feature_names"]]
Comment on lines +33 to +36
wine: Bunch = load_wine(as_frame=True) # type: ignore[assignment]

df = wine["frame"].reset_index()
df.columns = df.columns.str.replace(" ", "_")
max_features=4,
feature_groups={},
config_training={"ml_model_type": "RidgeRegressor"},
config_training={"ml_model_type": ModelName.RidgeRegressor},
Comment on lines -36 to +40
workflow: list[Task] = field(validator=[validators.instance_of(list)])
workflow: Sequence[Task] = field(validator=[validators.instance_of(list)])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why we have this mismatch: Sequence -- list

Comment on lines +8 to +17
def load_breast_cancer_data() -> tuple[pd.DataFrame, list[str], list[str]]:
"""Load the breast cancer dataset and return pandas dataframe, feature list, and target list."""
breast_cancer: Bunch = load_breast_cancer(as_frame=True) # type: ignore[assignment]

df = breast_cancer["frame"].reset_index()
df.columns = df.columns.str.replace(" ", "_")
features = [feature.replace(" ", "_") for feature in breast_cancer["feature_names"]]
targets = [str(target) for target in breast_cancer["target_names"]]

return df, features, targets
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we put data code in a separate folder?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Mypy is disabled for tests and examples folders in .pre-commit-config.yaml and pyproject.toml

4 participants