Pull request overview
Removes the user-configurable metrics input from study configuration and runtime context, shifting metric selection toward target_metric and (in some modules) automatic derivation via Metrics.get_by_type().
Changes:
- Remove `metrics` from `OctoStudy`/subclasses and from `StudyContext`, and drop it from context construction.
- Update Octo/AutoGluon modules to derive metric sets via `Metrics.get_by_type(study_context.ml_type)` and adjust scoring calls accordingly.
- Update tests and examples to stop passing `metrics=...` and remove tests that validated metrics configuration.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `octopus/study/core.py` | Removes the metrics API (abstract property + attrs fields) and stops passing it into `StudyContext`. |
| `octopus/study/context.py` | Drops `metrics` from the frozen runtime context. |
| `octopus/modules/octo/core.py` | Changes Octo module result scoring to concatenate scores for all metrics derived from the ML type. |
| `octopus/modules/octo/bag.py` | Adds an optional `metric` parameter to `get_performance()` and threads it through `get_performance_df()`. |
| `octopus/modules/autogluon/core.py` | Switches metric list derivation to `Metrics.get_by_type(study_context.ml_type)`. |
| `tests/workflows/test_roc_octo_roc_workflow.py` | Removes `metrics=[...]` from the workflow execution test study instantiation. |
| `tests/workflows/test_octo_t2e.py` | Removes `metrics=[...]` from the time-to-event workflow execution test. |
| `tests/workflows/test_octo_regression.py` | Removes `metrics=[...]` from the regression workflow execution test. |
| `tests/workflows/test_octo_multiclass.py` | Removes the multiclass metrics test and stops passing `metrics=[...]` in other multiclass tests. |
| `tests/workflows/test_octo_classification.py` | Removes `metrics=[...]` from the classification workflow execution test. |
| `tests/workflows/test_ag_workflows.py` | Removes `metrics=[...]` from the AutoGluon workflow tests. |
| `tests/study/test_core.py` | Removes the parametrized test that asserted default/custom metrics behaviour. |
| `tests/predict/test_predict.py` | Removes `metrics=[...]` from the study creation helper. |
| `tests/manager/test_core.py` | Removes `metrics=[...]` from the `StudyContext` fixture. |
| `tests/infrastructure/test_fsspec.py` | Removes `metrics=[...]` from the fsspec-backed experiment test. |
| `examples/wf_roc_octo.py` | Removes `metrics=[...]` from example study construction. |
| `examples/wf_octo_autogluon.py` | Removes `metrics=[...]` from example study construction. |
| `examples/wf_multiclass_wine.py` | Removes `metrics=[...]` from example study construction. |
```python
all_metrics = Metrics.get_by_type(study_context.ml_type)
best_scores = pd.concat(
    [best_bag.get_performance_df(metric=m) for m in all_metrics],
    ignore_index=True,
)
```
OctoModuleTemplate.fit() now concatenates score DataFrames for all metrics of the ML type. Several downstream consumers (e.g. StudyLoader.build_performance_summary() / show_target_metric_performance) currently assume scores.parquet effectively contains a single metric and build keys that omit the metric column, which will silently overwrite values when multiple metrics are present. Either keep scores limited to study_context.target_metric, or update the downstream aggregation to include the metric name in its keys / grouping so all metrics are preserved.
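A minimal sketch of the silent-overwrite risk and the second fix option, assuming the summary aggregates a long-format scores frame with `metric`, `split`, and `score` columns (illustrative names, not the actual `StudyLoader` schema):

```python
import pandas as pd

# Illustrative multi-metric scores frame: two metrics share each split.
scores = pd.DataFrame(
    {
        "metric": ["auc", "auc", "logloss", "logloss"],
        "split": ["train", "dev", "train", "dev"],
        "score": [0.91, 0.88, 0.35, 0.42],
    }
)

# Keying on split alone keeps one row per split (last metric wins),
# silently dropping the other metric's values.
collapsed = scores.drop_duplicates(subset=["split"], keep="last")

# Including the metric name in the grouping keys preserves every metric.
preserved = scores.groupby(["metric", "split"], as_index=False)["score"].mean()

assert len(collapsed) == 2   # only the last-written metric survives
assert len(preserved) == 4   # all metric/split pairs are kept
```

The same principle applies to whatever keys `build_performance_summary()` actually builds: any key that omits the metric column collapses multi-metric scores.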
```python
all_metrics = Metrics.get_by_type(study_context.ml_type)
best_scores = pd.concat(
    [best_bag.get_performance_df(metric=m) for m in all_metrics],
    ignore_index=True,
)
```
Computing best_scores by calling best_bag.get_performance_df() once per metric likely recomputes predictions/performance repeatedly (each call goes through get_predictions() / get_performance_from_predictions()). If you want to persist multiple metrics, consider computing predictions once and evaluating multiple metrics from that cached structure (or extending the bag API to return a multi-metric scores DataFrame in one pass) to avoid an avoidable O(#metrics) overhead per outer split.
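One possible shape for the suggested single-pass evaluation, where predictions are computed once and each metric only re-scores the cached frame (the helper and metric functions here are hypothetical stand-ins, not the actual bag API):

```python
import pandas as pd
from typing import Callable

def multi_metric_performance_df(
    predictions: pd.DataFrame,
    metric_fns: dict[str, Callable[[pd.DataFrame], float]],
) -> pd.DataFrame:
    """Evaluate several metrics against one cached predictions frame.

    The caller computes predictions once; each metric function only
    re-scores it, avoiding the O(#metrics) prediction cost.
    """
    rows = [
        {"metric": name, "score": fn(predictions)}
        for name, fn in metric_fns.items()
    ]
    return pd.DataFrame(rows)

# Usage with toy predictions and two toy "metrics":
preds = pd.DataFrame({"y_true": [0, 1, 1], "y_pred": [0, 1, 0]})
metrics = {
    "accuracy": lambda df: (df.y_true == df.y_pred).mean(),
    "error_rate": lambda df: (df.y_true != df.y_pred).mean(),
}
result = multi_metric_performance_df(preds, metrics)
```

Extending `get_performance_df()` with an optional list of metrics would achieve the same effect without changing call sites that want a single metric.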
```python
def get_performance(self, metric: str | None = None):
    """Get performance using get_performance_from_predictions utility.

    This is a simpler alternative to get_performance() that:
    1. Gets predictions from bag.get_predictions()
    2. Calculates performance using get_performance_from_predictions()
    3. Restructures output to match expected format

    Args:
        metric: The metric to evaluate. Defaults to self.target_metric when None.

    Returns:
        dict: Dictionary with performance values in the same format as get_performance()
    """
```
The updated docstring/return description is self-referential: it says the returned dict matches the format of get_performance(), but this method is get_performance(). Clarify what shape the returned dict has (keys like train_avg, dev_avg, etc.) and/or what legacy method it is intended to be compatible with.
```diff
  # Test scores using Octopus metrics for comparison
  assert study_context.target_metric is not None, "target_metric should be set during fit()"
- all_metrics = list(dict.fromkeys([*study_context.metrics, study_context.target_metric]))
+ all_metrics = Metrics.get_by_type(study_context.ml_type)
```
Switching from the explicit [...study_context.metrics, study_context.target_metric] list to Metrics.get_by_type(study_context.ml_type) will evaluate (and persist) every registered metric for the ML type. This can materially increase runtime and the size/noise of scores.parquet/performance_results.json, and it also hits the same downstream aggregation issue where multiple metrics in scores.parquet are not preserved. Consider restricting this comparison set to {study_context.target_metric} (or a small curated set) unless/until the rest of the pipeline is updated to handle multi-metric score artifacts.
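A sketch of the suggested restriction, keeping the comparison set to the target metric plus an optionally curated extra list, with order-preserving deduplication (variable names stand in for the `study_context` attributes):

```python
# Stand-ins for study_context.target_metric and a small curated set;
# the real values come from the study context at fit() time.
target_metric = "auc"
curated_extras: list[str] = []

# dict.fromkeys deduplicates while preserving insertion order, so the
# target metric always comes first even if it reappears in the extras.
all_metrics = list(dict.fromkeys([target_metric, *curated_extras]))
assert all_metrics == ["auc"]
```

This mirrors the pre-change `dict.fromkeys` pattern while dropping the dependency on the removed `study_context.metrics` field.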