Aggregate logs in `evaluate` #483

LarsKue · 2025-05-21T19:56:52Z

Fixes #481

codecov · 2025-05-21T19:58:17Z

Codecov Report

Attention: Patch coverage is 90.25974% with 15 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
...ximators/backend_approximators/jax_approximator.py	91.52%	5 Missing ⚠️
...s/backend_approximators/tensorflow_approximator.py	89.58%	5 Missing ⚠️
...mators/backend_approximators/torch_approximator.py	89.36%	5 Missing ⚠️

Files with missing lines	Coverage Δ
...ximators/backend_approximators/jax_approximator.py	`94.11% <91.52%> (-3.56%)`	⬇️
...s/backend_approximators/tensorflow_approximator.py	`90.90% <89.58%> (-3.54%)`	⬇️
...mators/backend_approximators/torch_approximator.py	`91.30% <89.36%> (-4.16%)`	⬇️

Copilot

Pull Request Overview

Implements cumulative logging and averaging in the evaluate method for all backend approximators to fix metrics aggregation issues.

- Introduce `_aggregate_logs` and `_mean_logs` helpers to accumulate and normalize batch metrics.
- Override `evaluate` in the Torch, TensorFlow, and JAX approximators using their respective `*EpochIterator` and callback flows.
- Wire up the Keras `CallbackList` and ensure per-batch callbacks receive aggregated logs.
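The accumulate-then-normalize idea described above can be sketched in plain Python. This is only an illustration under stated assumptions: `aggregate_logs` and `mean_logs` are hypothetical stand-ins named after the helpers mentioned in this overview, not the code from the PR, and metrics are assumed to be plain floats keyed by name.

```python
def aggregate_logs(running, step_logs):
    """Add one batch's metrics to the running totals (hypothetical helper)."""
    if not running:
        # First batch: start the running totals from a copy of its metrics.
        return dict(step_logs)
    return {key: running[key] + step_logs[key] for key in running}


def mean_logs(running, total_steps):
    """Turn summed metrics into per-batch averages (hypothetical helper)."""
    if total_steps == 0:
        # Nothing was evaluated; avoid dividing by zero.
        return running
    return {key: value / total_steps for key, value in running.items()}


# Usage: two batches with losses 1.0 and 3.0.
logs = {}
for step_logs in [{"loss": 1.0}, {"loss": 3.0}]:
    logs = aggregate_logs(logs, step_logs)
averaged = mean_logs(logs, total_steps=2)
```

Without the accumulation step, only the metrics of the last batch would survive the loop, which is the overwriting behaviour this PR sets out to fix.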

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
bayesflow/approximators/backend_approximators/torch_approximator.py	Added `_aggregate_logs`, `_mean_logs`, and updated `evaluate` with TorchEpochIterator and callbacks.
bayesflow/approximators/backend_approximators/tensorflow_approximator.py	Added `_aggregate_logs`, `_mean_logs`, and updated `evaluate` with TFEpochIterator and callbacks.
bayesflow/approximators/backend_approximators/jax_approximator.py	Added `_aggregate_logs`, `_mean_logs`, and updated `evaluate` with JAXEpochIterator, state sync, and callbacks.

Comments suppressed due to low confidence (2)

bayesflow/approximators/backend_approximators/jax_approximator.py:87

[nitpick] The loop variable 'iterator' shadows the epoch_iterator and may be confusing; consider renaming it to 'batch_data' or similar to clarify its purpose.

for step, iterator in epoch_iterator:

bayesflow/approximators/backend_approximators/tensorflow_approximator.py:31

Add unit tests for this new evaluate implementation to verify that log aggregation and averaging behave as expected across multiple batches.

def evaluate(

bayesflow/approximators/backend_approximators/torch_approximator.py

bayesflow/approximators/backend_approximators/jax_approximator.py

Copilot

Pull Request Overview

Adds an aggregate option to the evaluate method in all backend approximators, allowing batch-wise metrics to be summed and averaged rather than overwritten each step.

- Introduce `_aggregate_fn` and `_reduce_fn` in `evaluate` to accumulate and average metrics.
- Add `aggregate` and `return_dict` parameters to control the output format.
- Ensure consistency in callback invocation and test function setup across the Torch, TensorFlow, and JAX backends.
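A rough sketch of how an `aggregate` flag might change the result of an evaluation loop is given here. It is not the PR's implementation: `evaluate_sketch` and `step_fn` are hypothetical names, callbacks and backend state handling are omitted, and the `return_dict` behaviour (a dict versus a list of values) is an assumption modelled on the usual Keras convention.

```python
def evaluate_sketch(batches, step_fn, aggregate=False, return_dict=False):
    """Run step_fn on every batch and collect its metrics (illustrative only)."""
    logs = {}
    total_steps = 0
    for batch in batches:
        step_logs = step_fn(batch)
        total_steps += 1
        if aggregate and logs:
            # Sum metrics across batches.
            logs = {key: logs[key] + step_logs[key] for key in logs}
        else:
            # First batch, or no aggregation: keep only the latest metrics.
            logs = dict(step_logs)
    if aggregate and total_steps > 0:
        # Reduce the sums to per-batch means; skipped when nothing was evaluated.
        logs = {key: value / total_steps for key, value in logs.items()}
    return logs if return_dict else list(logs.values())


def fake_step(batch):
    # Stand-in for a real test step: report the batch value as the loss.
    return {"loss": batch}


last_only = evaluate_sketch([1.0, 2.0, 6.0], fake_step, return_dict=True)
averaged = evaluate_sketch([1.0, 2.0, 6.0], fake_step, aggregate=True, return_dict=True)
```

With `aggregate=False` the result reflects only the final batch, while `aggregate=True` reports the mean over all batches; the `total_steps > 0` check corresponds to the kind of guard added later in this PR's commits.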

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File	Description
bayesflow/approximators/backend_approximators/torch_approximator.py	Added `evaluate` override with optional aggregation logic
bayesflow/approximators/backend_approximators/tensorflow_approximator.py	Added `evaluate` override with optional aggregation logic
bayesflow/approximators/backend_approximators/jax_approximator.py	Added `evaluate` override with optional aggregation logic and JAX state handling

Comments suppressed due to low confidence (4)

bayesflow/approximators/backend_approximators/torch_approximator.py:26

Add unit tests covering both aggregate=True and aggregate=False paths to verify that metrics are correctly summed and averaged.

        aggregate=False,

bayesflow/approximators/backend_approximators/torch_approximator.py:29

Add a call to self._assert_compile_called("evaluate") at the start of evaluate to ensure the model has been compiled before evaluation.

# TODO: respect compiled trainable state

bayesflow/approximators/backend_approximators/jax_approximator.py:26

The default aggregate=True in JAX differs from aggregate=False in the Torch and TensorFlow backends. Align the default value for consistency across backends.

        aggregate=True,

bayesflow/approximators/backend_approximators/torch_approximator.py:16

This new evaluate method lacks docstrings for the aggregate and return_dict parameters; please add descriptions and expected behavior.

    def evaluate(

bayesflow/approximators/backend_approximators/torch_approximator.py

… aggregate-logs-in-evaluate # Conflicts: # bayesflow/approximators/backend_approximators/jax_approximator.py # bayesflow/approximators/backend_approximators/tensorflow_approximator.py # bayesflow/approximators/backend_approximators/torch_approximator.py

LarsKue · 2025-05-22T20:32:45Z

Superseded by #485

aggregate logs in evaluate

9fa0743

LarsKue requested a review from Copilot May 21, 2025 19:56

LarsKue self-assigned this May 21, 2025

LarsKue added the fix Pull request that fixes a bug label May 21, 2025

LarsKue requested a review from stefanradev93 May 21, 2025 19:57

Copilot AI reviewed May 21, 2025

use up-to-date keras code

cf3f397

LarsKue requested a review from Copilot May 21, 2025 20:11

Copilot AI reviewed May 21, 2025

LarsKue added 4 commits May 21, 2025 16:14

guard for total_steps = 0

c59d8ff

guard for total_steps = 0

a3b37d4

fix default aggregate=False

677c363

LarsKue mentioned this pull request May 22, 2025

Correctly track train / validation losses #485

Merged

LarsKue closed this May 22, 2025
