Conversation

@thegialeo
Contributor

Description

This PR introduces the first draft implementation of the MMD hypothesis test feature based on the discussion in issue #379.

Note: The implementation is not complete and should not be merged at this stage.

The key open questions and considerations are outlined below:

Related Issue

  • #379

Type of Change

🚀 New feature (non-breaking change which adds functionality, no existing code was changed)

Open Questions

Clarification on bayesflow.types.Tensor Conversions

  • What is the best way to parse from np.ndarray to bayesflow.types.Tensor while staying backend agnostic?
  • Similarly, how should I convert back from bayesflow.types.Tensor to np.ndarray/float, given that maximum_mean_discrepancy and approximator.summary_network operate with bayesflow.types.Tensor?

Data Type Consistency

  • Should observed_data/observed_summaries and reference_data/reference_summaries be of type bayesflow.types.Tensor, or should they take np.ndarray as arguments and cast as needed?
  • Since other functions in diagnostics/metrics generally use np.ndarray or Mapping[str, np.ndarray] as arguments, should we maintain this consistency across the new utility functions?

Naming & Import Conventions

  • diagnostics/plots already includes mmd_hypothesis_test.py, which defines mmd_hypothesis_test(). To avoid namespace collisions, should we rename either the plot function or the metric function?
  • The current convention in the codebase favors from package.module import function/class over importing package.module and accessing function/class through it, so naming both functions the same could become problematic at some point in the future.

Unit Tests

  • Is tests/test_diagnostics/test_diagnostics_metrics.py the correct location for new unit tests for the implemented functions?

…ckage

- Create function signatures based on @vpratz's clarification in related issue bayesflow-org#379
…: maximum_mean_discrepancy takes bf.types.Tensor and returns bf.types.Tensor, but at the moment np.ndarray arguments are provided and a float return value is expected
…summary_network takes and returns bf.types.Tensor; at the moment np.ndarray is assumed
@vpratz
Collaborator

vpratz commented Apr 2, 2025

Thanks a lot for the PR!

What is the best way to parse from np.ndarray to bayesflow.types.Tensor while staying backend agnostic?
Similarly, how should I convert back from bayesflow.types.Tensor to np.ndarray/float

Keras offers two functions for that, keras.ops.convert_to_tensor and keras.ops.convert_to_numpy, which would be appropriate here.

Since other functions in diagnostics/metrics generally use np.ndarray or Mapping[str, np.ndarray] as arguments, should we maintain this consistency across the new utility functions?

Yes, I think following the existing functions in style and signature would be good here.

Naming & Import Conventions

Good points. I missed the naming collision, so maybe a prefix like compute_ as you proposed initially might be better here.

I think we would want to follow the pattern that we provide the functions by importing them in diagnostics/metrics/__init__.py, so that the public user interface becomes diagnostics.metrics.compute_mmd_hypothesis_test(). For this to work well, it would be great if you could move the module content from the docstring (e.g. the examples section) to the individual functions, as the module itself will be hidden from the users.
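For illustration, that re-export pattern might look like the following sketch (the file and function names here follow the compute_ proposal above and are not final):

```python
# diagnostics/metrics/__init__.py (sketch; names subject to the naming discussion)
from .mmd_hypothesis_test import compute_mmd_hypothesis_test

__all__ = ["compute_mmd_hypothesis_test"]
```

With this, users import from diagnostics.metrics directly and never see the underlying module.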

@stefanradev93 @paul-buerkner As you are more involved in the diagnostics interface, could one of you please comment what you would prefer regarding naming/structure?

@thegialeo
Contributor Author

Thanks for the input. The proposed changes have been pushed. I left the Functions and Dependencies listings in the module docstring for the devs.

@thegialeo thegialeo marked this pull request as draft April 2, 2025 16:33
@paul-buerkner
Contributor

Thank you for working on this PR! The naming overlap is indeed not ideal.

Perhaps we can actually rename the plot function. From a quick look at the code (this is one of the few diagnostics I haven't edited myself yet), the functionality seems much more general than just MMD. Since it takes samples from one distribution and compares them to a single empirical value, it could in theory use any test statistic, not just MMD. @stefanradev93 can you confirm? If true, I think we should rename the plot function to something more general and then not have the compute_ prefix for the metric, since that one is actually specific to MMD.

@vpratz What do you think about this suggestion?

@thegialeo
Contributor Author

Thank you for the feedback!

Technically, the functions implemented in this PR do not perform a hypothesis test themselves—they only compute the MMD values. Including "hypothesis_test" in the name might be misleading. Do you have any thoughts on alternative naming?

@paul-buerkner I agree that renaming the plot function makes sense. This would free up mmd_hypothesis_test for a user-facing function that could handle computation and visualization in one, whether the inputs are raw data, summaries, or precomputed MMD values. However, renaming would be a breaking change for existing users who rely on mmd_hypothesis_test, and since this PR is explicitly intended to be non-breaking, I suggest keeping the renaming and user function changes for a separate PR.

Would you like to create an issue for tracking this, or should I go ahead and do it?

@thegialeo thegialeo marked this pull request as ready for review April 22, 2025 13:37
@stefanradev93 stefanradev93 deleted the branch bayesflow-org:dev April 22, 2025 14:37
@LarsKue
Contributor

LarsKue commented Apr 22, 2025

This was accidentally closed. We will investigate how to restore the branch and reopen PRs.

@LarsKue LarsKue reopened this Apr 22, 2025
@vpratz
Collaborator

vpratz commented Apr 25, 2025

Thanks for the changes! As we might change the observed data in the approximator's adapter, I'd prefer if we pass the data to mmd_comparison in the usual way as a dict, and remove the possibility to directly pass a SummaryNetwork there. Users who know what they do can extract the summaries themselves and directly use the mmd_comparison_from_summaries function. I will try to change the code accordingly and ask for your feedback when I'm ready.

@vpratz
Collaborator

vpratz commented Apr 25, 2025

@stefanradev93 @LarsKue Do we want to create a "standard" way to obtain the summary outputs from a ContinuousApproximator given some data? As far as I can tell this is currently not possible, and one has to replicate what it is doing internally to obtain them. As this is something we might want to look at for some applications (especially model misspecification and robustness), having easy access would be nice, I think.

@vpratz vpratz self-assigned this Apr 25, 2025
vpratz added 3 commits April 25, 2025 15:32
This function enables easy access to the summary space. Naming can still
be discussed, as well as better integration/reuse in other functions of
the approximator.
- remove somewhat redundant mmd_comparison_from_summaries function
- rename mmd_comparison to the more general summary_space_comparison,
  with configurable distance function (default MMD)
- only allow calling summary_space_comparison when we can obtain the
  summary variables directly from the approximator. For all other use
  cases, directly refer to bootstrap_comparison
- update tests to reflect those changes
- remove redundant docstrings from the module
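As context for the commits above, here is a rough numpy-only sketch of the general bootstrap comparison idea: compute a distance between observed and reference data, then build a null distribution by resampling from the reference. The function names and the RBF-kernel MMD estimator below are illustrative assumptions, not BayesFlow's actual implementation:

```python
import numpy as np

def mmd_rbf(x, y, scale=1.0):
    """Biased estimator of squared MMD with an RBF kernel (illustrative)."""
    def k(a, b):
        sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq_dists / (2.0 * scale**2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

def bootstrap_comparison(observed, reference, distance=mmd_rbf, num_null=200, seed=0):
    """Distance between observed and reference data, plus a null distribution
    obtained by comparing bootstrap resamples of the reference to itself."""
    rng = np.random.default_rng(seed)
    n = observed.shape[0]
    observed_distance = distance(observed, reference)
    null_distribution = np.empty(num_null)
    for i in range(num_null):
        idx = rng.integers(0, reference.shape[0], size=n)
        null_distribution[i] = distance(reference[idx], reference)
    return observed_distance, null_distribution

rng = np.random.default_rng(1)
reference = rng.normal(size=(100, 2))
observed = rng.normal(loc=2.0, size=(20, 2))  # clearly shifted away from the reference
d_obs, null = bootstrap_comparison(observed, reference)
p_value = (null >= d_obs).mean()  # fraction of null draws at least as extreme
```

With a configurable distance function (default MMD), the same skeleton covers other test statistics as well, which is what motivates the more general summary_space_comparison name.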
@vpratz
Collaborator

vpratz commented Apr 25, 2025

I have refactored the functions, please take a look at the individual commit descriptions for details.

From my side, open questions are mainly naming-related, feel free to suggest ideas:

  • name of the file with the functions (not end-user facing): I propose model_comparison.py, as this is where it comes from
  • name of the summary_space_comparison function: is this ok as it is, or do we want to put "bootstrap" into the name as well?
  • name of the ContinuousApproximator.summary_outputs function: There are many competing terms like summary variables/embeddings/summary space... Did we already pick a term we use in BayesFlow and what would you choose here?

Tagging @paul-buerkner @stefanradev93 @LarsKue for those questions.

Are you happy with those changes, or do you see room for improvement? @thegialeo

@thegialeo
Contributor Author

Thanks for the updates — the changes look good to me!

  • The renaming definitely makes sense and improves clarity.
  • I’m fine with the new behavior of raising an Exception when summary_network=None. It’s a more explicit approach and prevents silent misuse, even though it might require the user to handle the None case in their own codebase.
  • Removing the old test case for summary_network=None makes sense given the change in behavior, but it would be great to add back the summary_network=None test case for the updated behavior to maintain full test coverage.

@vpratz
Collaborator

vpratz commented Apr 28, 2025

Thanks for the feedback and good spot with the test, I have added the missing test case.

@stefanradev93
Contributor

  • There are many competing terms like summary variables/embeddings/summary space... Did we already pick a term we use in BayesFlow and what would you choose here?

Summary variables are the variables that get summarized. I would call the summarized variables summaries and the corresponding space summary space, consistent with our papers.

- add summaries function to ModelComparisonApproximator as well
- add tests for the approximator.summaries functions
@vpratz
Collaborator

vpratz commented Apr 30, 2025

Thanks for the comment. I have renamed the function to summaries. If you do not have other comments/requests, I will merge this PR when the tests have passed.

@vpratz vpratz merged commit 62675c3 into bayesflow-org:dev May 2, 2025
9 checks passed
@vpratz vpratz mentioned this pull request Jun 16, 2025