Conversation

@thegialeo
Contributor

Description

This PR introduces the first draft implementation of the MMD hypothesis test feature based on the discussion in issue #379.

Note: The implementation is not complete and should not be merged at this stage.

The key open questions and considerations are outlined below:

Related Issue

  • #379

Type of Change

🚀 New feature (non-breaking change which adds functionality, no existing code was changed)

Open Questions

Clarification on bayesflow.types.Tensor Conversions

  • What is the best way to parse from np.ndarray to bayesflow.types.Tensor while staying backend agnostic?
  • Similarly, how should I convert back from bayesflow.types.Tensor to np.ndarray/float, given that maximum_mean_discrepancy and approximator.summary_network operate with bayesflow.types.Tensor?

Data Type Consistency

  • Should observed_data/observed_summaries and reference_data/reference_summaries be of type bayesflow.types.Tensor, or should they take np.ndarray as arguments and cast as needed?
  • Since other functions in diagnostics/metrics generally use np.ndarray or Mapping[str, np.ndarray] as arguments, should we maintain this consistency across the new utility functions?

Naming & Import Conventions

  • diagnostics/plots already includes mmd_hypothesis_test.py, which defines mmd_hypothesis_test(). To avoid namespace collisions, should we rename either the plot function or the metric function?
  • The current convention in the codebase favors from package.module import function/class over importing package.module and accessing function/class through it, so naming both functions the same could become problematic at some point in the future.

Unit Tests

  • Is tests/test_diagnostics/test_diagnostics_metrics.py the correct location for new unit tests for the implemented functions?

…ckage

- Create function signatures based on @vpratz's clarification in related issue bayesflow-org#379
…: maximum_mean_discrepancy takes bf.types.Tensor and returns bf.types.Tensor, but at the moment np.ndarray arguments are provided and a float return value is expected
…summary_network takes and returns bf.types.Tensor; at the moment np.ndarray is assumed
@vpratz
Collaborator

vpratz commented Apr 2, 2025

Thanks a lot for the PR!

What is the best way to parse from np.ndarray to bayesflow.types.Tensor while staying backend agnostic?
Similarly, how should I convert back from bayesflow.types.Tensor to np.ndarray/float

Keras offers two functions for that, keras.ops.convert_to_tensor and keras.ops.convert_to_numpy, which would be appropriate here.

Since other functions in diagnostics/metrics generally use np.ndarray or Mapping[str, np.ndarray] as arguments, should we maintain this consistency across the new utility functions?

Yes, I think following the existing functions in style and signature would be good here.

Naming & Import Conventions

Good points. I missed the naming collision, so maybe a prefix like compute_ as you proposed initially might be better here.

I think we would want to follow the pattern that we provide the functions by importing them in diagnostics/metrics/__init__.py, so that the public user interface becomes diagnostics.metrics.compute_mmd_hypothesis_test(). For this to work well, it would be great if you could move the module content from the docstring (e.g. the examples section) to the individual functions, as the module itself will be hidden from the users.
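For illustration, that re-export pattern might look like the following sketch (the file and function names here follow the compute_ proposal above and are not final):

```python
# diagnostics/metrics/__init__.py (sketch; names subject to the naming discussion)
from .mmd_hypothesis_test import compute_mmd_hypothesis_test

__all__ = ["compute_mmd_hypothesis_test"]
```

With this, users import from diagnostics.metrics directly and never see the underlying module.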

@stefanradev93 @paul-buerkner As you are more involved in the diagnostics interface, could one of you please comment what you would prefer regarding naming/structure?

@thegialeo
Contributor Author

Thanks for the input. The proposed changes have been pushed. I left the Functions and Dependencies listings in the module docstring for the devs.

@thegialeo thegialeo marked this pull request as draft April 2, 2025 16:33
@paul-buerkner
Contributor

Thank you for working on this PR! The naming overlap is indeed not ideal.

Perhaps we can actually rename the plot function. From a quick look at the code (this is one of the few diagnostics I haven't edited myself yet), the functionality seems much more general than just MMD. Since it takes samples from one distribution and compares them to a single empirical value, it could in theory use any test statistic, not just MMD. @stefanradev93 can you confirm? If true, I think we should rename the plot function to something more general and then not have the compute_ prefix for the metric, since that one is actually specific to MMD.

@vpratz What do you think about this suggestion?

@thegialeo
Contributor Author

Thank you for the feedback!

Technically, the functions implemented in this PR do not perform a hypothesis test themselves—they only compute the MMD values. Including "hypothesis_test" in the name might be misleading. Do you have any thoughts on alternative naming?

@paul-buerkner I agree that renaming the plot function makes sense. This would free up mmd_hypothesis_test for a user-facing function that could handle computation and visualization in one, whether the inputs are raw data, summaries, or precomputed MMD values. However, renaming would be a breaking change for existing users who rely on mmd_hypothesis_test, and since this PR is explicitly intended to be non-breaking, I suggest keeping the renaming and user function changes for a separate PR.

Would you like to create an issue for tracking this, or should I go ahead and do it?

@thegialeo thegialeo marked this pull request as ready for review April 22, 2025 13:37
@stefanradev93 stefanradev93 deleted the branch bayesflow-org:dev April 22, 2025 14:37
@LarsKue
Contributor

LarsKue commented Apr 22, 2025

This was accidentally closed. We will investigate how to restore the branch and reopen PRs.

@LarsKue LarsKue reopened this Apr 22, 2025
@vpratz
Collaborator

vpratz commented Apr 25, 2025

Thanks for the changes! As we might change the observed data in the approximator's adapter, I'd prefer if we pass the data to mmd_comparison in the usual way as a dict, and remove the possibility to directly pass a SummaryNetwork there. Users who know what they do can extract the summaries themselves and directly use the mmd_comparison_from_summaries function. I will try to change the code accordingly and ask for your feedback when I'm ready.

@vpratz
Collaborator

vpratz commented Apr 25, 2025

@stefanradev93 @LarsKue Do we want to create a "standard" way to obtain the summary outputs from a ContinuousApproximator given some data? As far as I can tell this is currently not possible, and one has to replicate what it is doing internally to obtain them. As this is something we might want to look at for some applications (especially model misspecification and robustness), having easy access would be nice, I think.

@vpratz vpratz self-assigned this Apr 25, 2025
vpratz added 3 commits April 25, 2025 15:32
This function enables easy access to the summary space. Naming can still
be discussed, as well as better integration/reuse in other functions of
the approximator.
- remove somewhat redundant mmd_comparison_from_summaries function
- rename mmd_comparison to the more general summary_space_comparison,
  with configurable distance function (default MMD)
- only allow calling summary_space_comparison when we can obtain the
  summary variables directly from the approximator. For all other use
  cases, directly refer to bootstrap_comparison
- update tests to reflect those changes
- remove redundant docstrings from the module
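As context for the commits above, here is a rough numpy-only sketch of the general bootstrap comparison idea: compute a distance between observed and reference data, then build a null distribution by resampling from the reference. The function names and the RBF-kernel MMD estimator below are illustrative assumptions, not BayesFlow's actual implementation:

```python
import numpy as np

def mmd_rbf(x, y, scale=1.0):
    """Biased estimator of squared MMD with an RBF kernel (illustrative)."""
    def k(a, b):
        sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq_dists / (2.0 * scale**2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

def bootstrap_comparison(observed, reference, distance=mmd_rbf, num_null=200, seed=0):
    """Distance between observed and reference data, plus a null distribution
    obtained by comparing bootstrap resamples of the reference to itself."""
    rng = np.random.default_rng(seed)
    n = observed.shape[0]
    observed_distance = distance(observed, reference)
    null_distribution = np.empty(num_null)
    for i in range(num_null):
        idx = rng.integers(0, reference.shape[0], size=n)
        null_distribution[i] = distance(reference[idx], reference)
    return observed_distance, null_distribution

rng = np.random.default_rng(1)
reference = rng.normal(size=(100, 2))
observed = rng.normal(loc=2.0, size=(20, 2))  # clearly shifted away from the reference
d_obs, null = bootstrap_comparison(observed, reference)
p_value = (null >= d_obs).mean()  # fraction of null draws at least as extreme
```

With a configurable distance function (default MMD), the same skeleton covers other test statistics as well, which is what motivates the more general summary_space_comparison name.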
@vpratz
Collaborator

vpratz commented Apr 25, 2025

I have refactored the functions, please take a look at the individual commit descriptions for details.

From my side, open questions are mainly naming-related, feel free to suggest ideas:

  • name of the file with the functions (not end-user facing): I propose model_comparison.py, as this is where it comes from
  • name of the summary_space_comparison function: is this ok as it is, or do we want to put "bootstrap" into the name as well?
  • name of the ContinuousApproximator.summary_outputs function: There are many competing terms like summary variables/embeddings/summary space... Did we already pick a term we use in BayesFlow and what would you choose here?

Tagging @paul-buerkner @stefanradev93 @LarsKue for those questions.

Are you happy with those changes, or do you see room for improvement? @thegialeo

@thegialeo
Contributor Author

Thanks for the updates — the changes look good to me!

  • The renaming definitely makes sense and improves clarity.
  • I’m fine with the new behavior of raising an Exception when summary_network=None. It’s a more explicit approach and prevents silent misuse, even though it might require the user to handle the None case in their own codebase.
  • Removing the old test case for summary_network=None makes sense given the change in behavior, but it would be great to add back the summary_network=None test case for the updated behavior to maintain full test coverage.

@vpratz
Collaborator

vpratz commented Apr 28, 2025

Thanks for the feedback and good spot with the test, I have added the missing test case.

@stefanradev93
Contributor

  • There are many competing terms like summary variables/embeddings/summary space... Did we already pick a term we use in BayesFlow and what would you choose here?

Summary variables are the variables that get summarized. I would call the summarized variables summaries and the corresponding space summary space, consistent with our papers.

- add summaries function to ModelComparisonApproximator as well
- add tests for the approximator.summaries functions
@vpratz
Collaborator

vpratz commented Apr 30, 2025

Thanks for the comment. I have renamed the function to summaries. If you do not have other comments/requests, I will merge this PR when the tests have passed.

@vpratz vpratz merged commit 62675c3 into bayesflow-org:dev May 2, 2025
9 checks passed
@vpratz vpratz mentioned this pull request Jun 16, 2025