
Evaluation, Reproducibility, Benchmarks Meeting 40


Minutes of Meeting 40

Date: 28th January, 2026

Present

  • Annika
  • Olivier
  • Carole
  • Nick

Next Meeting

  • Carole has a conflict; we will move the next meeting to March 4th (done)
  • Nick will have many conflicts moving forward. When more people are present, we should choose a new member for the secretary role

Update From CI Project

  • Paper is submitted!
  • Carole had some comments. If it goes to revisions, those can be integrated

Brainstorming Session re. CI Project Implementation

  • The thinking is that it could be integrated into Metrics Reloaded
    • This would also be a good stage at which to present warnings to the user
    • Metrics Reloaded does not depend on scikit-learn, but SciPy and (we think) statsmodels are already imported
  • In contrast with Metrics Reloaded, the CI project doesn't really give recommendations (yet)
    • Maybe we think of it more as just elucidating "risks" of poor statistical power etc. for the user
    • Can also have task-specific "defaults" that are not exactly recommendations, but would nudge the user in that direction (see the sketch below)
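A minimal sketch of what such risk warnings and task-specific defaults could look like. All names here (`CIConfig`, `check_sample_size`, the threshold of 30) are hypothetical illustrations, not part of the Metrics Reloaded API:

```python
import warnings
from dataclasses import dataclass

@dataclass
class CIConfig:
    """Hypothetical container for task-specific defaults (names are illustrative)."""
    task: str = "segmentation"   # assumed task label
    method: str = "bootstrap"    # default CI method; could differ per task
    min_n_warn: int = 30         # assumed threshold below which we warn

def check_sample_size(n: int, config: CIConfig) -> None:
    """Surface a 'risk' warning rather than a hard recommendation."""
    if n < config.min_n_warn:
        warnings.warn(
            f"Only {n} samples for task '{config.task}': statistical power may "
            f"be low and '{config.method}' CIs may be unreliable.",
            UserWarning,
        )
```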
  • One of the main findings is that with very low sample sizes, parametric methods work a bit better. This could be an example of a branching workflow in the implementation
    • A warning/default here (see the sketch below)
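A sketch of the branching idea, assuming a 1-D array of per-case scores: fall back to a parametric t-interval below an (assumed) sample-size threshold and warn the user, otherwise use a percentile bootstrap. The threshold of 10 and the bootstrap settings are illustrative, not values from the paper:

```python
import warnings

import numpy as np
from scipy import stats

def mean_ci(scores, confidence=0.95, parametric_below=10, n_boot=10_000, seed=0):
    """Branching workflow sketch: parametric t-interval at very low n,
    percentile bootstrap otherwise. The threshold is an assumption."""
    scores = np.asarray(scores, dtype=float)
    n = scores.size
    if n < parametric_below:
        warnings.warn(f"n={n} is very small; using a parametric t-interval "
                      "instead of the bootstrap.")
        return stats.t.interval(confidence, df=n - 1,
                                loc=scores.mean(), scale=stats.sem(scores))
    rng = np.random.default_rng(seed)
    boot_means = rng.choice(scores, size=(n_boot, n), replace=True).mean(axis=1)
    alpha = 1.0 - confidence
    return tuple(np.quantile(boot_means, [alpha / 2, 1 - alpha / 2]))
```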
  • Also check for micro- vs. macro-averaging
    • A subtle difference, but often overlooked (see the sketch below)
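A toy illustration of why the distinction matters, using hypothetical per-case counts: micro-averaging pools the counts before computing the metric, macro-averaging averages the per-case values, and the two can diverge noticeably:

```python
import numpy as np

# Hypothetical per-case counts for a binary metric such as precision.
tp = np.array([90, 1, 50])   # true positives per case
fp = np.array([10, 0, 50])   # false positives per case

micro_precision = tp.sum() / (tp.sum() + fp.sum())  # pool counts first: ~0.701
macro_precision = np.mean(tp / (tp + fp))           # average per-case values: 0.8

print(micro_precision, macro_precision)
```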
  • There is a known issue with approximating average precision in finite/small samples
    • Relates to approximating the integral of a function (the precision-recall curve) that is not monotonic
    • Could be a warning here, but there is no great solution (see the sketch below)
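A sketch of the issue on a tiny hypothetical sample: a step-wise sum and a trapezoidal rule over the same non-monotonic precision-recall points give noticeably different average-precision estimates, and how to anchor the curve at recall 0 is itself a judgment call:

```python
import numpy as np

def pr_points(y_true, scores):
    """Precision and recall at every score threshold, highest score first."""
    order = np.argsort(scores)[::-1]
    y = np.asarray(y_true)[order]
    tp = np.cumsum(y)
    precision = tp / np.arange(1, len(y) + 1)
    recall = tp / y.sum()
    return precision, recall

def ap_step(precision, recall):
    """Step-wise sum: precision weighted by recall increments (no interpolation)."""
    d_recall = np.diff(np.concatenate([[0.0], recall]))
    return float(np.sum(precision * d_recall))

def ap_trapezoid(precision, recall):
    """Trapezoidal rule, anchored at (recall=0, first precision) -- the anchor
    choice is itself an assumption when the curve is non-monotonic."""
    r = np.concatenate([[0.0], recall])
    p = np.concatenate([[precision[0]], precision])
    return float(np.sum(np.diff(r) * (p[1:] + p[:-1]) / 2))

# Five hypothetical cases are already enough for the estimates to diverge.
y_true = np.array([1, 0, 1, 0, 0])
scores = np.array([0.9, 0.8, 0.7, 0.4, 0.2])
p, r = pr_points(y_true, scores)
print(ap_step(p, r), ap_trapezoid(p, r))  # ~0.833 vs ~0.792
```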
  • With small sample sizes, distributions that are very narrow by chance can violate the i.i.d. assumption and lead to erroneously narrow CIs (see the simulation sketch below)
  • It is OK if we have to tell the user that computing a CI from the data they provide is not possible -- or at least not possible to do well
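A small simulation illustrating the risk (the distribution, sample size, and trial counts are arbitrary assumptions): at n = 5, a nominal 95% percentile-bootstrap CI for the mean covers the true mean noticeably less often than 95% of the time:

```python
import numpy as np

# Coverage check for a 95% percentile-bootstrap CI of the mean at tiny n.
rng = np.random.default_rng(0)
n, n_trials, n_boot = 5, 2000, 1000
true_mean = 0.0
covered = 0
for _ in range(n_trials):
    sample = rng.normal(true_mean, 1.0, size=n)
    boot_means = rng.choice(sample, size=(n_boot, n), replace=True).mean(axis=1)
    lo, hi = np.quantile(boot_means, [0.025, 0.975])
    covered += (lo <= true_mean <= hi)

print(covered / n_trials)  # typically well below the nominal 0.95 at n=5
```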
  • Want to make sure we decouple Metrics Reloaded from the CI tool -- the two can be used together, but each can also be used independently
  • Specifics
    • For segmentation: just CIs over the mean and median for now (see the sketch below); CIs over differences can be implemented later
      • Complication: the tool currently accepts just one CSV per model, so we would need to cross-match files and handle asymmetric missing data, column names, etc.
      • Also open: with so many pairwise differences, we may need to handle multiplicity etc.
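A sketch of the segmentation case using `scipy.stats.bootstrap` (consistent with the point below about preferring the SciPy implementation over custom code). The file name and the per-case "dice" column are assumptions about the CSV layout:

```python
import numpy as np
import pandas as pd
from scipy.stats import bootstrap

# Hypothetical input: one CSV per model, one row per case, with a per-case
# metric column named "dice" (names are illustrative only).
scores = pd.read_csv("model_a.csv")["dice"].to_numpy()

for stat in (np.mean, np.median):
    res = bootstrap(
        (scores,), stat,
        confidence_level=0.95,
        n_resamples=9999,
        method="percentile",  # SciPy's default is the BCa interval
        random_state=np.random.default_rng(0),
    )
    print(stat.__name__, res.confidence_interval)
```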
  • Strategy
    • If Olivier can share the code, then Carole can integrate these things into Metrics Reloaded
      • The paper has associated code that's public, but documentation is somewhat lacking
      • Most of the code was for simulation studies, which obviously will not be included
      • Better to use the SciPy implementation rather than the custom implementation (both are tested/good, but no reason to reinvent the wheel)
  • After this is implemented, it could be a feature of the Metrics Reloaded software paper
    • The way we write this depends heavily on which journal we target
    • JMLR is a well-respected journal that has a software section (4 pages only?)
    • We did some brainstorming about which journal to target back in July of 2024
    • Who are we trying to reach with this paper? Just medical imaging community, or machine learning community more generally?
      • Could maybe send to TMI or MedIA and see what happens? No software track, but they are familiar with the tooling
        • If they don't want software papers, then it would be fast (probably)
