Conversation

@tiemvanderdeure
Contributor

@tiemvanderdeure tiemvanderdeure commented Sep 16, 2025

closes #51

Todo:

  • add tests
  • move WARN_UNORDERED to somewhere more discoverable

@codecov

codecov bot commented Sep 16, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.54%. Comparing base (d50f3c6) to head (d3122db).
⚠️ Report is 10 commits behind head on dev.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev      #52      +/-   ##
==========================================
- Coverage   92.22%   91.54%   -0.68%     
==========================================
  Files          14       14              
  Lines         720      769      +49     
==========================================
+ Hits          664      704      +40     
- Misses         56       65       +9     


@tiemvanderdeure tiemvanderdeure marked this pull request as draft September 17, 2025 12:35
"$ContinuousBoyceIndexDoc"
ContinuousBoyceIndex
"$ContinuousBoyceIndexDoc"
cbi(x, y; kw...) = ContinuousBoyceIndex(; kw...)(x, y)
Contributor Author

@ablaom what would be the right way to go here? I know the other functions don't have this interface, but here I think it would make a lot of sense to allow cbi(ŷ, y; n_bins=5)

Member

No, you have to make n_bins part of the struct. So you do ContinuousBoyceIndex(n_bins=5)(yhat, y).

However, if you want, you can define a pure functional version Functions.continuous_boyce_index here and refactor so that your struct version calls that. And then documentation can point out the core implementation, like we do for MatthewsCorrelation.
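
A rough sketch of the refactor suggested above, for illustration only. The names `binned_error` and `BinnedError` are hypothetical stand-ins rather than StatisticalMeasures.jl code, and the toy computation is not the Boyce index. The point is the shape: a pure function takes the option as a keyword argument, while a callable struct stores it as a field and forwards it.

```julia
using Statistics  # standard library; provides `mean`

# Hypothetical pure-function core: options are plain keyword arguments.
function binned_error(yhat, y; n_bins=5)
    binned = round.(yhat .* n_bins) ./ n_bins  # snap predictions to a grid on [0, 1]
    return mean(abs.(binned .- y))
end

# Hypothetical measure struct: options are fields, fixed at construction.
Base.@kwdef struct BinnedError
    n_bins::Int = 5
end

# Calling the struct forwards its fields to the functional core.
(m::BinnedError)(yhat, y) = binned_error(yhat, y; n_bins=m.n_bins)

# Both spellings run the same computation:
BinnedError(n_bins=5)([0.1, 0.9], [0, 1])
binned_error([0.1, 0.9], [0, 1]; n_bins=5)
```

With this layout the struct carries no logic of its own, so its docstring can simply point to the core function.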

@tiemvanderdeure
Contributor Author

Sorry, forgot about this one a little bit but let's finish it :). I cleaned up some code and moved the main thing to functions.jl as you suggested.

The funny thing about the CBI is that there are so many implementations and most of them are either a little weird or slightly wrong. But I've looked at the original paper to make sure this implementation follows their idea very closely.

@tiemvanderdeure tiemvanderdeure marked this pull request as ready for review November 3, 2025 10:05
@ablaom
Member

ablaom commented Nov 3, 2025

This is looking pretty good. Thanks for the follow through.

There's some missing coverage for some corner-case (?) logic in the core function. It would be great if you could address that.


ContinuousBoyceIndex(; kw...) = _ContinuousBoyceIndex(; kw...) |> robust_measure |> fussy_measure

function (m::_ContinuousBoyceIndex)(ŷ::UnivariateFiniteArray, y::NonMissingCatArrOrSub; warn=true)
Member

The type annotation for yhat is too strict. Either remove it altogether, or you could do ::AbstractArray{<:UnivariateFinite}. For example, consider:

julia> y = categorical(rand("ab", 10), ordered=true);

julia> d = CategoricalDistributions.Distributions.fit(UnivariateFinite, y);

julia> yhat = fill(d, 10);

julia> ContinuousBoyceIndex()(yhat, y)
ERROR: MethodError: no method matching (::StatisticalMeasures._ContinuousBoyceIndex)(::Vector{…}, ::CategoricalVector{…})
The object of type `StatisticalMeasures._ContinuousBoyceIndex` exists, but no method is defined for this combination of argument types when trying to treat it as a callable object.
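
The dispatch failure above can be reproduced without the packages involved. The sketch below is only an illustration: `Dist` and `SpecialDistArray` are hypothetical stand-ins that play the roles of `UnivariateFinite` and `UnivariateFiniteArray`, on the assumption that the latter is an array type whose elements are the former.

```julia
# Stand-in types (hypothetical, not from CategoricalDistributions.jl).
struct Dist end                                   # plays the role of UnivariateFinite

struct SpecialDistArray <: AbstractVector{Dist}   # plays the role of UnivariateFiniteArray
    data::Vector{Dist}
end
Base.size(a::SpecialDistArray) = size(a.data)
Base.getindex(a::SpecialDistArray, i::Int) = a.data[i]

# Strict signature: accepts only the special array type.
strict(yhat::SpecialDistArray) = "ok"

# Relaxed signature: accepts any array whose elements are distributions.
relaxed(yhat::AbstractArray{<:Dist}) = "ok"

plain = fill(Dist(), 3)            # a plain Vector{Dist}, like `fill(d, 10)` above

relaxed(plain)                     # a method matches
relaxed(SpecialDistArray(plain))   # a method matches here too
applicable(strict, plain)          # false: calling `strict(plain)` would throw a MethodError
```

The relaxed annotation still restricts the element type, so it keeps the intent of the original signature while admitting ordinary vectors of distributions.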

@ablaom
Member

ablaom commented Nov 4, 2025

Are you absolutely sure you want this @info logging? Generally our measures are silent, or there is a way to silence them.

julia> cbi(yhat, y)
[ Info: removing 30 bins without any observations
0.5410565658852077

kind_of_proxy=StatisticalMeasures.LearnAPI.Distribution(),
orientation=Score(),
external_aggregation_mode=Mean(),
human_name = "continuous boyce index",
Member

Is Boyce not a proper name?

Suggested change
human_name = "continuous boyce index",
human_name = "continuous Boyce index",

Member

@ablaom ablaom left a comment

Good to have this. Just a few minor points.

I'd be inclined to call the core function Functions.continuous_boyce_index as we're generally verbose about those, but I'll leave it up to you.

Note to self: I have reviewed the traits.

Co-authored-by: Anthony Blaom, PhD <[email protected]>
@tiemvanderdeure
Contributor Author

Are you absolutely sure you want this @info logging? Generally our measures are silent, or there is a way to silence them.

yeah maybe not. Let me think about that. Thanks for the thorough review

@tiemvanderdeure
Contributor Author

Okay, I ended up changing a few more things:

  • added verbosity, as you suggested. I also made it so you can set it to 0 to avoid the warning about the ordered levels
  • I actually changed some defaults, so it follows the implementation in the R packages less closely but makes much more sense. Basically since we know that yhat is always probabilities and those should be on a scale from 0 to 1, we can just set min and max to 0 and 1 and set a reasonable bin width without any additional math. In R it's the wild west and some packages will return values between 0 and 100 or 1000 from their models, and so the packages implementing CBI set the width of the bins based on the data.
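
To make the two points above concrete, here is a minimal, dependency-free sketch of a Boyce-style index with windows fixed on [0, 1] and a `verbosity` switch. It is an assumption-laden illustration, not the code in this PR: the function name `cbi_sketch`, the `bin_width` parameter, the default values, and the hand-rolled Spearman correlation are all hypothetical choices made here to keep the example self-contained.

```julia
using Statistics  # standard library; provides `cor`

# Average ranks: tied values share the mean of the positions they occupy.
function tied_ranks(v::AbstractVector{<:Real})
    order = sortperm(v)
    ranks = zeros(Float64, length(v))
    i = 1
    while i <= length(v)
        j = i
        while j < length(v) && v[order[j + 1]] == v[order[i]]
            j += 1
        end
        ranks[order[i:j]] .= (i + j) / 2
        i = j + 1
    end
    return ranks
end

# Spearman correlation, written out to avoid a StatsBase dependency.
spearman(a, b) = cor(tied_ranks(a), tied_ranks(b))

# Hypothetical sketch: `scores` are predicted probabilities in [0, 1],
# `presence` flags the observations that are presences.
function cbi_sketch(scores::AbstractVector{<:Real}, presence::AbstractVector{Bool};
                    n_bins::Integer=101, bin_width::Real=0.1, verbosity::Integer=1)
    length(scores) == length(presence) ||
        throw(DimensionMismatch("scores and presence must have equal length"))
    n_presence = count(presence)
    starts = range(0.0, 1.0 - bin_width; length=n_bins)  # overlapping windows on [0, 1]
    midpoints = Float64[]
    ratios = Float64[]
    for lo in starts
        in_bin = (lo .<= scores) .& (scores .<= lo + bin_width)
        n_all = count(in_bin)
        n_all == 0 && continue                        # skip windows with no observations
        p = count(in_bin .& presence) / n_presence    # share of presences in the window
        e = n_all / length(scores)                    # share of all observations in the window
        push!(midpoints, lo + bin_width / 2)
        push!(ratios, p / e)
    end
    n_removed = n_bins - length(ratios)
    if verbosity > 0 && n_removed > 0
        @info "removing $n_removed bins without any observations"
    end
    # Rank correlation between window position and predicted-to-expected ratio.
    return spearman(midpoints, ratios)
end

# Toy data: presences occur only where the score is high.
scores = collect(0.005:0.01:0.995)
presence = scores .> 0.5
cbi_sketch(scores, presence; verbosity=0)   # silent call
```

Because the windows are fixed in advance, the result does not depend on the observed range of the scores, and passing `verbosity=0` suppresses the log message about empty windows.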

This measure is really nice for presence-only models, but it's pretty weird you can get such different values from it based on some of these parameters (and that there aren't really any agreed-upon defaults).

I've kept the name as Functions.cbi; we also have Functions.auc, and the full name is pretty long.

Development

Successfully merging this pull request may close these issues.

continuous boyce index

2 participants