
Conversation

@ablaom (Member) commented Dec 2, 2025

This PR provides a few enhancements to the results of evaluate and evaluate!, which estimate
various kinds of out-of-sample performance of MLJ models (respectively, machines). These should
make evaluate more convenient when applied to batches of models that are to be compared:

  • The estimate of standard error, which is currently calculated only when a returned
    object is displayed, is now calculated when the object is constructed, and is exposed
    as a new user-accessible property, uncertainty_radius_95.

  • Users can now "tag" their estimates by calling evaluate("some tag" => model, ...)
    instead of evaluate(model, ...); the returned object has a new user-accessible
    property, tag, for storing this. When no tag is supplied, one is auto-generated from
    the model name, but for deeply wrapped models this is often inadequate, hence the
    addition of user tags. The tag is shown when the object is displayed.

  • Users can now evaluate a vector of models, or tagged models, as in the following
    example, where we can see the user-supplied tags in the output:

evaluate(["const" => ConstantClassifier(), "knn" => KNNClassifier()], X , y)
# 2-element Vector{...}
#  PerformanceEvaluation("const", 0.698 ± 0.0062)
#  PerformanceEvaluation("knn", 2.22e-16 ± 0.0)

Similar changes apply to the evaluate!(::Machine, ...) form.
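
As a minimal sketch of accessing the new properties on a single evaluation (the model, resampling and measure choices below are just for illustration):

using MLJ
KNNClassifier = @load KNNClassifier pkg=NearestNeighborModels

X, y = @load_iris
e = evaluate("knn" => KNNClassifier(), X, y; resampling=CV(nfolds=6), measure=log_loss)

e.tag                     # "knn"
e.uncertainty_radius_95   # the ± radius previously computed only at display time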

In the future we might add a summarize(evaluations) method to convert the kind of information displayed here into a table.

I found a few corner-case bugs in the display of performance evaluation objects, which I
have fixed. I have also added a lot more testing of the display, and added examples to the
docstrings for evaluate and evaluate!.

This PR closes #1031.

@ablaom (Member, Author) commented Dec 2, 2025

cc @LucasMatSP @mohdibntarek

@OkonSamuel self-assigned this Dec 2, 2025
@LucasMatSP (Collaborator) commented:

Nice feature!

@ablaom (Member, Author) commented Dec 2, 2025

In the vector case, perhaps we should just parallelize with multiple threads by default. It's just serial for now. @OkonSamuel What do you think?
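
A rough sketch of what threaded dispatch over the vector could look like (hypothetical, not this PR's implementation; evaluate_parallel is a made-up name and keyword handling is elided):

function evaluate_parallel(pairs::AbstractVector{<:Pair{String,<:Model}}, args...; kwargs...)
    results = Vector{Any}(undef, length(pairs))
    Threads.@threads for i in eachindex(pairs)
        # each pair is handled by the single-model `evaluate` form added in this PR
        results[i] = evaluate(pairs[i], args...; kwargs...)
    end
    return results
end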


# multiple model evaluations:
evaluate(
models_or_pairs::AbstractVector{<:Union{Machine,Pair{String,<:Model}}}, args...;

A Member commented on the code above:

Is there any reason we are allowing Machines to be passed here? I ask because of the type <:Union{Machine,Pair{String,<:Model}}; I thought this would have been <:Union{Model,Pair{String,<:Model}}.

@OkonSamuel (Member) commented:

> In the vector case, perhaps we should just parallelize with multiple threads by default. It's just serial for now. @OkonSamuel What do you think?

If we did this we would encounter some issues, for example a possible data race. I don't think anything prevents someone from doing this:

mach1 = machine(ConstantClassifier(), X, y)
evaluate!(["const1" => mach1, "const2" => mach1])

But if we ran these evaluations on separate threads, we would be modifying the same machine object from different threads, which could lead to a race condition.

This shouldn't be an issue if we just use the regular evaluate method, and if the models run on different datasets this is perfect. The only problem is that if the models run on the same dataset, we will end up with different copies of the same dataset (no?).
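
For concreteness, a naive mitigation would be to copy each machine up front so that no two entries share state; this is purely illustrative, and it accepts exactly the trade-off above, namely that identical bound datasets get duplicated in memory:

tagged = ["const1" => mach1, "const2" => mach1]
# deepcopy also copies the bound data, so the same dataset is duplicated per entry
evaluate!([tag => deepcopy(mach) for (tag, mach) in tagged])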
