
Conversation

@ablaom (Member) commented Dec 2, 2025

This PR provides a few enhancements to the results of evaluate and evaluate!, which estimate
various kinds of out-of-sample performance of MLJ models (respectively, machines). These should
make evaluate more convenient when applied to batches of models that are to be compared:

  • The estimate of standard error, which is currently calculated only when a returned
    object is displayed, is now calculated when the object is constructed, and is exposed
    as a new user-accessible property, uncertainty_radius_95.

  • Users can now "tag" their estimates by calling evaluate("some tag" => model, ...)
    instead of evaluate(model, ...); the returned object has a new user-accessible
    property, tag, for storing this. When no tag is supplied, one is auto-generated from
    the model name, but for deeply wrapped models this is often inadequate, hence the
    addition of user tags. The tag is shown when the object is displayed.

  • Users can now evaluate a vector of models, or tagged models, as in the following
    example, where we can see the user-supplied tags in the output:

evaluate(["const" => ConstantClassifier(), "knn" => KNNClassifier()], X , y)
# 2-element Vector{...}
#  PerformanceEvaluation("const", 0.698 ± 0.0062)
#  PerformanceEvaluation("knn", 2.22e-16 ± 0.0)

Similar changes apply to the evaluate!(::Machine, ...) form.
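
As a minimal sketch of accessing the new properties on a single evaluation (the model, resampling and measure choices below are just for illustration):

using MLJ
KNNClassifier = @load KNNClassifier pkg=NearestNeighborModels

X, y = @load_iris
e = evaluate("knn" => KNNClassifier(), X, y; resampling=CV(nfolds=6), measure=log_loss)

e.tag                     # "knn"
e.uncertainty_radius_95   # the ± radius previously computed only at display time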

In the future we might add a summarize(evaluations) method to convert the kind of information displayed here into a table.

I found a few corner-case bugs in the display of performance evaluation objects, which I
have fixed. I have also added a lot more testing of the display, and added examples to the
docstrings for evaluate and evaluate!.

This PR closes #1031.

@ablaom (Member, Author) commented Dec 2, 2025

cc @LucasMatSP @mohdibntarek

@OkonSamuel self-assigned this Dec 2, 2025
@LucasMatSP (Collaborator) commented:

Nice feature!

@ablaom (Member, Author) commented Dec 2, 2025

In the vector case, perhaps we should just parallelize with multiple threads by default. It's just serial for now. @OkonSamuel What do you think?
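
A rough sketch of what threaded dispatch over the vector could look like (hypothetical, not this PR's implementation; evaluate_parallel is a made-up name and keyword handling is elided):

function evaluate_parallel(pairs::AbstractVector{<:Pair{String,<:Model}}, args...; kwargs...)
    results = Vector{Any}(undef, length(pairs))
    Threads.@threads for i in eachindex(pairs)
        # each pair is handled by the single-model `evaluate` form added in this PR
        results[i] = evaluate(pairs[i], args...; kwargs...)
    end
    return results
end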


# multiple model evaluations:
evaluate(
models_or_pairs::AbstractVector{<:Union{Machine,Pair{String,<:Model}}}, args...;

A Member commented on the code above:

Is there any reason we are allowing Machines to be passed here? I ask because of the type <:Union{Machine,Pair{String,<:Model}}; I thought this would have been <:Union{Model,Pair{String,<:Model}}.

@OkonSamuel (Member) commented:

> In the vector case, perhaps we should just parallelize with multiple threads by default. It's just serial for now. @OkonSamuel What do you think?

If we did this we would encounter some issues, for example a possible data race. I don't think anything prevents someone from doing this:

mach1 = machine(ConstantClassifier(), X, y)
evaluate!(["const1" => mach1, "const2" => mach1])

But if we ran these evaluations on separate threads, we would be modifying the same machine object from different threads, which could lead to a race condition.

This shouldn't be an issue if we just use the regular evaluate method, and if the models run on different datasets this is perfect. The only problem is that if the models run on the same dataset, we will end up with different copies of the same dataset (no?).
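
For concreteness, a naive mitigation would be to copy each machine up front so that no two entries share state; this is purely illustrative, and it accepts exactly the trade-off above, namely that identical bound datasets get duplicated in memory:

tagged = ["const1" => mach1, "const2" => mach1]
# deepcopy also copies the bound data, so the same dataset is duplicated per entry
evaluate!([tag => deepcopy(mach) for (tag, mach) in tagged])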
